path: root/include
Commit message (Author, Age, Files, Lines)
* net: ethernet: mtk_wed: introduce WED support for MT7988 (Sujuan Chen, 2023-09-19, 1 file, -2/+6)

    Similar to MT7986 and MT7622, enable the Wireless Ethernet Dispatcher for MT7988 in order to offload traffic forwarded from LAN/WLAN to WLAN/LAN.

    Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: ethernet: mtk_wed: introduce mtk_wed_buf structure (Lorenzo Bianconi, 2023-09-19, 1 file, -1/+6)

    Introduce the mtk_wed_buf structure to store both the virtual and the physical addresses allocated in the mtk_wed_tx_buffer_alloc() routine. This is a preliminary patch to add WED support for the MT7988 SoC, since it relies on a different dma descriptor layout that does not store page dma addresses.

    Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
    Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
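    Per the description, the new structure is essentially a (virtual address, dma address) pair; a minimal sketch of such a struct (field names are illustrative, not necessarily the exact ones merged):

        struct mtk_wed_buf {
            void *p;             /* virtual address of the allocated page */
            dma_addr_t phy_addr; /* matching dma (physical) address */
        };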
* net: ethernet: mtk_wed: rename mtk_rxbm_desc to mtk_wed_bm_desc (Lorenzo Bianconi, 2023-09-19, 1 file, -2/+2)

    Rename the mtk_rxbm_desc structure to mtk_wed_bm_desc since it will be used even on the tx side by the MT7988 SoC.

    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* ipv6: lockless IPV6_ADDR_PREFERENCES implementation (Eric Dumazet, 2023-09-19, 3 files, -17/+10)

    We have data-races while reading np->srcprefs. Switch the field to a plain byte and add READ_ONCE() and WRITE_ONCE() annotations where needed; the IPV6_ADDR_PREFERENCES setsockopt() can now be lockless.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230918142321.1794107-1-edumazet@google.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
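    For context, the annotation pattern this series applies looks roughly like the sketch below (a hedged illustration of READ_ONCE()/WRITE_ONCE() usage, not the exact diff):

        /* writer: setsockopt() path, no longer under the socket lock */
        WRITE_ONCE(np->srcprefs, prefs);

        /* reader: may run concurrently with the writer above */
        u8 srcprefs = READ_ONCE(np->srcprefs);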
* net: phy: fix regression with AX88772A PHY driver (Russell King (Oracle), 2023-09-19, 1 file, -0/+1)

    Marek reports that a deadlock occurs with the AX88772A PHY used on the ASIX USB network driver:

        asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY [usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
        Asix Electronics AX88772A usb-001:003:10: attached PHY driver(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
        asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb
        asix 1-1.4:1.0 eth0: configuring for phy/internal link mode

        ============================================
        WARNING: possible recursive locking detected
        6.6.0-rc1-00239-g8da77df649c4-dirty #13949 Not tainted
        --------------------------------------------
        kworker/3:3/71 is trying to acquire lock:
        c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_start_aneg+0x1c/0x38

        but task is already holding lock:
        c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8

    This is because we now consistently call phy_process_state_change() while holding phydev->lock, but the AX88772A PHY driver then goes on to call phy_start_aneg(), which tries to grab the same lock - causing the deadlock. Fix this by exporting the unlocked version, and use this in the PHY driver instead.

    Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Fixes: ef113a60d0a9 ("net: phy: call phy_error_precise() while holding the lock")
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Link: https://lore.kernel.org/r/E1qiEFs-007g7b-Lq@rmk-PC.armlinux.org.uk
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
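    The shape of the fix, as described above, is to call an unlocked variant where the lock is already held; a minimal sketch, assuming the unlocked variant is the underscore-prefixed _phy_start_aneg() (treat the name as illustrative):

        /* caller already holds phydev->lock, e.g. from the PHY state
         * machine path, so the locked phy_start_aneg() would deadlock
         */
        lockdep_assert_held(&phydev->lock);
        ret = _phy_start_aneg(phydev); /* unlocked variant */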
* net: stmmac: Tx coe sw fallback (Rohan G Thomas, 2023-09-18, 1 file, -0/+1)

    Add a sw fallback for tx checksum calculation for those tx queues that don't support tx checksum offloading. The DW xGMAC IP can be synthesized such that it supports tx checksum offloading only for a few initial tx queues. Also, as Serge pointed out, for the DW QoS IP, tx coe can be individually configured for each tx queue.

    So when tx coe is enabled, any tx queue that doesn't support tx coe and has the 'coe-unsupported' flag set gets a sw fallback in the driver for tx checksum calculation for packets transmitted on that queue.

    Signed-off-by: Rohan G Thomas <rohan.g.thomas@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ceph: Annotate struct ceph_monmap with __counted_by (Kees Cook, 2023-09-18, 1 file, -1/+1)

    Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

    As found with Coccinelle[1], add __counted_by for struct ceph_monmap. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier.

    [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

    Cc: Ilya Dryomov <idryomov@gmail.com>
    Cc: Xiubo Li <xiubli@redhat.com>
    Cc: Jeff Layton <jlayton@kernel.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Reviewed-by: Xiubo Li <xiubli@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
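    The annotation itself is a one-liner on the flexible array member; a hedged sketch of the pattern (surrounding fields abbreviated, not the full ceph_monmap layout):

        struct ceph_monmap {
            /* ... other fields ... */
            u32 num_mon; /* element count, set before the array is touched */
            struct ceph_entity_inst mon_inst[] __counted_by(num_mon);
        };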
* pds_core: check health in devcmd wait (Shannon Nelson, 2023-09-18, 1 file, -0/+1)

    Similar to what we do in the AdminQ, check for devcmd health while waiting for an answer.

    Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
    Reviewed-by: Brett Creeley <brett.creeley@amd.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (David S. Miller, 2023-09-17, 8 files, -19/+136)

    Alexei Starovoitov says:

    ====================
    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-).

    The main changes are:

    1) Basic BTF validation in libbpf, from Andrii Nakryiko.
    2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi.
    3) next_thread cleanups, from Oleg Nesterov.
    4) Add mcpu=v4 support to arm32, from Puranjay Mohan.
    5) Add support for __percpu pointers in bpf progs, from Yonghong Song.
    6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang.
    7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probability, from Hou Tao.

    Please consider pulling these changes from:

      git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

    Thanks a lot!

    Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * mm: kasan: Declare kasan_unpoison_task_stack_below in kasan.h (Kumar Kartikeya Dwivedi, 2023-09-16, 1 file, -0/+2)

    We require access to this kasan helper in BPF code in the next patch, where we have to unpoison the task stack when we unwind and reset the stack frame from bpf_throw; the unwind path never really unpoisons the stack slots that were poisoned on entry when compiler instrumentation is generated by CONFIG_KASAN_STACK and inline instrumentation is supported. Also, remove the declaration from mm/kasan/kasan.h, as we now put it in the header file kasan.h.

    Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Konovalov <andreyknvl@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-10-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add support for custom exception callbacks (Kumar Kartikeya Dwivedi, 2023-09-16, 2 files, -1/+4)

    By default, the subprog generated by the verifier to handle a thrown exception hardcodes a return value of 0. To allow user-defined logic and modification of the return value when an exception is thrown, introduce the 'exception_callback:' declaration tag, which marks a callback as the default exception handler for the program.

    The format of the declaration tag is 'exception_callback:<value>', where <value> is the name of the exception callback. Each main program can be tagged using this BTF declaration tag to associate it with an exception callback. In case the tag is absent, the default callback is used. As such, the exception callback cannot be modified at runtime, only set during verification.

    Allowing modification of the callback for the current program execution at runtime leads to issues when the programs begin to nest, as any per-CPU state maintaining this information will have to be saved and restored. We don't want it to stay in bpf_prog_aux as this takes a global effect for all programs. An alternative solution is spilling the callback pointer at a known location on the program stack on entry, and then passing this location to bpf_throw as a parameter. However, since exceptions are geared more towards a use case where they are ideally never invoked, optimizing for this use case and adding to the complexity has diminishing returns.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-7-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
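    A hedged sketch of how a program author would use the tag from BPF C, following the 'exception_callback:<value>' format described above (callback and program names are illustrative):

        /* runs instead of the default handler when bpf_throw() fires;
         * its return value becomes the program's return value
         */
        static __noinline int my_exception_cb(u64 cookie)
        {
            return (int)cookie;
        }

        SEC("tc")
        __attribute__((btf_decl_tag("exception_callback:my_exception_cb")))
        int prog(struct __sk_buff *ctx)
        {
            /* ... program body that may bpf_throw() ... */
            return 0;
        }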
| * bpf: Implement BPF exceptions (Kumar Kartikeya Dwivedi, 2023-09-16, 3 files, -0/+13)

    This patch implements BPF exceptions, and introduces a bpf_throw kfunc to allow programs to throw exceptions during their execution at runtime. A bpf_throw invocation is treated as an immediate termination of the program, returning back to its caller within the kernel, unwinding all stack frames.

    This allows the program to simplify its implementation, by testing for runtime conditions which the verifier has no visibility into, and asserting that they are true. In case they are not, the program can simply throw an exception from the other branch.

    BPF exceptions are explicitly *NOT* an unlikely slowpath error handling primitive, and this objective has guided the design choices of their implementation within the kernel (with the bulk of the cost for unwinding the stack offloaded to the bpf_throw kfunc).

    The implementation of this mechanism requires use of the add_hidden_subprog mechanism introduced in the previous patch, which generates a couple of instructions to move R1 to R0 and exit. The JIT then rewrites the prologue of this subprog to take the stack pointer and frame pointer as inputs and reset the stack frame, popping all callee-saved registers saved by the main subprog. The bpf_throw function then walks the stack at runtime, and invokes this exception subprog with the stack and frame pointers as parameters.

    Reviewers must take note that currently the main program is made to save all callee-saved registers on x86_64 during entry into the program. This is because we must do an equivalent of a lightweight context switch when unwinding the stack, therefore we need the callee-saved registers of the caller of the BPF program to be able to return with a sane state. Note that we have to additionally handle r12, even though it is not used by the program, because when throwing the exception the program makes an entry into the kernel which could clobber r12 after saving it on the stack. To be able to preserve the value we received on program entry, we push r12 and restore it from the generated subprogram when unwinding the stack.

    For now, bpf_throw invocation fails when lingering resources or locks exist in that path of the program. In a future followup, bpf_throw will be extended to perform frame-by-frame unwinding to release lingering resources for each stack frame, removing this limitation.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-5-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
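    From the program author's side, the semantics above boil down to "assert or terminate"; a hedged sketch of calling the new kfunc (the length check is an invented example condition):

        extern void bpf_throw(u64 cookie) __ksym;

        SEC("tc")
        int prog(struct __sk_buff *ctx)
        {
            /* a runtime condition the verifier cannot see through */
            if (ctx->len > 0xffff)
                bpf_throw(1); /* unwinds all frames, program terminates */
            return 0;
        }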
| * bpf: Implement support for adding hidden subprogs (Kumar Kartikeya Dwivedi, 2023-09-16, 2 files, -1/+3)

    Introduce support in the verifier for generating a subprogram and including it as part of a BPF program dynamically after the do_check phase is complete. The first user will be the next patch, which generates default exception callbacks if none are set for the program. The phase of invocation will be do_misc_fixups. Note that this is an internal verifier function, and should be used with instruction blocks which uphold the invariants stated in check_subprogs.

    Since these subprogs are always appended to the end of the instruction sequence of the program, it becomes relatively inexpensive to do the related adjustments to the subprog_info of the program. Only the fake exit subprogram is shifted forward, making room for our new subprog. This is useful to insert a new subprogram, get it JITed, and obtain its function pointer. The next patch will use this functionality to insert a default exception callback which will be invoked after unwinding the stack.

    Note that these added subprograms are invisible to userspace, and never reported in BPF_OBJ_GET_INFO_BY_ID etc. For now, only a single subprogram is supported, but more can be easily supported in the future.

    To this end, two function counts are introduced now: the existing func_cnt, and real_func_cnt, the latter including hidden programs. This allows us to convert the JIT code to use real_func_cnt for management of resources while the syscall path continues working with the existing func_cnt.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-4-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * arch/x86: Implement arch_bpf_stack_walk (Kumar Kartikeya Dwivedi, 2023-09-16, 1 file, -0/+2)

    The plumbing for offline unwinding when we throw an exception in programs would require walking the stack, hence introduce a new arch_bpf_stack_walk function. This is provided when the JIT supports exceptions, i.e. bpf_jit_supports_exceptions is true. The arch-specific code is really minimal, hence it should be straightforward to extend this support to other architectures as well, as it reuses the logic of arch_stack_walk, but allows access to the unwind_state data.

    Once the stack pointer and frame pointer are known for the main subprog during the unwinding, we know the stack layout and location of any callee-saved registers which must be restored before we return back to the kernel. This handling will be added in the subsequent patches.

    Note that while we primarily unwind through BPF frames, which are effectively CONFIG_UNWINDER_FRAME_POINTER, we still need one of this or CONFIG_UNWINDER_ORC to be able to unwind through the bpf_throw frame from which we begin walking the stack. We also require both sp and bp (stack and frame pointers) from the unwind_state structure, which are only available when one of these two options is enabled.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-3-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
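    Given the description, the hook plausibly takes a per-frame consumer callback fed with each frame's ip/sp/bp; a hedged sketch of that shape (parameter names are illustrative, the exact prototype is the patch's):

        /* walk the current stack, handing each frame's instruction
         * pointer, stack pointer and frame pointer to consume_fn
         * until it returns false
         */
        void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip,
                                                    u64 sp, u64 bp),
                                 void *cookie);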
| * bpf: Use bpf_is_subprog to check for subprogs (Kumar Kartikeya Dwivedi, 2023-09-16, 1 file, -0/+5)

    We would like to know whether a bpf_prog corresponds to the main prog or one of the subprogs. The current JIT implementations simply check this using the func_idx in bpf_prog->aux->func_idx: when the index is 0, it belongs to the main program, otherwise it corresponds to some subprogram.

    This will also be necessary to halt exception propagation while walking the stack when an exception is thrown, so we add a simple helper function to check this, named bpf_is_subprog, and convert existing JIT implementations to also make use of it.

    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Link: https://lore.kernel.org/r/20230912233214.1518551-2-memxor@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
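    Per the description, the helper reduces to a func_idx check; roughly:

        /* true for subprograms, false for the main program */
        static inline bool bpf_is_subprog(const struct bpf_prog *prog)
        {
            return prog->aux->func_idx != 0;
        }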
| * bpf/tests: add tests for cpuv4 instructions (Puranjay Mohan, 2023-09-16, 1 file, -4/+46)

    The BPF JITs now support cpuv4 instructions. Add tests for these new instructions to the test suite:

    1. Sign extended Load
    2. Sign extended Mov
    3. Unconditional byte swap
    4. Unconditional jump with 32-bit offset
    5. Signed division and modulo

    Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
    Link: https://lore.kernel.org/r/20230907230550.1417590-9-puranjay12@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: expose information about supported xdp metadata kfunc (Stanislav Fomichev, 2023-09-15, 2 files, -1/+20)

    Add a new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers); the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink.

    Example output on veth:

        $ ip link add veth0 type veth peer name veth1 # ifindex == 12
        $ ./tools/net/ynl/samples/netdev 12
        Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12
        veth1[12]   xdp-features (23): basic redirect rx-sg
                    xdp-rx-metadata-features (3): timestamp hash
                    xdp-zc-max-segs=0

    Cc: netdev@vger.kernel.org
    Cc: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * bpf: make it easier to add new metadata kfunc (Stanislav Fomichev, 2023-09-15, 1 file, -4/+12)

    No functional changes. Instead of having hand-crafted code in bpf_dev_bound_resolve_kfunc, move the kfunc <> xmo handler relationship into XDP_METADATA_KFUNC_xxx. This way, any time a new kfunc is added, we don't have to touch bpf_dev_bound_resolve_kfunc.

    Also document the XDP_METADATA_KFUNC_xxx arguments, since we now have more than two and it might be confusing what is what.

    Cc: netdev@vger.kernel.org
    Cc: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Stanislav Fomichev <sdf@google.com>
    Link: https://lore.kernel.org/r/20230913171350.369987-2-sdf@google.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
| * xsk: add multi-buffer support for sockets sharing umem (Tirthendu Sarkar, 2023-09-15, 1 file, -0/+2)

    Userspace applications indicate their multi-buffer capability to xsk using the XSK_USE_SG socket bind flag. For sockets using shared umem, the bind flag may contain XSK_USE_SG only for the first socket; for any subsequent socket the only option supported is XDP_SHARED_UMEM.

    Add the option XDP_UMEM_SG_FLAG in the umem config flags to store the multi-buffer handling capability when indicated by the XSK_USE_SG option in the bind flag of the first socket. Use this to derive the multi-buffer capability for subsequent sockets in the xsk core.

    Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
    Fixes: 81470b5c3c66 ("xsk: introduce XSK_USE_SG bind flag for xsk socket")
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/r/20230907035032.2627879-1-tirthendu.sarkar@intel.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf, x64: Fix tailcall infinite loop (Leon Hwang, 2023-09-12, 1 file, -0/+5)

    From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), the tailcall on x64 works better than before. From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcall is able to run in BPF subprograms on x64. From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program to other BPF programs"), a BPF program is able to trace other BPF programs.

    How about combining them all together?

    1. FENTRY/FEXIT on a BPF subprogram.
    2. A tailcall runs in the BPF subprogram.
    3. The tailcall calls the subprogram's caller.

    As a result, a tailcall infinite loop comes up, and the loop would halt the machine. As we know, in the tail call context, tail_call_cnt propagates via the stack and the rax register between BPF subprograms; the same must happen in trampolines.

    Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
    Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
    Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
    Link: https://lore.kernel.org/r/20230912150442.2009-3-hffilwlqm@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated (Yonghong Song, 2023-09-08, 1 file, -1/+8)

    Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr' can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality and more, so mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.

    Also make changes in selftests/bpf/test_bpftool_synctypes.py and the selftest libbpf_str to fix otherwise-failing tests.

    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj (Yonghong Song, 2023-09-08, 1 file, -0/+1)

    The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed for allocated percpu objects. For an allocated percpu obj, the reg type is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'. The return type for these two re-purposed helpers is 'PTR_TO_MEM | MEM_RCU | MEM_ALLOC'. The MEM_ALLOC allows that the per-cpu data can be read and written.

    Since the memory allocator bpf_mem_alloc() returns a ptr to a percpu ptr for percpu data, the first argument of bpf_this_cpu_ptr() and bpf_per_cpu_ptr() is patched with a dereference before being passed to the helper func.

    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230827152749.1997202-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
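    Tying this entry to the BPF_KPTR_PERCPU one below, usage from a program might look roughly like this hedged sketch (the bpf_percpu_obj_new() allocator spelling is an assumption based on the series, not a verified API; __percpu_kptr is the tag defined in the next entry):

        struct val_t { u64 cnt; };

        struct val_t __percpu_kptr *pv;

        pv = bpf_percpu_obj_new(struct val_t); /* assumed allocator name */
        if (pv) {
            struct val_t *v = bpf_this_cpu_ptr(pv);
            v->cnt++; /* MEM_ALLOC: readable and writable */
        }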
| * bpf: Add BPF_KPTR_PERCPU as a field type (Yonghong Song, 2023-09-08, 1 file, -6/+12)

    BPF_KPTR_PERCPU represents a percpu field type like below:

        struct val_t {
            ... fields ...
        };
        struct t {
            ...
            struct val_t __percpu_kptr *percpu_data_ptr;
            ...
        };

    where

        #define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))

    While BPF_KPTR_REF points to a trusted kernel object or a trusted local object, BPF_KPTR_PERCPU points to a trusted local percpu object. This patch added basic support for BPF_KPTR_PERCPU related to percpu_kptr field parsing, recording and free operations. BPF_KPTR_PERCPU also supports the same map types as BPF_KPTR_REF does.

    Note that unlike a local kptr, it is possible that a BPF_KPTR_PERCPU struct may not contain any special fields like other kptrs, bpf_spin_lock, bpf_list_head, etc.

    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230827152739.1996391-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * bpf: Add support for non-fix-size percpu mem allocation (Yonghong Song, 2023-09-08, 1 file, -2/+2)

    This is needed for later percpu mem allocation when the allocation is done by a bpf program. For such cases, a global bpf_global_percpu_ma is added where a flexible allocation size is needed.

    Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
    Link: https://lore.kernel.org/r/20230827152734.1995725-1-yonghong.song@linux.dev
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | devlink: introduce possibility to expose info about nested devlinks (Jiri Pirko, 2023-09-17, 1 file, -0/+2)

    In mlx5, there is a devlink instance created for the PCI device. Also, one separate devlink instance is created for the auxiliary device that represents the netdev of the uplink port. This relation is currently invisible to the devlink user.

    Benefit from the rel infrastructure and allow a nested devlink instance to set the relationship for the nested-in devlink instance. Note that there may be many nested instances, therefore use an xarray to hold the list of rel_indexes for the individual nested instances.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | devlink: convert linecard nested devlink to new rel infrastructure (Jiri Pirko, 2023-09-17, 1 file, -2/+2)

    Benefit from the newly introduced rel infrastructure and treat the linecard nested devlink instances in the same way as port function instances. Convert the code to use the rel infrastructure.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5: SF, Implement peer devlink set for SF representor devlink port (Jiri Pirko, 2023-09-17, 1 file, -0/+1)

    Benefit from the existence of the internal mlx5 notifier and extend it by the event MLX5_DRIVER_EVENT_SF_PEER_DEVLINK. Use this event from the SF auxiliary device probe/remove functions to pass the registered SF devlink instance to the SF representor.

    Process the new event in the SF representor code and call devl_port_fn_devlink_set() to do the assignments. Implement this in a work to avoid a possible deadlock when the probe/remove function of the SF may be called with the devlink instance lock held during devlink reload.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | devlink: expose peer SF devlink instance (Jiri Pirko, 2023-09-17, 2 files, -0/+4)

    Introduce a new helper devl_port_fn_devlink_set() to be used by a driver assigning a devlink instance to the peer devlink port function. Expose this to the user over a new netlink attribute nested under the port function nest, exposing the devlink handle related to the port function. This is particularly helpful for the user to understand the relationship between devlink instances created for SFs and the port functions they belong to.

    Note that the caller of devlink_port_notify() needs to hold the devlink instance lock; put an assertion into devl_port_fn_devlink_set() to make this requirement explicit. Also note the limitation that this assignment can only be made for registered objects.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx5: Implement SyncE support using DPLL infrastructure (Jiri Pirko, 2023-09-17, 2 files, -1/+60)

    Implement SyncE support using the newly introduced DPLL support. Make sure that each PF/VF/SF probed with the appropriate capability will spawn a dpll auxiliary device and register appropriate dpll device and pin instances.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | netdev: expose DPLL pin handle for netdevice (Jiri Pirko, 2023-09-17, 3 files, -1/+37)

    In case a netdevice represents a SyncE port, the user needs to understand the connection between the netdevice and the associated DPLL pin. There might be multiple netdevices pointing to the same pin, in case of a VF/SF implementation.

    Add an IFLA netlink attribute to nest the DPLL pin handle, similar to how it is implemented for devlink port. Add a struct dpll_pin pointer to netdev and protect access to it by RTNL. Expose the netdev_dpll_pin_set() and netdev_dpll_pin_clear() helpers to the drivers so they can set/clear the DPLL pin relationship to the netdev.

    Note that during the lifetime of struct dpll_pin the pin handle does not change, therefore it is safe to access it locklessly. It is the driver's responsibility to call netdev_dpll_pin_clear() before dpll_pin_put().

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>
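    The driver-side contract spelled out above can be sketched as follows (ordering is the point; the pin-acquisition step is elided):

        /* driver probe/init: associate the SyncE pin with the netdev */
        netdev_dpll_pin_set(netdev, pin);

        /* driver teardown: clear the association before dropping the
         * pin reference, per the rule stated in the commit message
         */
        netdev_dpll_pin_clear(netdev);
        dpll_pin_put(pin);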
* | dpll: netlink: Add DPLL framework base functions (Vadim Fedorenko, 2023-09-17, 1 file, -0/+4)

    The DPLL framework is used to represent and configure DPLL devices in systems. Each device that has a DPLL and can configure inputs and outputs can use this framework. Implement the dpll netlink framework functions for enablement of the dpll subsystem netlink family.

    Co-developed-by: Milena Olech <milena.olech@intel.com>
    Signed-off-by: Milena Olech <milena.olech@intel.com>
    Co-developed-by: Michal Michalik <michal.michalik@intel.com>
    Signed-off-by: Michal Michalik <michal.michalik@intel.com>
    Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Co-developed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | dpll: core: Add DPLL framework base functions (Vadim Fedorenko, 2023-09-17, 1 file, -0/+133)

    The DPLL framework is used to represent and configure DPLL devices in systems. Each device that has a DPLL and can configure inputs and outputs can use this framework. Implement the core framework functions for further interactions with device drivers implementing the dpll subsystem, as well as for interactions of the DPLL netlink framework part with the subsystem itself.

    Co-developed-by: Milena Olech <milena.olech@intel.com>
    Signed-off-by: Milena Olech <milena.olech@intel.com>
    Co-developed-by: Michal Michalik <michal.michalik@intel.com>
    Signed-off-by: Michal Michalik <michal.michalik@intel.com>
    Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Co-developed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | dpll: spec: Add Netlink spec in YAML (Vadim Fedorenko, 2023-09-17, 1 file, -0/+201)

    Add a protocol spec for DPLL. Add code generated from the spec.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Michal Michalik <michal.michalik@intel.com>
    Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue (David S. Miller, 2023-09-17, 1 file, -2/+9)

    Tony Nguyen says:

    ====================
    Support rx-fcs on/off for VFs

    Ahmed Zaki says:

    Allow the user to turn on/off the CRC/FCS stripping through ethtool. We first add the CRC offload capability in the virtchannel, then the feature is enabled in the ice and iavf drivers. We make sure that the netdev features are fixed such that CRC stripping cannot be disabled if VLAN rx offload (VLAN strip) is enabled. Also, VLAN stripping cannot be enabled unless CRC stripping is ON.

    Testing was done using tcpdump to make sure that the CRC is included in the frame after:

        # ethtool -K <interface> rx-fcs on

    and is not included when it is back "off". Also, ethtool should return an error for the above command if "rx-vlan-offload" is already on and at least one VLAN interface/filter exists on the VF.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | virtchnl: Add CRC stripping capability (Paul M Stillwell Jr, 2023-09-13, 1 file, -2/+9)

    Some VFs may want to disable CRC stripping on incoming packets, so create an offload for that. The VF already sends information about configuring its RX queues, so use that structure to indicate whether CRC stripping should be enabled or not.

    Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
    Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | | tcp: new TCP_INFO stats for RTO events (Aananth V, 2023-09-16, 2 files, -0/+20)

    The 2023 SIGCOMM paper "Improving Network Availability with Protective ReRoute" has indicated Linux TCP's RTO-triggered txhash rehashing can effectively reduce application disruption during outages. To better measure the efficacy of this feature, this patch adds three more detailed stats during RTO recovery and exports them via TCP_INFO. Applications and monitoring systems can leverage this data to measure the network path diversity and end-to-end repair latency during network outages, to improve their network infrastructure.

    The following counters are added to tcp_sock in order to track RTO events over the lifetime of a TCP socket:

    1. u16 total_rto - counts the total number of RTO timeouts.
    2. u16 total_rto_recoveries - counts the total number of RTO recoveries.
    3. u32 total_rto_time - counts the total time spent (ms) in RTO recoveries (time spent in the CA_Loss and CA_Recovery states).

    To compute total_rto_time, we add a new u32 rto_stamp field to tcp_sock. rto_stamp records the start timestamp (ms) of the last RTO recovery (CA_Loss). Corresponding fields are also added to the tcp_info struct.

    Signed-off-by: Aananth V <aananthv@google.com>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
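    Reading these from an application goes through the usual TCP_INFO getsockopt(); a hedged sketch (the tcpi_* field names mirror the tcp_sock counters above and are assumed, not quoted from the patch; fd is an open TCP socket):

        #include <stdio.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <netinet/tcp.h>

        struct tcp_info ti;
        socklen_t len = sizeof(ti);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0) {
            /* assumed counterparts of total_rto, total_rto_recoveries
             * and total_rto_time in struct tcp_info
             */
            printf("rto=%u recoveries=%u rto_ms=%u\n",
                   ti.tcpi_total_rto, ti.tcpi_total_rto_recoveries,
                   ti.tcpi_total_rto_time);
        }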
* | | ipv6: lockless IPV6_FLOWINFO_SEND implementation (Eric Dumazet, 2023-09-15, 2 files, -2/+2)

    np->sndflow reads are racy. Use one bit from the atomic inet->inet_flags instead; the IPV6_FLOWINFO_SEND setsockopt() can be lockless.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_MTU_DISCOVER implementation (Eric Dumazet, 2023-09-15, 2 files, -8/+11)

    Most np->pmtudisc reads are racy. Move this 3-bit field onto a full byte, add annotations, and make the IPV6_MTU_DISCOVER setsockopt() lockless.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_ROUTER_ALERT_ISOLATE implementation (Eric Dumazet, 2023-09-15, 2 files, -2/+2)

    Reads from np->rtalert_isolate are racy. Move this flag to inet->inet_flags to fix the data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: move np->repflow to atomic flags (Eric Dumazet, 2023-09-15, 2 files, -1/+1)

    Move np->repflow to inet->inet_flags to fix data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_RECVERR implementation (Eric Dumazet, 2023-09-15, 3 files, -5/+3)

    np->recverr is moved to inet->inet_flags to fix data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_DONTFRAG implementation (Eric Dumazet, 2023-09-15, 4 files, -5/+5)

    Move the np->dontfrag flag to inet->inet_flags to fix data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_AUTOFLOWLABEL implementation (Eric Dumazet, 2023-09-15, 3 files, -3/+3)

    Move np->autoflowlabel and np->autoflowlabel_set into inet->inet_flags to fix data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_MULTICAST_ALL implementation (Eric Dumazet, 2023-09-15, 2 files, -1/+1)

    Move np->mc_all to an atomic flag to fix data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_RECVERR_RFC4884 implementation (Eric Dumazet, 2023-09-15, 2 files, -1/+1)

    Move np->recverr_rfc4884 to an atomic flag to fix data-races.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_MULTICAST_HOPS implementation (Eric Dumazet, 2023-09-15, 2 files, -9/+2)

    This fixes data-races around np->mcast_hops and makes IPV6_MULTICAST_HOPS lockless. Note that np->mcast_hops is never negative, thus it can fit in a u8 field instead of an s16.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: lockless IPV6_MULTICAST_LOOP implementation (Eric Dumazet, 2023-09-15, 3 files, -5/+16)

    Add inet6_{test|set|clear|assign}_bit() helpers. Note that I am using bits from inet->inet_flags; this might change in the future if we need more flags.

    While solving data-races accessing np->mc_loop, this patch also allows to implement lockless accesses to np->mcast_hops in the following patch.

    Also constify the sk_mc_loop() argument.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
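    Given the naming pattern above, the helpers plausibly wrap the atomic bit ops on inet->inet_flags; a hedged sketch of one of them (the exact definitions may differ in the merged patch):

        static inline void inet6_set_bit(int nr, struct sock *sk)
        {
            /* inet_flags is the same word the inet_*_bit() helpers use */
            set_bit(nr, &inet_sk(sk)->inet_flags);
        }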
* | | ipv6: lockless IPV6_UNICAST_HOPS implementation (Eric Dumazet, 2023-09-15, 2 files, -12/+2)

    Some np->hop_limit accesses are racy when the socket lock is not held. Add the missing annotations and switch to a fully lockless implementation.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Paolo Abeni, 2023-09-14, 14 files, -30/+80)

    Cross-merge networking fixes after downstream PR.

    No conflicts.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * Merge tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (Linus Torvalds, 2023-09-14, 1 file, -1/+6)

    Pull networking fixes from Paolo Abeni:
     "Quite unusually, this does not contain any fix coming from subtrees (nf, ebpf, wifi, etc).

      Current release - regressions:

       - bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()

      Previous releases - regressions:

       - ipv4: fix one memleak in __inet_del_ifa()
       - tcp: fix bind() regressions for v4-mapped-v6 addresses
       - tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
       - dsa: fixes for SJA1105 FDB regressions
       - veth: update XDP feature set when bringing up device
       - igb: fix hangup when enabling SR-IOV

      Previous releases - always broken:

       - kcm: fix memory leak in error path of kcm_sendmsg()
       - smc: fix data corruption in smcr_port_add
       - microchip: fix possible memory leak for vcap_dup_rule()"

    * tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
      kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
      net: renesas: rswitch: Add spin lock protection for irq {un}mask
      net: renesas: rswitch: Fix unmasking irq condition
      igb: clean up in all error paths when enabling SR-IOV
      ixgbe: fix timestamp configuration code
      selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c.
      selftest: tcp: Move expected_errno into each test case in bind_wildcard.c.
      selftest: tcp: Fix address length in bind_wildcard.c.
      tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
      tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
      tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
      ipv6: fix ip6_sock_set_addr_preferences() typo
      veth: Update XDP feature set when bringing up device
      net: macb: fix sleep inside spinlock
      net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
      net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988
      net: ethernet: mtk_eth_soc: fix uninitialized variable
      kcm: Fix memory leak in error path of kcm_sendmsg()
      r8152: check budget for r8152_poll()
      net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset
      ...