| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Remove redundant semi-colon in finalize_btf_ext().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210402012634.1965453-1-yangyingliang@huawei.com
|
|
|
|
|
|
|
|
|
|
| |
struct btf_type is declared twice. One is declared at 35th line. The below
one is not needed, hence remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210401072037.995849-1-wanjiabing@vivo.com
|
|
|
|
|
|
|
|
|
|
|
| |
struct bpf_prog is declared twice. There is one declaration which is
independent on the macro at 18th line. So the below one is not needed
though. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210401064637.993327-1-wanjiabing@vivo.com
|
|
|
|
|
|
|
|
|
|
|
| |
'stack' parameter is not used in ___bpf_prog_run() after f696b8f471ec
("bpf: split bpf core interpreter"), the base address have been set to
FP reg. So consequently remove it.
Signed-off-by: He Fengqing <hefengqing@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210331075135.3850782-1-hefengqing@huawei.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a relatively recent clang master branch test_map skips a section,
libbpf: elf: skipping unrecognized data section(5) .rodata.str1.1
the cause is some pointless strings from bpf_printks in the BPF program
loaded during testing. After just removing the prints to fix above error
Daniel points out the program is a bit pointless and could be simply the
empty program returning SK_PASS.
Here we do just that and return simply SK_PASS. This program is used with
test_maps selftests to test insert/remove of a program into the sockmap
and sockhash maps. Its not testing actual functionality of the TCP
sockmap programs, these are tested from test_sockmap. So we shouldn't
lose in test coverage and fix above warnings. This original test was
added before test_sockmap existed and has been copied around ever since,
clean it up now.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/161731595664.74613.1603087410166945302.stgit@john-XPS-13-9370
|
|
|
|
|
|
|
|
|
| |
Group all the often used fields in the first cache line,
to reduce cache line misses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Order fields to increase locality for most used protocols.
udplite and icmp are moved at the end.
Same for proc_net_devsnmp6 which is not used in fast path.
This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
If the "type_a->nfcid_len" is too large then it would lead to memory
corruption in pn533_target_found_type_a() when we do:
memcpy(nfc_tgt->nfcid1, tgt_type_a->nfcid_data, nfc_tgt->nfcid1_len);
Fixes: c3b1e1e8a76f ("NFC: Export NFCID1 from pn533")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Ioana Ciornei says:
====================
dpaa2-eth: add rx copybreak support
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring based controller but it rather deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak in
dpaa2-eth.
Below you can find a summary of the tests that were run to end up
with the default rx copybreak value of 512.
A bit about the setup - a LS2088A SoC, 8 x Cortex A72 @ 1.8GHz, IPfwd
zero loss test @ 20Gbit/s throughput. I tested multiple frame sizes to
get an idea where is the break even point.
Here are 2 sets of results, (1) is the baseline and (2) is just
allocating a new skb for all frames sizes received (as if the copybreak
was even to the MTU). All numbers are in Mpps.
64 128 256 512 640 768 896
(1) 3.23 3.23 3.24 3.21 3.1 2.76 2.71
(2) 3.95 3.88 3.79 3.62 3.3 3.02 2.65
It seems that even for 512 bytes frame sizes it's comfortably better when
allocating a new skb. After that, we see diminishing rewards or even worse.
Changes in v2:
- properly marked dpaa2_eth_copybreak as static
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It's useful, especially for debugging purposes, to have the Rx copybreak
value changeable at runtime. Export it as an ethtool tunable.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring based controller but it rather deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak. For small
sized packets instead of creating a skb around the buffer in which the
frame was received, allocate a new sk buffer altogether, copy the
contents of the frame and release the initial page back into the buffer
pool.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Rename the dpaa2_eth_xdp_release_buf function into dpaa2_eth_recycle_buf
since in the next patches we'll be using the same recycle mechanism for
the normal stack path beside for XDP_DROP.
Also, rename the array which holds the buffers to be recycled so that it
does not have any reference to XDP.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Mat Martineau says:
====================
MPTCP: Miscellaneous changes
Here is a collection of patches from the MPTCP tree:
Patches 1 and 2 add some helpful MIB counters for connection
information.
Patch 3 cleans up some unnecessary checks.
Patch 4 is a new feature, support for the MP_TCPRST option. This option
is used when resetting one subflow within a MPTCP connection, and
provides a reason code that the recipient can use when deciding how to
adapt to the lost subflow.
Patches 5-7 update the existing MPTCP selftests to improve timeout
handling and to share better information when tests fail.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Very occasionally, MPTCP selftests fail. Yeah, I saw that at least once!
Here we provide more details in case of errors with mptcp_join.sh script
like it was done with mptcp_connect.sh, see
commit 767389c8dd55 ("selftests: mptcp: dump more info on errors")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Not to be impacted by packets sent between sub-tests.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
'mptcp_connect' already has a timeout for poll() but in some cases, it
is not enough.
With "timeout" tool, we will force the command to fail if it doesn't
finish on time. Thanks to that, the script will continue and display
details about the current state before marking the test as failed.
Displaying this state is very important to be able to understand the
issue. Best to have our CI reporting the issue than just "the test
hanged".
Note that in mptcp_connect.sh, we were using a long timeout to validate
the fact we cannot create a socket if a sysctl is set. We don't need
this timeout.
In diag.sh, we want to send signals to mptcp_connect instances that have
been started in the netns. But we cannot send this signal to 'timeout'
otherwise that will stop the timeout and messages telling us SIGUSR1 has
been received will be printed. Instead of trying to find the right PID
and storing them in an array, we can simply use the output of
'ip netns pids' which is all the PIDs we want to send signal to.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/160
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The MPTCP reset option allows to carry a mptcp-specific error code that
provides more information on the nature of a connection reset.
Reset option data received gets stored in the subflow context so it can
be sent to userspace via the 'subflow closed' netlink event.
When a subflow is closed, the desired error code that should be sent to
the peer is also placed in the subflow context structure.
If a reset is sent before subflow establishment could complete, e.g. on
HMAC failure during an MP_JOIN operation, the mptcp skb extension is
used to store the reset information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently we explicitly check for the first subflow being
NULL in a couple of places, even if we don't need any
special actions in such scenario.
Just drop the unneeded checks, to avoid confusion.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We are not currently tracking the active MPTCP connection
attempts. Let's add the related counters.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
| |
If the MPTCP protocol is unable to create a new token,
the socket fallback to plain TCP, let's keep track
of such events via a specific MIB.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Shannon Nelson says:
====================
ionic: add PTP and hw clock support
This patchset adds support for accessing the DSC hardware clock and
for offloading PTP timestamping.
Tx packet timestamping happens through a separate Tx queue set up with
expanded completion descriptors that can report the timestamp.
Rx timestamping can happen either on all queues, or on a separate
timestamping queue when specific filtering is requested. Again, the
timestamps are reported with the expanded completion descriptors.
The timestamping offload ability is advertised but not enabled until an
OS service asks for it. At that time the driver's queues are reconfigured
to use the different completion descriptors and the private processing
queues as needed.
Reading the raw clock value comes through a new pair of values in the
device info registers in BAR0. These high and low values are interpreted
with help from new clock mask, mult, and shift values in the device
identity information.
First we add the ability to detect new queue features, then the handling
of the new descriptor sizes. After adding the new interface structures,
we start adding the support code, saving the advertising to the stack
for last.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Let the network stack know we've got support for timestamping
the packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Add the new hwstamp stats to our ethtool stats output.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add the get_ts_info() callback for ethtool support of
timestamping information.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The Tx and Rx timestamped packets are handled through separate
queues. Here we set them up, service them, and tear them down
along with the normal Tx and Rx queues.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We do hardware timestamping through a separate Tx queue,
and optionally through a separate Rx queue. These queues
are allocated, freed, and tracked separately from the basic
queue arrays.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add handling of the new Rx packet classification filter type.
This simple bit of classification allows for steering packets
to a separate Rx queue for processing.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
These are changes to compile and link the new code, but no
new feature support is available or advertised yet.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This adds the file of code for supporting Tx and Rx hardware
timestamps and the raw clock interface, but does not yet link
it in for compiling or use.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Split the wait part out of adminq_post_wait() into a separate
function so that a caller can have finer grain control over
the sequencing of operations and locking.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The interface for hardware timestamping includes a new FW
request, device identity fields, Tx and Rx queue feature bits, a
new Rx filter type, the beginnings of Rx packet classifications,
and hardware timestamp registers.
If the IONIC_ETH_HW_TIMESTAMP bit is shown in the
ionic_lif_config features bit string, then we have support
for the hw clock registers. If the IONIC_RXQ_F_HWSTAMP and
IONIC_TXQ_F_HWSTAMP features are shown in the ionic_q_identity
features, then the queues can support HW timestamps on packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In preparating for hardware timestamping, we need to support
large Tx and Rx completion descriptors. Here we add the new
queue feature ids and handling for the completion descriptor
sizes.
We only are adding support for the Rx 2x sized completion
descriptors in the general Rx queues for now as we will be
using it for PTP Rx support, and we don't have an immediate
use for the large descriptors in the general Tx queues yet;
it will be used in a special Tx queues added in one of the
next few patches.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
| |
Add queue feature extensions to prepare for features that
can be queue specific, in addition to the general queue
features already defined. While we're here, change the
existing feature ids from #defines to enum.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-04-01
The following pull-request contains BPF updates for your *net-next* tree.
We've added 68 non-merge commits during the last 7 day(s) which contain
a total of 70 files changed, 2944 insertions(+), 1139 deletions(-).
The main changes are:
1) UDP support for sockmap, from Cong.
2) Verifier merge conflict resolution fix, from Daniel.
3) xsk selftests enhancements, from Maciej.
4) Unstable helpers aka kernel func calling, from Martin.
5) Batches ops for LPM map, from Pedro.
6) Fix race in bpf_get_local_storage, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Cong Wang says:
====================
From: Cong Wang <cong.wang@bytedance.com>
We have thousands of services connected to a daemon on every host
via AF_UNIX dgram sockets, after they are moved into VM, we have to
add a proxy to forward these communications from VM to host, because
rewriting thousands of them is not practical. This proxy uses an
AF_UNIX socket connected to services and a UDP socket to connect to
the host. It is inefficient because data is copied between kernel
space and user space twice, and we can not use splice() which only
supports TCP. Therefore, we want to use sockmap to do the splicing
without going to user-space at all (after the initial setup).
Currently sockmap only fully supports TCP, UDP is partially supported
as it is only allowed to add into sockmap. This patchset, as the second
part of the original large patchset, extends sockmap with:
1) cross-protocol support with BPF_SK_SKB_VERDICT; 2) full UDP support.
On the high level, ->read_sock() is required for each protocol to support
sockmap redirection, and in order to do sock proto update, a new ops
->psock_update_sk_prot() is introduced, which is also required. And the
BPF ->recvmsg() is also needed to replace the original ->recvmsg() to
retrieve skmsg. To make life easier, we have to get rid of lock_sock()
in sk_psock_handle_skb(), otherwise we would have to implement
->sendmsg_locked() on top of ->sendmsg(), which is ugly.
Please see each patch for more details.
To see the big picture, the original patchset is available here:
https://github.com/congwang/linux/tree/sockmap
this patchset is also available:
https://github.com/congwang/linux/tree/sockmap2
---
v8: get rid of 'offset' in udp_read_sock()
add checks for skb_verdict/stream_verdict conflict
add two cleanup patches for sock_map_link()
add a new test case
v7: use work_mutex to protect psock->work
return err in udp_read_sock()
add patch 6/13
clean up test case
v6: get rid of sk_psock_zap_ingress()
add rcu work patch
v5: use INDIRECT_CALL_2() for function pointers
use ingress_lock to fix a race condition found by Jacub
rename two helper functions
v4: get rid of lock_sock() in sk_psock_handle_skb()
get rid of udp_sendmsg_locked()
remove an empty line
update cover letter
v3: export tcp/udp_update_proto()
rename sk->sk_prot->psock_update_sk_prot()
improve changelogs
v2: separate from the original large patchset
rebase to the latest bpf-next
split UDP test case
move inet_csk_has_ulp() check to tcp_bpf.c
clean up udp_read_sock()
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This adds a test case to ensure BPF_SK_SKB_VERDICT and
BPF_SK_STREAM_VERDICT will never be attached at the same time.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-17-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a test case to ensure redirection between two UDP sockets work.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-16-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now UDP supports sockmap and redirection, we can safely update
the sock type checks for it accordingly.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-15-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We have to implement udp_bpf_recvmsg() to replace the ->recvmsg()
to retrieve skmsg from ingress_msg.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-14-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Although these two functions are only used by TCP, they are not
specific to TCP at all, both operate on skmsg and ingress_msg,
so fit in net/core/skmsg.c very well.
And we will need them for non-TCP, so rename and move them to
skmsg.c and export them to modules.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-13-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is similar to tcp_read_sock(), except we do not need
to worry about connections, we just need to retrieve skb
from UDP receive queue.
Note, the return value of ->read_sock() is unused in
sk_psock_verdict_data_ready(), and UDP still does not
support splice() due to lack of ->splice_read(), so users
can not reach udp_read_sock() directly.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-12-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently sockmap calls into each protocol to update the struct
proto and replace it. This certainly won't work when the protocol
is implemented as a module, for example, AF_UNIX.
Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each
protocol can implement its own way to replace the struct proto.
This also helps get rid of symbol dependencies on CONFIG_INET.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-11-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is
confusing and more importantly we still want to distinguish them
from user-space. So we can just reuse the stream verdict code but
introduce a new type of eBPF program, skb_verdict. Users are not
allowed to attach stream_verdict and skb_verdict programs to the
same map.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-10-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now we can fold sock_map_link_no_progs() into sock_map_link()
and get rid of sock_map_link_no_progs().
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210331023237.41094-9-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sock_map_link() passes down map progs, but it is confusing
to see both map progs and psock progs. Make the map progs
more obvious by retrieving it directly with sock_map_progs()
inside sock_map_link(). Now it is aligned with
sock_map_link_no_progs() too.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-8-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This function is only called in process context.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-7-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RCU callback sk_psock_destroy() only queues work psock->gc,
so we can just switch to rcu work to simplify the code.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-6-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We do not have to lock the sock to avoid losing sk_socket,
instead we can purge all the ingress queues when we close
the socket. Sending or receiving packets after orphaning
socket makes no sense.
We do purge these queues when psock refcnt reaches zero but
here we want to purge them explicitly in sock_map_close().
There are also some nasty race conditions on testing bit
SK_PSOCK_TX_ENABLED and queuing/canceling the psock work,
we can expand psock->ingress_lock a bit to protect them too.
As noticed by John, we still have to lock the psock->work,
because the same work item could be running concurrently on
different CPU's.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-5-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We only have skb_send_sock_locked() which requires callers
to use lock_sock(). Introduce a variant skb_send_sock()
which locks on its own, callers do not need to lock it
any more. This will save us from adding a ->sendmsg_locked
for each protocol.
To reuse the code, pass function pointers to __skb_send_sock()
and build skb_send_sock() and skb_send_sock_locked() on top.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-4-xiyou.wangcong@gmail.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently we rely on lock_sock to protect ingress_msg,
it is too big for this, we can actually just use a spinlock
to protect this list like protecting other skb queues.
__tcp_bpf_recvmsg() is still special because of peeking,
it still has to use lock_sock.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210331023237.41094-3-xiyou.wangcong@gmail.com
|