| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
tp->window_clamp can be read locklessly, add READ_ONCE()
and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240404114231.2195171-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syzkaller reported a read of uninit req->syncookie. [0]
Originally, req->syncookie was used only in tcp_conn_request()
to indicate if we need to encode SYN cookie in SYN+ACK, so the
field remains uninitialised in other places.
The commit 695751e31a63 ("bpf: tcp: Handle BPF SYN Cookie in
cookie_v[46]_check().") added another meaning in ACK path;
req->syncookie is set true if SYN cookie is validated by BPF
kfunc.
After the change, cookie_v[46]_check() always read req->syncookie,
but it is not initialised in the normal SYN cookie case as reported
by KMSAN.
Let's make sure we always initialise req->syncookie in reqsk_alloc().
[0]:
BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0
net/ipv4/syncookies.c:477
cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
process_backlog+0x480/0x8b0 net/core/dev.c:5981
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
do_softirq+0x9a/0x100 kernel/softirq.c:455
__local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
__dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
neigh_hh_output include/net/neighbour.h:526 [inline]
neigh_output include/net/neighbour.h:540 [inline]
ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235
__ip_finish_output+0x287/0x810
ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433
dst_output include/net/dst.h:450 [inline]
ip_local_out net/ipv4/ip_output.c:129 [inline]
__ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535
ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549
__tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462
tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline]
tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792
__tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline]
tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578
tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716
inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923
__sys_shutdown_sock net/socket.c:2425 [inline]
__sys_shutdown net/socket.c:2437 [inline]
__do_sys_shutdown net/socket.c:2445 [inline]
__se_sys_shutdown+0x2a4/0x440 net/socket.c:2443
__x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was stored to memory at:
reqsk_alloc include/net/request_sock.h:148 [inline]
inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978
cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328
cookie_tcp_check net/ipv4/syncookies.c:388 [inline]
cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420
tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline]
tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5538 [inline]
__netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652
process_backlog+0x480/0x8b0 net/core/dev.c:5981
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
Uninit was created at:
__alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page mm/slub.c:2175 [inline]
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x2de/0x1400 mm/slub.c:2391
___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
reqsk_alloc include/net/request_sock.h:131 [inline]
inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978
tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135
tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716
tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655
tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929
tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674
__netif_receive_skb_list_ptype net/core/dev.c:5581 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629
__netif_receive_skb_list net/core/dev.c:5681 [inline]
netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773
gro_normal_list include/net/gro.h:438 [inline]
napi_complete_done+0x425/0x880 net/core/dev.c:6113
virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline]
virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211
__napi_poll+0xe7/0x980 net/core/dev.c:6632
napi_poll net/core/dev.c:6701 [inline]
net_rx_action+0x89d/0x1820 net/core/dev.c:6813
__do_softirq+0x1c0/0x7d7 kernel/softirq.c:554
CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted 6.8.0-syzkaller-05562-g61387b8dcf1d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Fixes: 695751e31a63 ("bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/bpf/CANn89iKdN9c+C_2JAUbc+VY3DDQjAQukMtiBbormAmAk9CdvQA@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240315224710.55209-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Now it's time to use the prepared definitions to refine this part.
Four reasons used might enough for now, I think.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only move the skb drop from tcp_v4_do_rcv() to cookie_v4_check() itself,
no other changes made. It can help us refine the specific drop reasons
later.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support arbitrary SYN Cookie with BPF in the following
patch.
If BPF prog validates ACK and kfunc allocates a reqsk, it will
be carried to cookie_[46]_check() as skb->sk. If skb->sk is not
NULL, we call cookie_bpf_check().
Then, we clear skb->sk and skb->destructor, which are needed not
to hold refcnt for reqsk and the listener. See the following patch
for details.
After that, we finish initialisation for the remaining fields with
cookie_tcp_reqsk_init().
Note that the server side WScale is set only for non-BPF SYN Cookie.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240115205514.68364-5-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support arbitrary SYN Cookie with BPF.
When BPF prog validates ACK and kfunc allocates a reqsk, we need
to call tcp_ns_to_ts() to calculate an offset of TSval for later
use:
time
t0 : Send SYN+ACK
-> tsval = Initial TSval (Random Number)
t1 : Recv ACK of 3WHS
-> tsoff = TSecr - tcp_ns_to_ts(usec_ts_ok, tcp_clock_ns())
= Initial TSval - t1
t2 : Send ACK
-> tsval = t2 + tsoff
= Initial TSval + (t2 - t1)
= Initial TSval + Time Delta (x)
(x) Note that the time delta does not include the initial RTT
from t0 to t1.
Let's move tcp_ns_to_ts() to tcp.h.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240115205514.68364-2-kuniyu@amazon.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support arbitrary SYN Cookie with BPF, and then kfunc at
TC will preallocate reqsk and initialise some fields that should
not be overwritten later by cookie_v[46]_check().
To simplify the flow in cookie_v[46]_check(), we move such fields'
initialisation to cookie_tcp_reqsk_alloc() and factorise non-BPF
SYN Cookie handling into cookie_tcp_check(), where we validate the
cookie and allocate reqsk, as done by kfunc later.
Note that we set ireq->ecn_ok in two steps, the latter of which will
be shared by the BPF case. As cookie_ecn_ok() is one-liner, now
it's inlined.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support arbitrary SYN Cookie with BPF, and then some reqsk fields
are initialised in kfunc, and others are done in cookie_v[46]_check().
This patch factorises the common part as cookie_tcp_reqsk_init() and
calls it in cookie_tcp_reqsk_alloc() to minimise the discrepancy between
cookie_v[46]_check().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We initialise treq->af_specific in cookie_tcp_reqsk_alloc() so that
we can look up a key later in tcp_create_openreq_child().
Initially, that change was added for MD5 by commit ba5a4fdd63ae ("tcp:
make sure treq->af_specific is initialized"), but it has not been used
since commit d0f2b7a9ca0a ("tcp: Disable header prediction for MD5
flow.").
Now, treq->af_specific is used only by TCP-AO, so, we can move that
initialisation into tcp_ao_syncookie().
In addition to that, l3index in cookie_v[46]_check() is only used for
tcp_ao_syncookie(), so let's move it as well.
While at it, we move down tcp_ao_syncookie() in cookie_v4_check() so
that it will be called after security_inet_conn_request() to make
functions order consistent with cookie_v6_check().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we create a full socket from SYN Cookie, we initialise
tcp_sk(sk)->tsoffset redundantly in tcp_get_cookie_sock() as
the field is inherited from tcp_rsk(req)->ts_off.
cookie_v[46]_check
|- treq->ts_off = 0
`- tcp_get_cookie_sock
|- tcp_v[46]_syn_recv_sock
| `- tcp_create_openreq_child
| `- newtp->tsoffset = treq->ts_off
`- tcp_sk(child)->tsoffset = tsoff
Let's initialise tcp_rsk(req)->ts_off with the correct offset
and remove the second initialisation of tcp_sk(sk)->tsoffset.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tcp_hdr(skb) and SYN Cookie are passed to __cookie_v[46]_check(), but
none of the callers passes cookie other than ntohl(th->ack_seq) - 1.
Let's fetch it in __cookie_v[46]_check() instead of passing the cookie
over and over.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We will support arbitrary SYN Cookie with BPF, and then reqsk
will be preallocated before cookie_v[46]_check().
Depending on how validation fails, we send RST or just drop skb.
To make the error handling easier, let's clean up goto labels.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
sock_net(sk) is used repeatedly in cookie_v[46]_check().
Let's cache it in a variable.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
We will grow and cut the xmas tree in cookie_v[46]_check().
This patch cleans it up to make later patches tidy.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231129022924.96156-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After blamed commit, TFO client-ack-dropped-then-recovery-ms-timestamps
packetdrill test failed.
David Morley and Neal Cardwell started investigating and Neal pointed
that we had :
tcp_conn_request()
tcp_try_fastopen()
-> tcp_fastopen_create_child
-> child = inet_csk(sk)->icsk_af_ops->syn_recv_sock()
-> tcp_create_openreq_child()
-> copy req_usec_ts from req:
newtp->tcp_usec_ts = treq->req_usec_ts;
// now the new TFO server socket always does usec TS, no matter
// what the route options are...
send_synack()
-> tcp_make_synack()
// disable tcp_rsk(req)->req_usec_ts if route option is not present:
if (tcp_rsk(req)->req_usec_ts < 0)
tcp_rsk(req)->req_usec_ts = dst_tcp_usec_ts(dst);
tcp_conn_request() has the initial dst, we can initialize
tcp_rsk(req)->req_usec_ts there instead of later in send_synack();
This means tcp_rsk(req)->req_usec_ts can be a boolean.
Many thanks to David an Neal for their help.
Fixes: 614e8316aa4c ("tcp: add support for usec resolution in TCP TS values")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202310302216.f79d78bc-oliver.sang@intel.com
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Morley <morleyd@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similarly how TCP_MD5SIG_FLAG_IFINDEX works for TCP-MD5,
TCP_AO_KEYF_IFINDEX is an AO-key flag that binds that MKT to a specified
by L3 ifinndex. Similarly, without this flag the key will work in
the default VRF l3index = 0 for connections.
To prevent AO-keys from overlapping, it's restricted to add key B for a
socket that has key A, which have the same sndid/rcvid and one of
the following is true:
- !(A.keyflags & TCP_AO_KEYF_IFINDEX) or !(B.keyflags & TCP_AO_KEYF_IFINDEX)
so that any key is non-bound to a VRF
- A.l3index == B.l3index
both want to work for the same VRF
Additionally, it's restricted to match TCP-MD5 keys for the same peer
the following way:
|--------------|--------------------|----------------|---------------|
| | MD5 key without | MD5 key | MD5 key |
| | l3index | l3index=0 | l3index=N |
|--------------|--------------------|----------------|---------------|
| TCP-AO key | | | |
| without | reject | reject | reject |
| l3index | | | |
|--------------|--------------------|----------------|---------------|
| TCP-AO key | | | |
| l3index=0 | reject | reject | allow |
|--------------|--------------------|----------------|---------------|
| TCP-AO key | | | |
| l3index=N | reject | allow | reject |
|--------------|--------------------|----------------|---------------|
This is done with the help of tcp_md5_do_lookup_any_l3index() to reject
adding AO key without TCP_AO_KEYF_IFINDEX if there's TCP-MD5 in any VRF.
This is important for case where sysctl_tcp_l3mdev_accept = 1
Similarly, for TCP-AO lookups tcp_ao_do_lookup() may be used with
l3index < 0, so that __tcp_ao_key_cmp() will match TCP-AO key in any VRF.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now when the new request socket is created from the listening socket,
it's recorded what MKT was used by the peer. tcp_rsk_used_ao() is
a new helper for checking if TCP-AO option was used to create the
request socket.
tcp_ao_copy_all_matching() will copy all keys that match the peer on the
request socket, as well as preparing them for the usage (creating
traffic keys).
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Back in 2015, Van Jacobson suggested to use usec resolution in TCP TS values.
This has been implemented in our private kernels.
Goals were :
1) better observability of delays in networking stacks.
2) better disambiguation of events based on TSval/ecr values.
3) building block for congestion control modules needing usec resolution.
Back then we implemented a schem based on private SYN options
to negotiate the feature.
For upstream submission, we chose to use a route attribute,
because this feature is probably going to be used in private
networks [1] [2].
ip route add 10/8 ... features tcp_usec_ts
Note that RFC 7323 recommends a
"timestamp clock frequency in the range 1 ms to 1 sec per tick.",
but also mentions
"the maximum acceptable clock frequency is one tick every 59 ns."
[1] Unfortunately RFC 7323 5.5 (Outdated Timestamps) suggests
to invalidate TS.Recent values after a flow was idle for more
than 24 days. This is the part making usec_ts a problem
for peers following this recommendation for long living
idle flows.
[2] Attempts to standardize usec ts went nowhere:
https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
https://datatracker.ietf.org/doc/draft-wang-tcpm-low-latency-opt/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
tcp_ns_to_ts() is only used once from cookie_init_timestamp().
Also add the 'bool usec_ts' parameter to enable usec TS later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cookie_init_timestamp() is supposed to return a 64bit timestamp
suitable for both TSval determination and setting of skb->tstamp.
Unfortunately it uses 32bit fields and overflows after
2^32 * 10^6 nsec (~49 days) of uptime.
Generated TSval are still correct, but skb->tstamp might be set
far away in the past, potentially confusing other layers.
tcp_ns_to_ts() is changed to return a full 64bit value,
ts and ts_now variables are changed to u64 type,
and TSMASK is removed in favor of shifts operations.
While we are at it, change this sequence:
ts >>= TSBITS;
ts--;
ts <<= TSBITS;
ts |= options;
to:
ts -= (1UL << TSBITS);
Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RT_CONN_FLAGS(sk) overloads flowi4_tos with the RTO_ONLINK bit when
sk has the SOCK_LOCALROUTE flag set. This allows
ip_route_output_key_hash() to eventually adjust flowi4_scope.
Instead of relying on special handling of the RTO_ONLINK bit, we can
just set the route scope correctly. This will eventually allow to avoid
special interpretation of tos variables and to convert ->flowi4_tos to
dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To ease the maintenance, it is often recommended to avoid having #ifdef
preprocessor conditions.
Here the section related to CONFIG_MPTCP was quite short but the next
commit needs to add more code around. It is then cleaner to move
specific MPTCP code to functions located in net/mptcp directory.
Now that mptcp_subflow_request_sock_ops structure can be static, it can
also be marked as "read only after init".
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While reading these knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.
- tcp_sack
- tcp_window_scaling
- tcp_timestamps
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
While reading sysctl_tcp_syncookies, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
While reading sysctl_tcp_ecn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syzbot complained about a recent change in TCP stack,
hitting a NULL pointer [1]
tcp request sockets have an af_specific pointer, which
was used before the blamed change only for SYNACK generation
in non SYNCOOKIE mode.
tcp requests sockets momentarily created when third packet
coming from client in SYNCOOKIE mode were not using
treq->af_specific.
Make sure this field is populated, in the same way normal
TCP requests sockets do in tcp_conn_request().
[1]
TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters.
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534
Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48
RSP: 0018:ffffc90000de0588 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100
RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008
RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0
R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267
tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207
cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline]
tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486
tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725
ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
__netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
process_backlog+0x3a0/0x7c0 net/core/dev.c:5847
__napi_poll+0xb3/0x6e0 net/core/dev.c:6413
napi_poll net/core/dev.c:6480 [inline]
net_rx_action+0x8ec/0xc60 net/core/dev.c:6567
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
siphash keys use 16 bytes.
Define siphash_aligned_key_t macro so that we can make sure they
are not crossing a cache line boundary.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
| |
syncookies.c hasn't use any macro or function declared in slab.h and random.h,
Thus, these files can be removed from syncookies.c safely without
affecting the compilation of the net module.
Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"While we have a small number of SELinux patches for v5.11, there are a
few changes worth highlighting:
- Change the LSM network hooks to pass flowi_common structs instead
of the parent flowi struct as the LSMs do not currently need the
full flowi struct and they do not have enough information to use it
safely (missing information on the address family).
This patch was discussed both with Herbert Xu (representing team
netdev) and James Morris (representing team
LSMs-other-than-SELinux).
- Fix how we handle errors in inode_doinit_with_dentry() so that we
attempt to properly label the inode on following lookups instead of
continuing to treat it as unlabeled.
- Tweak the kernel logic around allowx, auditallowx, and dontauditx
SELinux policy statements such that the auditx/dontauditx are
effective even without the allowx statement.
Everything passes our test suite"
* tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
selinux: Fix fall-through warnings for Clang
selinux: drop super_block backpointer from superblock_security_struct
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
selinux: allow dontauditx and auditallowx rules to take effect without allowx
selinux: fix error initialization in inode_doinit_with_dentry()
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely. As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When net.ipv4.tcp_syncookies=1 and syn flood is happened,
cookie_v4_check or cookie_v6_check tries to redo what
tcp_v4_send_synack or tcp_v6_send_synack did,
rsk_window_clamp will be changed if SOCK_RCVBUF is set,
which will make rcv_wscale is different, the client
still operates with initial window scale and can overshot
granted window, the client use the initial scale but local
server use new scale to advertise window value, and session
work abnormally.
Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")
Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If a syn-cookies request socket don't pass MPTCP-level
validation done in syn_recv_sock(), we need to release
it immediately, or it will be leaked.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/89
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Reported-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
A new field is added to the request sock to record the TOS value
received on the listening socket during 3WHS:
When not under syn flood, it is recording the TOS value sent in SYN.
When under syn flood, it is recording the TOS value sent in the ACK.
This is a preparation patch in order to do TOS reflection in the later
commit.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes these errors:
net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock':
net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no
member named 'drop_req'
216 | if (tcp_rsk(req)->drop_req) {
| ^~
net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc':
net/ipv4/syncookies.c:289:27: warning: unused variable 'treq'
[-Wunused-variable]
289 | struct tcp_request_sock *treq;
| ^~~~
make[3]: *** [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1
make[3]: *** Waiting for unfinished jobs....
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JOIN requests do not work in syncookie mode -- for HMAC validation, the
peers nonce and the mptcp token (to obtain the desired connection socket
the join is for) are required, but this information is only present in the
initial syn.
So either we need to drop all JOIN requests once a listening socket enters
syncookie mode, or we need to store enough state to reconstruct the request
socket later.
This adds a state table (1024 entries) to store the data present in the
MP_JOIN syn request and the random nonce used for the cookie syn/ack.
When a MP_JOIN ACK passed cookie validation, the table is consulted
to rebuild the request socket from it.
An alternate approach would be to "cancel" syn-cookie mode and force
MP_JOIN to always use a syn queue entry.
However, doing so brings the backlog over the configured queue limit.
v2: use req->syncookie, not (removed) want_cookie arg
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If SYN packet contains MP_CAPABLE option, keep it enabled.
Syncokie validation and cookie-based socket creation is changed to
instantiate an mptcp request sockets if the ACK contains an MPTCP
connection request.
Rather than extend both cookie_v4/6_check, add a common helper to create
the (mp)tcp request socket.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can't deal with syncookie mode yet, the syncookie rx path will create
tcp reqsk, i.e. we get OOB access because we treat tcp reqsk as mptcp reqsk one:
TCP: SYN flooding on port 20002. Sending cookies.
BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120
subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209
cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline]
[..]
Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2".
Note that MPTCP should work with syncookies (4th ack would carry needed
state), but it appears better to sort that out in -next so do tcp
fallback for now.
I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because
if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair.
Cc: Eric Dumazet <edumazet@google.com>
Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
tcp_make_synack() already uses tcp_clock_ns(), and can pass
the value to cookie_init_timestamp() to avoid another call
to ktime_get_ns() helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the request socket is created locally, it'd make more sense to
use reqsk_free() instead of reqsk_put() in TFO and syncookies' error
path.
However, tcp_get_cookie_sock() may set ->rsk_refcnt before freeing the
socket; tcp_conn_request() may also have non-null ->rsk_refcnt because
of tcp_try_fastopen(). In both cases 'req' hasn't been exposed
to the outside world and is safe to free immediately, but that'd
trigger the WARN_ON_ONCE in reqsk_free().
Define __reqsk_free() for these situations where we know nobody's
referencing the socket, even though ->rsk_refcnt might be non-null.
Now we can consolidate the error path of tcp_get_cookie_sock() and
tcp_conn_request().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 7716682cc58e ("tcp/dccp: fix another race at listener
dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
{tcp,dccp}_check_req() accordingly. However, TFO and syncookies
weren't modified, thus leaking allocated resources on error.
Contrary to tcp_check_req(), in both syncookies and TFO cases,
we need to drop the request socket. Also, since the child socket is
created with inet_csk_clone_lock(), we have to unlock it and drop an
extra reference (->sk_refcount is initially set to 2 and
inet_csk_reqsk_queue_add() drops only one ref).
For TFO, we also need to revert the work done by tcp_try_fastopen()
(with reqsk_fastopen_remove()).
Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns,
from usec units to nsec units.
Do not clear skb->tstamp before entering IP stacks in TX,
so that qdisc or devices can implement pacing based on the
earliest departure time instead of socket sk->sk_pacing_rate
Packets are fed with tcp_wstamp_ns, and following patch
will update tcp_wstamp_ns when both TCP and sch_fq switch to
the earliest departure time mechanism.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the SMC experimental TCP option in a SYN packet is lost on
the server side when SYN Cookies are active. However, the corresponding
SYNACK sent back to the client contains the SMC option. This causes an
inconsistent view of the SMC capabilities on the client and server.
This patch disables the SMC option in the SYNACK when SYN Cookies are
active to avoid this issue.
Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC")
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syzkaller found another bug in DCCP/TCP stacks [1]
For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.
Note the opt field is renamed to ireq_opt to ease grep games.
[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
__tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
Allocated by task 3295:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
__do_kmalloc mm/slab.c:3725 [inline]
__kmalloc+0x162/0x760 mm/slab.c:3734
kmalloc include/linux/slab.h:498 [inline]
tcp_v4_save_options include/net/tcp.h:1962 [inline]
tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed by task 3306:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
__sk_destruct+0xfd/0x910 net/core/sock.c:1560
sk_destruct+0x47/0x80 net/core/sock.c:1595
__sk_free+0x57/0x230 net/core/sock.c:1603
sk_free+0x2a/0x40 net/core/sock.c:1614
sock_put include/net/sock.h:1652 [inline]
inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__ip_options_echo() uses the current network namespace, and
currently retrives it via skb->dst->dev.
This commit adds an explicit 'net' argument to __ip_options_echo()
and update all the call sites to provide it, usually via a simpler
sock_net().
After this change, __ip_options_echo() no more needs to access
skb->dst and we can drop a couple of hack to preserve such
info in the rx path.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
which originated from the TCP request socket created in
cookie_v6_check():
==================================================================
BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters.
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x172/0x1c0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
skb_set_hash_from_sk ./include/net/sock.h:2011
tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
expire_timers kernel/time/timer.c:1307
__run_timers+0xc13/0xf10 kernel/time/timer.c:1601
run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
__do_softirq+0x485/0x942 kernel/softirq.c:284
invoke_softirq kernel/softirq.c:364
irq_exit+0x1fa/0x230 kernel/softirq.c:405
exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
</IRQ>
poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
SYSC_select+0x4b4/0x4e0 fs/select.c:653
SyS_select+0x76/0xa0 fs/select.c:634
entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x4597e7
RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
chained origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_save_stack mm/kmsan/kmsan.c:317
kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
__msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
origin:
save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
reqsk_alloc ./include/net/request_sock.h:87
inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
NF_HOOK ./include/linux/netfilter.h:257
ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
dst_input ./include/net/dst.h:492
ip6_rcv_finish net/ipv6/ip6_input.c:69
NF_HOOK ./include/linux/netfilter.h:257
ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
__netif_receive_skb net/core/dev.c:4246
process_backlog+0x667/0xba0 net/core/dev.c:4866
napi_poll net/core/dev.c:5268
net_rx_action+0xc95/0x1590 net/core/dev.c:5333
__do_softirq+0x485/0x942 kernel/softirq.c:284
==================================================================
Similar error is reported for cookie_v4_check().
Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets")
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|