| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inet->pmtudisc can be read locklessly.
Implement proper lockless reads and writes to inet->pmtudisc
ip_sock_set_mtu_discover() can now be called from arbitrary
contexts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
inet->mc_ttl can be read locklessly.
Implement proper lockless reads and writes to inet->mc_ttl
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
sk->sk_txrehash readers are already safe against
concurrent change of this field.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
SO_MAX_PACING_RATE setsockopt() does not need to hold
the socket lock, because sk->sk_pacing_rate readers
can run fine if the value is changed by other threads,
after adding READ_ONCE() accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
SO_BUSY_POLL_BUDGET
Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget
do not require the socket lock, readers are lockless anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
This options can not be set and return -ENOPROTOOPT,
no need to acqure socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
sock->flags are atomic, no need to hold the socket lock
in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a followup of 8bf43be799d4 ("net: annotate data-races
around sk->sk_priority").
sk->sk_priority can be read and written without holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do_execute_actions() function can be called recursively multiple
times while executing actions that require pipeline forking or
recirculations. It may also be re-entered multiple times if the packet
leaves openvswitch module and re-enters it through a different port.
Currently, there is a 256-byte array allocated on stack in this
function that is supposed to hold NSH header. Compilers tend to
pre-allocate that space right at the beginning of the function:
a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp
NSH is not a very common protocol, but the space is allocated on every
recursive call or re-entry multiplying the wasted stack space.
Move the stack allocation to push_nsh() function that is only used
if NSH actions are actually present. push_nsh() is also a simple
function without a possibility for re-entry, so the stack is returned
right away.
With this change the preallocated space is reduced by 256 B per call:
b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eelco Chaudron echaudro@redhat.com
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Core networking has opt-in atomic variant of dev->stats,
simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ().
v2: removed @priv local var in l2tp_eth_dev_recv() (Simon)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FQ performs garbage collection at enqueue time, and only
if number of flows is above a given threshold, which
is hit after the qdisc has been used a bit.
Since an RB-tree traversal is needed to locate a flow,
it makes sense to perform gc all the time, to keep
rb-trees smaller.
This reduces by 50 % average storage costs in FQ,
and avoids 1 cache line miss at enqueue time when
fast path added in prior patch can not be used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TCQ_F_CAN_BYPASS can be used by few qdiscs.
Idea is that if we queue a packet to an empty qdisc,
following dequeue() would pick it immediately.
FQ can not use the generic TCQ_F_CAN_BYPASS code,
because some additional checks need to be performed.
This patch adds a similar fast path to FQ.
Most of the time, qdisc is not throttled,
and many packets can avoid bringing/touching
at least four cache lines, and consuming 128bytes
of memory to store the state of a flow.
After this patch, netperf can send UDP packets about 13 % faster,
and pktgen goes 30 % faster (when FQ is in the way), on a fast NIC.
TCP traffic is also improved, thanks to a reduction of cache line misses.
I have measured a 5 % increase of throughput on a tcp_rr intensive workload.
tc -s -d qd sh dev eth1
...
qdisc fq 8004: parent 1:2 limit 10000p flow_limit 100p buckets 1024
orphan_mask 1023 quantum 3028b initial_quantum 15140b low_rate_threshold 550Kbit
refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
Sent 5646784384 bytes 1985161 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
flows 122 (inactive 122 throttled 0)
gc 0 highprio 0 fastpath 659990 throttled 27762 latency 8.57us
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, when one fq qdisc has no more packets to send, it can still
have some flows stored in its RR lists (q->new_flows & q->old_flows)
This was a design choice, but what is a bit disturbing is that
the inactive_flows counter does not include the count of empty flows
in RR lists.
As next patch needs to know better if there are active flows,
this change makes inactive_flows exact.
Before the patch, following command on an empty qdisc could have returned:
lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
flows 1322 (inactive 1316 throttled 0)
flows 1330 (inactive 1325 throttled 0)
flows 1193 (inactive 1190 throttled 0)
flows 1208 (inactive 1202 throttled 0)
After the patch, we now have:
lpaa17:~# tc -s -d qd sh dev eth1 | grep inactive
flows 1322 (inactive 1322 throttled 0)
flows 1330 (inactive 1330 throttled 0)
flows 1193 (inactive 1193 throttled 0)
flows 1208 (inactive 1208 throttled 0)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
q->flows can be often modified, and q->timer_slack is read mostly.
Exchange the two fields, so that cache line countaining
quantum, initial_quantum, and other critical parameters
stay clean (read-mostly).
Move q->watchdog next to q->stat_throttled
Add comments explaining how the structure is split in
three different parts.
pahole output before the patch:
struct fq_sched_data {
struct fq_flow_head new_flows; /* 0 0x10 */
struct fq_flow_head old_flows; /* 0x10 0x10 */
struct rb_root delayed; /* 0x20 0x8 */
u64 time_next_delayed_flow; /* 0x28 0x8 */
u64 ktime_cache; /* 0x30 0x8 */
unsigned long unthrottle_latency_ns; /* 0x38 0x8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct fq_flow internal __attribute__((__aligned__(64))); /* 0x40 0x80 */
/* XXX last struct has 16 bytes of padding */
/* --- cacheline 3 boundary (192 bytes) --- */
u32 quantum; /* 0xc0 0x4 */
u32 initial_quantum; /* 0xc4 0x4 */
u32 flow_refill_delay; /* 0xc8 0x4 */
u32 flow_plimit; /* 0xcc 0x4 */
unsigned long flow_max_rate; /* 0xd0 0x8 */
u64 ce_threshold; /* 0xd8 0x8 */
u64 horizon; /* 0xe0 0x8 */
u32 orphan_mask; /* 0xe8 0x4 */
u32 low_rate_threshold; /* 0xec 0x4 */
struct rb_root * fq_root; /* 0xf0 0x8 */
u8 rate_enable; /* 0xf8 0x1 */
u8 fq_trees_log; /* 0xf9 0x1 */
u8 horizon_drop; /* 0xfa 0x1 */
/* XXX 1 byte hole, try to pack */
<bad> u32 flows; /* 0xfc 0x4 */
/* --- cacheline 4 boundary (256 bytes) --- */
u32 inactive_flows; /* 0x100 0x4 */
u32 throttled_flows; /* 0x104 0x4 */
u64 stat_gc_flows; /* 0x108 0x8 */
u64 stat_internal_packets; /* 0x110 0x8 */
u64 stat_throttled; /* 0x118 0x8 */
u64 stat_ce_mark; /* 0x120 0x8 */
u64 stat_horizon_drops; /* 0x128 0x8 */
u64 stat_horizon_caps; /* 0x130 0x8 */
u64 stat_flows_plimit; /* 0x138 0x8 */
/* --- cacheline 5 boundary (320 bytes) --- */
u64 stat_pkts_too_long; /* 0x140 0x8 */
u64 stat_allocation_errors; /* 0x148 0x8 */
<bad> u32 timer_slack; /* 0x150 0x4 */
/* XXX 4 bytes hole, try to pack */
struct qdisc_watchdog watchdog; /* 0x158 0x48 */
/* size: 448, cachelines: 7, members: 34 */
/* sum members: 411, holes: 2, sum holes: 5 */
/* padding: 32 */
/* paddings: 1, sum paddings: 16 */
/* forced alignments: 1 */
};
pahole output after the patch:
struct fq_sched_data {
struct fq_flow_head new_flows; /* 0 0x10 */
struct fq_flow_head old_flows; /* 0x10 0x10 */
struct rb_root delayed; /* 0x20 0x8 */
u64 time_next_delayed_flow; /* 0x28 0x8 */
u64 ktime_cache; /* 0x30 0x8 */
unsigned long unthrottle_latency_ns; /* 0x38 0x8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct fq_flow internal __attribute__((__aligned__(64))); /* 0x40 0x80 */
/* XXX last struct has 16 bytes of padding */
/* --- cacheline 3 boundary (192 bytes) --- */
u32 quantum; /* 0xc0 0x4 */
u32 initial_quantum; /* 0xc4 0x4 */
u32 flow_refill_delay; /* 0xc8 0x4 */
u32 flow_plimit; /* 0xcc 0x4 */
unsigned long flow_max_rate; /* 0xd0 0x8 */
u64 ce_threshold; /* 0xd8 0x8 */
u64 horizon; /* 0xe0 0x8 */
u32 orphan_mask; /* 0xe8 0x4 */
u32 low_rate_threshold; /* 0xec 0x4 */
struct rb_root * fq_root; /* 0xf0 0x8 */
u8 rate_enable; /* 0xf8 0x1 */
u8 fq_trees_log; /* 0xf9 0x1 */
u8 horizon_drop; /* 0xfa 0x1 */
/* XXX 1 byte hole, try to pack */
<good> u32 timer_slack; /* 0xfc 0x4 */
/* --- cacheline 4 boundary (256 bytes) --- */
<good> u32 flows; /* 0x100 0x4 */
u32 inactive_flows; /* 0x104 0x4 */
u32 throttled_flows; /* 0x108 0x4 */
/* XXX 4 bytes hole, try to pack */
u64 stat_throttled; /* 0x110 0x8 */
<better> struct qdisc_watchdog watchdog; /* 0x118 0x48 */
/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
u64 stat_gc_flows; /* 0x160 0x8 */
u64 stat_internal_packets; /* 0x168 0x8 */
u64 stat_ce_mark; /* 0x170 0x8 */
u64 stat_horizon_drops; /* 0x178 0x8 */
/* --- cacheline 6 boundary (384 bytes) --- */
u64 stat_horizon_caps; /* 0x180 0x8 */
u64 stat_flows_plimit; /* 0x188 0x8 */
u64 stat_pkts_too_long; /* 0x190 0x8 */
u64 stat_allocation_errors; /* 0x198 0x8 */
/* Force padding: */
u64 :64;
u64 :64;
u64 :64;
u64 :64;
/* size: 448, cachelines: 7, members: 34 */
/* sum members: 411, holes: 2, sum holes: 5 */
/* padding: 32 */
/* paddings: 1, sum paddings: 16 */
/* forced alignments: 1 */
};
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While BPF allows to set icsk->->icsk_delack_max
and/or icsk->icsk_rto_min, we have an ip route
attribute (RTAX_RTO_MIN) to be able to tune rto_min,
but nothing to consequently adjust max delayed ack,
which vary from 40ms to 200 ms (TCP_DELACK_{MIN|MAX}).
This makes RTAX_RTO_MIN of almost no practical use,
unless customers are in big trouble.
Modern days datacenter communications want to set
rto_min to ~5 ms, and the max delayed ack one jiffie
smaller to avoid spurious retransmits.
After this patch, an "rto_min 5" route attribute will
effectively lower max delayed ack timers to 4 ms.
Note in the following ss output, "rto:6 ... ato:4"
$ ss -temoi dst XXXXXX
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 [2002:a05:6608:295::]:52950 [2002:a05:6608:297::]:41597
ino:255134 sk:1001 <->
skmem:(r0,rb1707063,t872,tb262144,f0,w0,o0,bl0,d0) ts sack
cubic wscale:8,8 rto:6 rtt:0.02/0.002 ato:4 mss:4096 pmtu:4500
rcvmss:536 advmss:4096 cwnd:10 bytes_sent:54823160 bytes_acked:54823121
bytes_received:54823120 segs_out:1370582 segs_in:1370580
data_segs_out:1370579 data_segs_in:1370578 send 16.4Gbps
pacing_rate 32.6Gbps delivery_rate 1.72Gbps delivered:1370579
busy:26920ms unacked:1 rcv_rtt:34.615 rcv_space:65920
rcv_ssthresh:65535 minrtt:0.015 snd_wnd:65536
While we could argue this patch fixes a bug with RTAX_RTO_MIN,
I do not add a Fixes: tag, so that we can soak it a bit before
asking backports to stable branches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, skbs generated by pktgen always have their reference count
incremented before transmission, causing their reference count to be
always greater than 1, leading to two issues:
1. Only the code paths for shared skbs can be tested.
2. In certain situations, skbs can only be released by pktgen.
To enhance testing comprehensiveness, we are introducing the "SHARED"
flag to indicate whether an SKB is shared. This flag is enabled by
default, aligning with the current behavior. However, disabling this
flag allows skbs with a reference count of 1 to be transmitted.
So we can test non-shared skbs and code paths where skbs are released
within the stack.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20230920125658.46978-2-liangchen.linux@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When specifying an unknown flag, it will print all available flags.
Currently, these flags are provided as fixed strings, which requires
manual updates when flags change. Replacing it with automated flag
enumeration.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20230920125658.46978-1-liangchen.linux@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| | |
Cross-merge networking fixes after downstream PR.
No conflicts.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE
- netfilter: fix entries val in rule reset audit log
- eth: stmmac: fix incorrect rxq|txq_stats reference
Previous releases - regressions:
- ipv4: fix null-deref in ipv4_link_failure
- netfilter:
- fix several GC related issues
- fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
- eth: team: fix null-ptr-deref when team device type is changed
- eth: i40e: fix VF VLAN offloading when port VLAN is configured
- eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB
Previous releases - always broken:
- core: fix ETH_P_1588 flow dissector
- mptcp: fix several connection hang-up conditions
- bpf:
- avoid deadlock when using queue and stack maps from NMI
- add override check to kprobe multi link attach
- hsr: properly parse HSRv1 supervisor frames.
- eth: igc: fix infinite initialization loop with early XDP redirect
- eth: octeon_ep: fix tx dma unmap len values in SG
- eth: hns3: fix GRE checksum offload issue"
* tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
igc: Expose tx-usecs coalesce setting to user
octeontx2-pf: Do xdp_do_flush() after redirects.
bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
net: ena: Flush XDP packets on error.
net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: disable toggling dormant table state more than once
vxlan: Add missing entries to vxlan_get_size()
net: rds: Fix possible NULL-pointer dereference
team: fix null-ptr-deref when team device type is changed
net: bridge: use DEV_STATS_INC()
net: hns3: add 5ms delay before clear firmware reset irq source
net: hns3: fix fail to delete tc flower rules during reset issue
net: hns3: only enable unicast promisc when mac table full
net: hns3: fix GRE checksum offload issue
net: hns3: add cmdq check for vf periodic service task
net: stmmac: fix incorrect rxq|txq_stats reference
...
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
The following three patches fix regressions in the netfilter subsystem:
1. Reject attempts to repeatedly toggle the 'dormant' flag in a single
transaction. Doing so makes nf_tables lose track of the real state
vs. the desired state. This ends with an attempt to unregister hooks
that were never registered in the first place, which yields a splat.
2. Fix element counting in the new nftables garbage collection infra
that came with 6.5: More than 255 expired elements wraps a counter
which results in memory leak.
3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command
is in progress, fix from Jozsef Kadlecsik.
* tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: disable toggling dormant table state more than once
====================
Link: https://lore.kernel.org/r/20230920084156.4192-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP
in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a
wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it.
The race is caused by using the wrong reference counter, i.e. the ref counter instead
of ref_netlink.
Fixes: 24e227896bbf ("netfilter: ipset: Add schedule point in call_ad().")
Reported-by: Kyle Zeng <zengyhkyle@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/ZPZqetxOmH+w%2Fmyc@westworld/#r
Tested-by: Kyle Zeng <zengyhkyle@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When more than 255 elements expired we're supposed to switch to a new gc
container structure.
This never happens: u8 type will wrap before reaching the boundary
and nft_trans_gc_space() always returns true.
This means we recycle the initial gc container structure and
lose track of the elements that came before.
While at it, don't deref 'gc' after we've passed it to call_rcu.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
nft -f -<<EOF
add table ip t
add table ip t { flags dormant; }
add chain ip t c { type filter hook input priority 0; }
add table ip t
EOF
Triggers a splat from nf core on next table delete because we lose
track of right hook register state:
WARNING: CPU: 2 PID: 1597 at net/netfilter/core.c:501 __nf_unregister_net_hook
RIP: 0010:__nf_unregister_net_hook+0x41b/0x570
nf_unregister_net_hook+0xb4/0xf0
__nf_tables_unregister_hook+0x160/0x1d0
[..]
The above should have table in *active* state, but in fact no
hooks were registered.
Reject on/off/on games rather than attempting to fix this.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Reported-by: "Lee, Cherie-Anne" <cherie.lee@starlabs.sg>
Cc: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Cc: info@starlabs.sg
Signed-off-by: Florian Westphal <fw@strlen.de>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When making CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y,
modprobe handshake-test and then rmmmod handshake-test, the below memory
leak is detected.
The struct socket_alloc which is allocated by alloc_inode_sb() in
__sock_create() is not freed. And the struct dentry which is allocated
by __d_alloc() in sock_alloc_file() is not freed.
Since fput() will call file->f_op->release() which is sock_close() here and
it will call __sock_release(). and fput() will call dput(dentry) to free
the struct dentry. So replace sock_release() with fput() to fix the
below memory leak. After applying this patch, the following memory leak is
never detected.
unreferenced object 0xffff888109165840 (size 768):
comm "kunit_try_catch", pid 1852, jiffies 4294685807 (age 976.262s)
hex dump (first 32 bytes):
01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ .......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
[<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
[<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
[<ffffffff8397889c>] sock_alloc+0x3c/0x260
[<ffffffff83979b46>] __sock_create+0x66/0x3d0
[<ffffffffa0209ba2>] 0xffffffffa0209ba2
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810f472008 (size 192):
comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 08 20 47 0f 81 88 ff ff ......... G.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0209bbb>] 0xffffffffa0209bbb
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810958e580 (size 224):
comm "kunit_try_catch", pid 1852, jiffies 4294685808 (age 976.261s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0209bbb>] 0xffffffffa0209bbb
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926dc88 (size 192):
comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 88 dc 26 09 81 88 ff ff ..........&.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0208fdc>] 0xffffffffa0208fdc
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a241380 (size 224):
comm "kunit_try_catch", pid 1854, jiffies 4294685809 (age 976.271s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0208fdc>] 0xffffffffa0208fdc
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109165040 (size 768):
comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s)
hex dump (first 32 bytes):
01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ .......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
[<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
[<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
[<ffffffff8397889c>] sock_alloc+0x3c/0x260
[<ffffffff83979b46>] __sock_create+0x66/0x3d0
[<ffffffffa0208860>] 0xffffffffa0208860
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926d568 (size 192):
comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.269s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 68 d5 26 09 81 88 ff ff ........h.&.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0208879>] 0xffffffffa0208879
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240580 (size 224):
comm "kunit_try_catch", pid 1856, jiffies 4294685811 (age 976.347s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0208879>] 0xffffffffa0208879
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164c40 (size 768):
comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
hex dump (first 32 bytes):
01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ .......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
[<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
[<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
[<ffffffff8397889c>] sock_alloc+0x3c/0x260
[<ffffffff83979b46>] __sock_create+0x66/0x3d0
[<ffffffffa0208541>] 0xffffffffa0208541
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926cd18 (size 192):
comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 18 cd 26 09 81 88 ff ff ..........&.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa020855a>] 0xffffffffa020855a
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240200 (size 224):
comm "kunit_try_catch", pid 1858, jiffies 4294685816 (age 976.342s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa020855a>] 0xffffffffa020855a
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164840 (size 768):
comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
hex dump (first 32 bytes):
01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ .......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
[<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
[<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
[<ffffffff8397889c>] sock_alloc+0x3c/0x260
[<ffffffff83979b46>] __sock_create+0x66/0x3d0
[<ffffffffa02093e2>] 0xffffffffa02093e2
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926cab8 (size 192):
comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 b8 ca 26 09 81 88 ff ff ..........&.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa02093fb>] 0xffffffffa02093fb
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810a240040 (size 224):
comm "kunit_try_catch", pid 1860, jiffies 4294685817 (age 976.416s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa02093fb>] 0xffffffffa02093fb
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109166440 (size 768):
comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
hex dump (first 32 bytes):
01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ .......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
[<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
[<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
[<ffffffff8397889c>] sock_alloc+0x3c/0x260
[<ffffffff83979b46>] __sock_create+0x66/0x3d0
[<ffffffffa02097c1>] 0xffffffffa02097c1
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810926c398 (size 192):
comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 98 c3 26 09 81 88 ff ff ..........&.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa02097da>] 0xffffffffa02097da
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107e0b8c0 (size 224):
comm "kunit_try_catch", pid 1862, jiffies 4294685819 (age 976.489s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa02097da>] 0xffffffffa02097da
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888109164440 (size 768):
comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.487s)
hex dump (first 32 bytes):
01 00 00 00 01 00 5a 5a 20 00 00 00 00 00 00 00 ......ZZ .......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8397993f>] sock_alloc_inode+0x1f/0x1b0
[<ffffffff81a2cb5b>] alloc_inode+0x5b/0x1a0
[<ffffffff81a32bed>] new_inode_pseudo+0xd/0x70
[<ffffffff8397889c>] sock_alloc+0x3c/0x260
[<ffffffff83979b46>] __sock_create+0x66/0x3d0
[<ffffffffa020824e>] 0xffffffffa020824e
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810f4cf698 (size 192):
comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s)
hex dump (first 32 bytes):
00 00 50 40 02 00 00 00 00 00 00 00 00 00 00 00 ..P@............
00 00 00 00 00 00 00 00 98 f6 4c 0f 81 88 ff ff ..........L.....
backtrace:
[<ffffffff81a1ff11>] __d_alloc+0x31/0x8a0
[<ffffffff81a2910e>] d_alloc_pseudo+0xe/0x50
[<ffffffff819d549e>] alloc_file_pseudo+0xce/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0208267>] 0xffffffffa0208267
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107e0b000 (size 224):
comm "kunit_try_catch", pid 1864, jiffies 4294685821 (age 976.501s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 03 00 2e 08 01 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff819d4b90>] alloc_empty_file+0x50/0x160
[<ffffffff819d4cf9>] alloc_file+0x59/0x730
[<ffffffff819d5524>] alloc_file_pseudo+0x154/0x210
[<ffffffff83978582>] sock_alloc_file+0x42/0x1b0
[<ffffffffa0208267>] 0xffffffffa0208267
[<ffffffff829cf03a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81236fc6>] kthread+0x2b6/0x380
[<ffffffff81096afd>] ret_from_fork+0x2d/0x70
[<ffffffff81003511>] ret_from_fork_asm+0x11/0x20
Fixes: 88232ec1ec5e ("net/handshake: Add Kunit tests for the handshake consumer API")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In rds_rdma_cm_event_handler_cmn() check, if conn pointer exists
before dereferencing it as rdma_set_service_type() argument
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: fd261ce6a30e ("rds: rdma: update rdma transport for tos")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
syzbot/KCSAN reported data-races in br_handle_frame_finish() [1]
This function can run from multiple cpus without mutual exclusion.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
Handles updates to dev->stats.tx_dropped while we are at it.
[1]
BUG: KCSAN: data-race in br_handle_frame_finish / br_handle_frame_finish
read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 1:
br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189
br_nf_hook_thresh+0x1ed/0x220
br_nf_pre_routing_finish_ipv6+0x50f/0x540
NF_HOOK include/linux/netfilter.h:304 [inline]
br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178
br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508
nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:272 [inline]
br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417
__netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417
__netif_receive_skb_one_core net/core/dev.c:5521 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637
process_backlog+0x21f/0x380 net/core/dev.c:5965
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
run_ksoftirqd+0x17/0x20 kernel/softirq.c:921
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
read-write to 0xffff8881374b2178 of 8 bytes by interrupt on cpu 0:
br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189
br_nf_hook_thresh+0x1ed/0x220
br_nf_pre_routing_finish_ipv6+0x50f/0x540
NF_HOOK include/linux/netfilter.h:304 [inline]
br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178
br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508
nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:272 [inline]
br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417
__netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417
__netif_receive_skb_one_core net/core/dev.c:5521 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637
process_backlog+0x21f/0x380 net/core/dev.c:5965
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
do_softirq+0x5e/0x90 kernel/softirq.c:454
__local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:396 [inline]
batadv_tt_local_purge+0x1a8/0x1f0 net/batman-adv/translation-table.c:1356
batadv_tt_purge+0x2b/0x630 net/batman-adv/translation-table.c:3560
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
value changed: 0x00000000000d7190 -> 0x00000000000d7191
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14848 Comm: kworker/u4:11 Not tainted 6.6.0-rc1-syzkaller-00236-gad8a69f361b9 #0
Fixes: 1c29fc4989bc ("[BRIDGE]: keep track of received multicast packets")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: bridge@lists.linux-foundation.org
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230918091351.1356153-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
http://linux-ax25.org has been down for nearly a year. Its official
replacement is https://linux-ax25.in-berlin.de. Change all references to
the old site in the ax25 Kconfig to its replacement.
Link: https://marc.info/?m=166792551600315
Signed-off-by: Peter Lafreniere <peter@n8pjl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
According to RFC 8684 section 3.3:
A connection is not closed unless [...] or an implementation-specific
connection-level send timeout.
Currently the MPTCP protocol does not implement such timeout, and
connection timing-out at the TCP-level never move to close state.
Introduces a catch-up condition at subflow close time to move the
MPTCP socket to close, too.
That additionally allows removing similar existing inside the worker.
Finally, allow some additional timeout for plain ESTABLISHED mptcp
sockets, as the protocol allows creating new subflows even at that
point and making the connection functional again.
This issue is actually present since the beginning, but it is basically
impossible to solve without a long chain of functional pre-requisites
topped by commit bbd49d114d57 ("mptcp: consolidate transition to
TCP_CLOSE in mptcp_do_fastclose()"). When backporting this current
patch, please also backport this other commit as well.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/430
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The msk socket uses to different timeout to track close related
events and retransmissions. The existing helpers do not indicate
clearly which timer they actually touch, making the related code
quite confusing.
Change the existing helpers name to avoid such confusion. No
functional change intended.
This patch is linked to the next one ("mptcp: fix dangling connection
hang-up"). The two patches are supposed to be backported together.
Cc: stable@vger.kernel.org # v5.11+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On incoming TCP reset, subflow closing could happen before error
propagation. That in turn could cause the socket error being ignored,
and a missing socket state transition, as reported by Daire-Byrne.
Address the issues explicitly checking for subflow socket error at
close time. To avoid code duplication, factor-out of __mptcp_error_report()
a new helper implementing the relevant bits.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/429
Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This will simplify the next patch ("mptcp: process pending subflow error
on close").
No functional change intended.
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In case multiple subflows race to update the mptcp-level receive
window, the subflow losing the race should use the window value
provided by the "winning" subflow to update it's own tcp-level
rcv_wnd.
To such goal, the current code bogusly uses the mptcp-level rcv_wnd
value as observed before the update attempt. On unlucky circumstances
that may lead to TCP-level window shrinkage, and stall the other end.
Address the issue feeding to the rcv wnd update the correct value.
Fixes: f3589be0c420 ("mptcp: never shrink offered window")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/427
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Struct hsr_sup_tlv describes HW layout and therefore it needs a __packed
attribute to ensure the compiler does not add any padding.
Due to the size and __packed attribute of the structs that use
hsr_sup_tlv it has no functional impact.
Add __packed to struct hsr_sup_tlv.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While adding support for parsing the redbox supervision frames, the
author added `pull_size' and `total_pull_size' to track the amount of
bytes that were pulled from the skb during while parsing the skb so it
can be reverted/ pushed back at the end.
In the process probably copy&paste error occurred and for the HSRv1 case
the ethhdr was used instead of the hsr_tag. Later the hsr_tag was used
instead of hsr_sup_tag. The later error didn't matter because both
structs have the size so HSRv0 was still working. It broke however HSRv1
parsing because struct ethhdr is larger than struct hsr_tag.
Reinstate the old pulling flow and pull first ethhdr, hsr_tag in v1 case
followed by hsr_sup_tag.
[bigeasy: commit message]
Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames")'
Suggested-by: Tristram.Ha@microchip.com
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dh->dccph_x is the 9th byte (offset 8) in "struct dccp_hdr",
not in the "byte 7" as Jann claimed.
We need to make sure the ICMP messages are big enough,
using more standard ways (no more assumptions).
syzbot reported:
BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2667 [inline]
BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2681 [inline]
BUG: KMSAN: uninit-value in dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94
pskb_may_pull_reason include/linux/skbuff.h:2667 [inline]
pskb_may_pull include/linux/skbuff.h:2681 [inline]
dccp_v6_err+0x426/0x1aa0 net/dccp/ipv6.c:94
icmpv6_notify+0x4c7/0x880 net/ipv6/icmp.c:867
icmpv6_rcv+0x19d5/0x30d0
ip6_protocol_deliver_rcu+0xda6/0x2a60 net/ipv6/ip6_input.c:438
ip6_input_finish net/ipv6/ip6_input.c:483 [inline]
NF_HOOK include/linux/netfilter.h:304 [inline]
ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492
ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586
dst_input include/net/dst.h:468 [inline]
ip6_rcv_finish+0x5db/0x870 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:304 [inline]
ipv6_rcv+0xda/0x390 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core net/core/dev.c:5523 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637
netif_receive_skb_internal net/core/dev.c:5723 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5782
tun_rx_batched+0x83b/0x920
tun_get_user+0x564c/0x6940 drivers/net/tun.c:2002
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:1985 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x15c0 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at:
slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
slab_alloc_node mm/slub.c:3478 [inline]
kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:559
__alloc_skb+0x318/0x740 net/core/skbuff.c:650
alloc_skb include/linux/skbuff.h:1286 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6313
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2795
tun_alloc_skb drivers/net/tun.c:1531 [inline]
tun_get_user+0x23cf/0x6940 drivers/net/tun.c:1846
tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:1985 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x8ef/0x15c0 fs/read_write.c:584
ksys_write+0x20f/0x4c0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x93/0xd0 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 0 PID: 4995 Comm: syz-executor153 Not tainted 6.6.0-rc1-syzkaller-00014-ga747acc0b752 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Fixes: 977ad86c2a1b ("dccp: Fix out of bounds access in DCCP error handler")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Report the carrier/no-carrier state for the network interface
shared between the BMC and the passthrough channel. Without this
functionality the BMC is unable to reconfigure the NIC in the event
of a re-cabling to a different subnet.
Signed-off-by: Johnathan Mantey <johnathanx.mantey@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, we assume the skb is associated with a device before calling
__ip_options_compile, which is not always the case if it is re-routed by
ipvs.
When skb->dev is NULL, dev_net(skb->dev) will become null-dereference.
This patch adds a check for the edge case and switch to use the net_device
from the rtable when skb->dev is NULL.
Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Cc: Vadim Fedorenko <vfedorenko@novek.ru>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them.
It is less verbose and it improves the type checking and semantic.
While at it, add missing header inclusion (should be bitops.h,
but with the above change it becomes bitmap.h).
Suggested-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230911154534.4174265-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Alexei Starovoitov says:
====================
The following pull-request contains BPF updates for your *net* tree.
We've added 21 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 450 insertions(+), 36 deletions(-).
The main changes are:
1) Adjust bpf_mem_alloc buckets to match ksize(), from Hou Tao.
2) Check whether override is allowed in kprobe mult, from Jiri Olsa.
3) Fix btf_id symbol generation with ld.lld, from Jiri and Nick.
4) Fix potential deadlock when using queue and stack maps from NMI, from Toke Høiland-Jørgensen.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
Also thanks to reporters, reviewers and testers of commits in this pull-request:
Alan Maguire, Biju Das, Björn Töpel, Dan Carpenter, Daniel Borkmann,
Eduard Zingerman, Hsin-Wei Hung, Marcus Seyfarth, Nathan Chancellor,
Satya Durga Srinivasu Prabhala, Song Liu, Stephen Rothwell
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
bpf_nf testcase fails on s390x: bpf_skb_ct_lookup() cannot find the entry
that was added by bpf_ct_insert_entry() within the same BPF function.
The reason is that this entry is deleted by nf_ct_gc_expired().
The CT timeout starts ticking after the CT confirmation; therefore
nf_conn.timeout is initially set to the timeout value, and
__nf_conntrack_confirm() sets it to the deadline value.
bpf_ct_insert_entry() sets IPS_CONFIRMED_BIT, but does not adjust the
timeout, making its value meaningless and causing false positives.
Fix the problem by making bpf_ct_insert_entry() adjust the timeout,
like __nf_conntrack_confirm().
Fixes: 2cdaa3eefed8 ("netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/20230830011128.1415752-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
netfilter pull request 23-09-13
====================
The following patchset contains Netfilter fixes for net:
1) Do not permit to remove rules from chain binding, otherwise
double rule release is possible, triggering UaF. This rule
deletion support does not make sense and userspace does not use
this. Problem exists since the introduction of chain binding support.
2) rbtree GC worker only collects the elements that have expired.
This operation is not destructive, therefore, turn write into
read spinlock to avoid datapath contention due to GC worker run.
This was not fixed in the recent GC fix batch in the 6.5 cycle.
3) pipapo set backend performs sync GC, therefore, catchall elements
must use sync GC queue variant. This bug was introduced in the
6.5 cycle with the recent GC fixes.
4) Stop GC run if memory allocation fails in pipapo set backend,
otherwise access to NULL pointer to GC transaction object might
occur. This bug was introduced in the 6.5 cycle with the recent
GC fixes.
5) rhash GC run uses an iterator that might hit EAGAIN to rewind,
triggering double-collection of the same element. This bug was
introduced in the 6.5 cycle with the recent GC fixes.
6) Do not permit to remove elements in anonymous sets, this type of
sets are populated once and then bound to rules. This fix is
similar to the chain binding patch coming first in this batch.
API permits since the very beginning but it has no use case from
userspace.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The value in idx and the number of rules handled in that particular
__nf_tables_dump_rules() call is not identical. The former is a cursor
to pick up from if multiple netlink messages are needed, so its value is
ever increasing. Fixing this is not just a matter of subtracting s_idx
from it, though: When resetting rules in multiple chains,
__nf_tables_dump_rules() is called for each and cb->args[0] is not
adjusted in between. Introduce a dedicated counter to record the number
of rules reset in this call in a less confusing way.
While being at it, prevent the direct return upon buffer exhaustion: Any
rules previously dumped into that skb would evade audit logging
otherwise.
Fixes: 9b5ba5c9c5109 ("netfilter: nf_tables: Unbreak audit log reset")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The size table is incorrect due to copypaste error,
this reserves more size than needed.
TSTAMP reserved 32 instead of 16 bytes.
TIMEOUT reserved 16 instead of 8 bytes.
Fixes: 5f31edc0676b ("netfilter: conntrack: move extension sizes into core")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Anonymous sets need to be populated once at creation and then they are
bound to rule since 938154b93be8 ("netfilter: nf_tables: reject unbound
anonymous set before commit phase"), otherwise transaction reports
EINVAL.
Userspace does not need to delete elements of anonymous sets that are
not yet bound, reject this with EOPNOTSUPP.
From flush command path, skip anonymous sets, they are expected to be
bound already. Otherwise, EINVAL is hit at the end of this transaction
for unbound sets.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Skip GC run if iterator rewinds to the beginning with EAGAIN, otherwise GC
might collect the same element more than once.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
nft_trans_gc_queue_sync() enqueues the GC transaction and it allocates a
new one. If this allocation fails, then stop this GC sync run and retry
later.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
pipapo needs to enqueue GC transactions for catchall elements through
nft_trans_gc_queue_sync(). Add nft_trans_gc_catchall_sync() and
nft_trans_gc_catchall_async() to handle GC transaction queueing
accordingly.
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
rbtree GC does not modify the datastructure, instead it collects expired
elements and it enqueues a GC transaction. Use a read spinlock instead
to avoid data contention while GC worker is running.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|