summaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide/perf (unfollow)
Commit message (Collapse)AuthorFilesLines
2020-01-27r8169: don't set min_mtu/max_mtu if not neededHeiner Kallweit1-7/+5
Defaults for min_mtu and max_mtu are set by ether_setup(), which is called from devm_alloc_etherdev(). Let rtl_jumbo_max() only return a positive value if actually jumbo packets are supported. This also allows to remove constant Jumbo_1K which is a little misleading anyway. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27mlxsw: minimal: Fix an error handling path in 'mlxsw_m_port_create()'Christophe JAILLET1-1/+1
An 'alloc_etherdev()' called is not ballanced by a corresponding 'free_netdev()' call in one error handling path. Slighly reorder the error handling code to catch the missed case. Fixes: c100e47caa8e ("mlxsw: minimal: Add ethtool support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: dsa: Fix use-after-free in probing of DSA switch treeVladimir Oltean1-11/+25
DSA sets up a switch tree little by little. Every switch of the N members of the tree calls dsa_register_switch, and (N - 1) will just touch the dst->ports list with their ports and quickly exit. Only the last switch that calls dsa_register_switch will find all DSA links complete in dsa_tree_setup_routing_table, and not return zero as a result but instead go ahead and set up the entire DSA switch tree (practically on behalf of the other switches too). The trouble is that the (N - 1) switches don't clean up after themselves after they get an error such as EPROBE_DEFER. Their footprint left in dst->ports by dsa_switch_touch_ports is still there. And switch N, the one responsible with actually setting up the tree, is going to work with those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and ds->dev might get freed by the device driver. Be there a 2-switch tree and the following calling order: - Switch 1 calls dsa_register_switch - Calls dsa_switch_touch_ports, populates dst->ports - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits. - Switch 2 calls dsa_register_switch - Calls dsa_switch_touch_ports, populates dst->ports - Probe doesn't get deferred, so it goes ahead. - Calls dsa_tree_setup_routing_table, which returns "complete == true" due to Switch 1 having called dsa_switch_touch_ports before. - Because the DSA links are complete, it calls dsa_tree_setup_switches now. - dsa_tree_setup_switches iterates through dst->ports, initializing the Switch 1 ds structure (invalid) and the Switch 2 ds structure (valid). - Undefined behavior (use after free, sometimes NULL pointers, etc). Real example below (debugging prints added by me, as well as guards against NULL pointers): [ 5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800) The solution is to recognize that the functions that call dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side effects, and therefore one should clean up their side effects on error path. The cleanup of dst->ports was taken from dsa_switch_remove and moved into a dedicated dsa_switch_release_ports function, which should really be per-switch (free only the members of dst->ports that are also members of ds, instead of all switch ports). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: remove eth_change_mtuHeiner Kallweit2-17/+0
All usage of this function was removed three years ago, and the function was marked as deprecated: a52ad514fdf3 ("net: deprecate eth_change_mtu, remove usage") So I think we can remove it now. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: socionext: fix xdp_result initialization in netsec_process_rxLorenzo Bianconi1-1/+1
Fix xdp_result initialization in netsec_process_rx in order to not increase rx counters if there is no bpf program attached to the xdp hook and napi_gro_receive returns GRO_DROP Fixes: ba2b232108d3c ("net: netsec: add XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: socionext: fix possible user-after-free in netsec_process_rxLorenzo Bianconi1-1/+1
Fix possible use-after-free in in netsec_process_rx that can occurs if the first packet is sent to the normal networking stack and the following one is dropped by the bpf program attached to the xdp hook. Fix the issue defining the skb pointer in the 'budget' loop Fixes: ba2b232108d3c ("net: netsec: add XDP support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27mlx5: Use dev_net netdevice notifier registrationsJiri Pirko8-10/+28
Register the dev_net notifier and allow the per-net notifier to follow the device into different namespace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: introduce dev_net notifier register/unregister variantsJiri Pirko2-0/+63
Introduce dev_net variants of netdev notifier register/unregister functions and allow per-net notifier to follow the netdevice into the namespace it is moved to. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: push code from net notifier reg/unreg into helpersJiri Pirko1-22/+38
Push the code which is done under rtnl lock in net notifier register and unregister function into separate helpers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: call call_netdevice_unregister_net_notifiers from unregisterJiri Pirko1-11/+3
The function does the same thing as the existing code, so rather call call_netdevice_unregister_net_notifiers() instead of code duplication. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27soreuseport: Cleanup duplicate initialization of more_reuse->max_socks.Kuniyuki Iwashima1-1/+0
reuseport_grow() does not need to initialize the more_reuse->max_socks again. It is already initialized in __reuseport_alloc(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27udp: Support UDP fraglist GRO/GSO.Steffen Klassert3-26/+107
This patch extends UDP GRO to support fraglist GRO/GSO by using the previously introduced infrastructure. If the feature is enabled, all UDP packets are going to fraglist GRO (local input and forward). After validating the csum, we mark ip_summed as CHECKSUM_UNNECESSARY for fraglist GRO packets to make sure that the csum is not touched. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: Support GRO/GSO fraglist chaining.Steffen Klassert4-2/+97
This patch adds the core functions to chain/unchain GSO skbs at the frag_list pointer. This also adds a new GSO type SKB_GSO_FRAGLIST and a is_flist flag to napi_gro_cb which indicates that this flow will be GROed by fraglist chaining. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: Add a netdev software feature set that defaults to off.Steffen Klassert2-1/+4
The previous patch added the NETIF_F_GRO_FRAGLIST feature. This is a software feature that should default to off. Current software features default to on, so add a new feature set that defaults to off. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: Add fraglist GRO/GSO feature flagsSteffen Klassert4-1/+9
This adds new Fraglist GRO/GSO feature flags. They will be used to configure fraglist GRO/GSO what will be implemented with some followup paches. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27mvneta driver disallow XDP program on hardware buffer managementSven Auhagen1-0/+6
Recently XDP Support was added to the mvneta driver for software buffer management only. It is still possible to attach an XDP program if hardware buffer management is used. It is not doing anything at that point. The patch disallows attaching XDP programs to mvneta if hardware buffer management is used. I am sorry about that. It is my first submission and I am having some troubles with the format of my emails. v4 -> v5: - Remove extra tabs v3 -> v4: - Please ignore v3 I accidentally submitted my other patch with git-send-mail and v4 is correct v2 -> v3: - My mailserver corrupted the patch resubmission with git-send-email v1 -> v2: - Fixing the patches indentation Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27rxrpc: Fix use-after-free in rxrpc_receive_data()David Howells1-5/+7
The subpacket scanning loop in rxrpc_receive_data() references the subpacket count in the private data part of the sk_buff in the loop termination condition. However, when the final subpacket is pasted into the ring buffer, the function is no longer has a ref on the sk_buff and should not be looking at sp->* any more. This point is actually marked in the code when skb is cleared (but sp is not - which is an error). Fix this by caching sp->nr_subpackets in a local variable and using that instead. Also clear 'sp' to catch accesses after that point. This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets trashed by the sk_buff getting freed and reused in the meantime. Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net_sched: ematch: reject invalid TCF_EM_SIMPLEEric Dumazet1-0/+3
It is possible for malicious userspace to set TCF_EM_SIMPLE bit even for matches that should not have this bit set. This can fool two places using tcf_em_is_simple() 1) tcf_em_tree_destroy() -> memory leak of em->data if ops->destroy() is NULL 2) tcf_em_tree_dump() wrongly report/leak 4 low-order bytes of a kernel pointer. BUG: memory leak unreferenced object 0xffff888121850a40 (size 32): comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s) hex dump (first 32 bytes): 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline] [<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline] [<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline] [<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671 [<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127 [<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline] [<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32 [<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline] [<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline] [<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300 [<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline] [<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219 [<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104 [<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415 [<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline] [<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline] [<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline] [<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+03c4738ed29d5d366ddf@syzkaller.appspotmail.com Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net: include struct nhmsg size in nh nlmsg sizeStephen Worley1-1/+3
Include the size of struct nhmsg size when calculating how much of a payload to allocate in a new netlink nexthop notification message. Without this, we will fail to fill the skbuff at certain nexthop group sizes. You can reproduce the failure with the following iproute2 commands: ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add dummy3 type dummy ip link add dummy4 type dummy ip link add dummy5 type dummy ip link add dummy6 type dummy ip link add dummy7 type dummy ip link add dummy8 type dummy ip link add dummy9 type dummy ip link add dummy10 type dummy ip link add dummy11 type dummy ip link add dummy12 type dummy ip link add dummy13 type dummy ip link add dummy14 type dummy ip link add dummy15 type dummy ip link add dummy16 type dummy ip link add dummy17 type dummy ip link add dummy18 type dummy ip link add dummy19 type dummy ip ro add 1.1.1.1/32 dev dummy1 ip ro add 1.1.1.2/32 dev dummy2 ip ro add 1.1.1.3/32 dev dummy3 ip ro add 1.1.1.4/32 dev dummy4 ip ro add 1.1.1.5/32 dev dummy5 ip ro add 1.1.1.6/32 dev dummy6 ip ro add 1.1.1.7/32 dev dummy7 ip ro add 1.1.1.8/32 dev dummy8 ip ro add 1.1.1.9/32 dev dummy9 ip ro add 1.1.1.10/32 dev dummy10 ip ro add 1.1.1.11/32 dev dummy11 ip ro add 1.1.1.12/32 dev dummy12 ip ro add 1.1.1.13/32 dev dummy13 ip ro add 1.1.1.14/32 dev dummy14 ip ro add 1.1.1.15/32 dev dummy15 ip ro add 1.1.1.16/32 dev dummy16 ip ro add 1.1.1.17/32 dev dummy17 ip ro add 1.1.1.18/32 dev dummy18 ip ro add 1.1.1.19/32 dev dummy19 ip next add id 1 via 1.1.1.1 dev dummy1 ip next add id 2 via 1.1.1.2 dev dummy2 ip next add id 3 via 1.1.1.3 dev dummy3 ip next add id 4 via 1.1.1.4 dev dummy4 ip next add id 5 via 1.1.1.5 dev dummy5 ip next add id 6 via 1.1.1.6 dev dummy6 ip next add id 7 via 1.1.1.7 dev dummy7 ip next add id 8 via 1.1.1.8 dev dummy8 ip next add id 9 via 1.1.1.9 dev dummy9 ip next add id 10 via 1.1.1.10 dev dummy10 ip next add id 11 via 1.1.1.11 dev dummy11 ip next add id 12 via 1.1.1.12 dev dummy12 ip next add id 13 via 1.1.1.13 dev dummy13 ip next add id 14 via 1.1.1.14 dev dummy14 ip next add id 15 via 1.1.1.15 dev dummy15 ip next add id 16 via 1.1.1.16 dev dummy16 ip next add id 17 via 1.1.1.17 dev dummy17 ip next add id 18 via 1.1.1.18 dev dummy18 ip next add id 19 via 1.1.1.19 dev dummy19 ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19 ip next del id 1111 Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net_sched: walk through all child classes in tc_bind_tclass()Cong Wang1-11/+30
In a complex TC class hierarchy like this: tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \ avpkt 1000 cell 8 tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \ rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \ avpkt 1000 bounded tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 80 0xffff flowid 1:3 tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 25 0xffff flowid 1:4 tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \ rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \ rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 where filters are installed on qdisc 1:0, so we can't merely search from class 1:1 when creating class 1:3 and class 1:4. We have to walk through all the child classes of the direct parent qdisc. Otherwise we would miss filters those need reverse binding. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27net_sched: fix ops->bind_class() implementationsCong Wang12-44/+97
The current implementations of ops->bind_class() are merely searching for classid and updating class in the struct tcf_result, without invoking either of cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks the design of them as qdisc's like cbq use them to count filters too. This is why syzbot triggered the warning in cbq_destroy_class(). In order to fix this, we have to call cl_ops->bind_tcf() and cl_ops->unbind_tcf() like the filter binding path. This patch does so by refactoring out two helper functions __tcf_bind_filter() and __tcf_unbind_filter(), which are lockless and accept a Qdisc pointer, then teaching each implementation to call them correctly. Note, we merely pass the Qdisc pointer as an opaque pointer to each filter, they only need to pass it down to the helper functions without understanding it at all. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27selftests: netfilter: Introduce tests for sets with range concatenationStefano Brivio2-1/+1483
This test covers functionality and stability of the newly added nftables set implementation supporting concatenation of ranged fields. For some selected set expression types, test: - correctness, by checking that packets match or don't - concurrency, by attempting races between insertion, deletion, lookup - timeout feature, checking that packets don't match expired entries and (roughly) estimate matching rates, comparing to baselines for simple drop on netdev ingress hook and for hash and rbtrees sets. In order to send packets, this needs one of sendip, netcat or bash. To flood with traffic, iperf3, iperf and netperf are supported. For performance measurements, this relies on the sample pktgen script pktgen_bench_xmit_mode_netif_receive.sh. If none of the tools suitable for a given test are available, specific tests will be skipped. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27nf_tables: Add set type for arbitrary concatenation of rangesStefano Brivio4-1/+2107
This new set type allows for intervals in concatenated fields, which are expressed in the usual way, that is, simple byte concatenation with padding to 32 bits for single fields, and given as ranges by specifying start and end elements containing, each, the full concatenation of start and end values for the single fields. Ranges are expanded to composing netmasks, for each field: these are inserted as rules in per-field lookup tables. Bits to be classified are divided in 4-bit groups, and for each group, the lookup table contains 4^2 buckets, representing all the possible values of a bit group. This approach was inspired by the Grouper algorithm: http://www.cse.usf.edu/~ligatti/projects/grouper/ Matching is performed by a sequence of AND operations between bucket values, with buckets selected according to the value of packet bits, for each group. The result of this sequence tells us which rules matched for a given field. In order to concatenate several ranged fields, per-field rules are mapped using mapping arrays, one per field, that specify which rules should be considered while matching the next field. The mapping array for the last field contains a reference to the element originally inserted. The notes in nft_set_pipapo.c cover the algorithm in deeper detail. A pure hash-based approach is of no use here, as ranges need to be classified. An implementation based on "proxying" the existing red-black tree set type, creating a tree for each field, was considered, but deemed impractical due to the fact that elements would need to be shared between trees, at least as long as we want to keep UAPI changes to a minimum. A stand-alone implementation of this algorithm is available at: https://pipapo.lameexcu.se together with notes about possible future optimisations (in pipapo.c). This algorithm was designed with data locality in mind, and can be highly optimised for SIMD instruction sets, as the bulk of the matching work is done with repetitive, simple bitwise operations. At this point, without further optimisations, nft_concat_range.sh reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$): TEST: performance net,port [ OK ] baseline (drop from netdev hook): 10190076pps baseline hash (non-ranged entries): 6179564pps baseline rbtree (match on first field only): 2950341pps set with 1000 full, ranged entries: 2304165pps port,net [ OK ] baseline (drop from netdev hook): 10143615pps baseline hash (non-ranged entries): 6135776pps baseline rbtree (match on first field only): 4311934pps set with 100 full, ranged entries: 4131471pps net6,port [ OK ] baseline (drop from netdev hook): 9730404pps baseline hash (non-ranged entries): 4809557pps baseline rbtree (match on first field only): 1501699pps set with 1000 full, ranged entries: 1092557pps port,proto [ OK ] baseline (drop from netdev hook): 10812426pps baseline hash (non-ranged entries): 6929353pps baseline rbtree (match on first field only): 3027105pps set with 30000 full, ranged entries: 284147pps net6,port,mac [ OK ] baseline (drop from netdev hook): 9660114pps baseline hash (non-ranged entries): 3778877pps baseline rbtree (match on first field only): 3179379pps set with 10 full, ranged entries: 2082880pps net6,port,mac,proto [ OK ] baseline (drop from netdev hook): 9718324pps baseline hash (non-ranged entries): 3799021pps baseline rbtree (match on first field only): 1506689pps set with 1000 full, ranged entries: 783810pps net,mac [ OK ] baseline (drop from netdev hook): 10190029pps baseline hash (non-ranged entries): 5172218pps baseline rbtree (match on first field only): 2946863pps set with 1000 full, ranged entries: 1279122pps v4: - fix build for 32-bit architectures: 64-bit division needs div_u64() (kbuild test robot <lkp@intel.com>) v3: - rework interface for field length specification, NFT_SET_SUBKEY disappears and information is stored in description - remove scratch area to store closing element of ranges, as elements now come with an actual attribute to specify the upper range limit (Pablo Neira Ayuso) - also remove pointer to 'start' element from mapping table, closing key is now accessible via extension data - use bytes right away instead of bits for field lengths, this way we can also double the inner loop of the lookup function to take care of upper and lower bits in a single iteration (minor performance improvement) - make it clearer that set operations are actually atomic API-wise, but we can't e.g. implement flush() as one-shot action - fix type for 'dup' in nft_pipapo_insert(), check for duplicates only in the next generation, and in general take care of differentiating generation mask cases depending on the operation (Pablo Neira Ayuso) - report C implementation matching rate in commit message, so that AVX2 implementation can be compared (Pablo Neira Ayuso) v2: - protect access to scratch maps in nft_pipapo_lookup() with local_bh_disable/enable() (Florian Westphal) - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's already implied (Florian Westphal) - explain why partial allocation failures don't need handling in pipapo_realloc_scratch(), rename 'm' to clone and update related kerneldoc to make it clear we're not operating on the live copy (Florian Westphal) - add expicit check for priv->start_elem in nft_pipapo_insert() to avoid ending up in nft_pipapo_walk() with a NULL start element, and also zero it out in every operation that might make it invalid, so that insertion doesn't proceed with an invalid element (Florian Westphal) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27bitmap: Introduce bitmap_cut(): cut bits and shift remainingStefano Brivio2-0/+70
The new bitmap function bitmap_cut() copies bits from source to destination by removing the region specified by parameters first and cut, and remapping the bits above the cut region by right shifting them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27netfilter: nf_tables: Support for sets with multiple ranged fieldsStefano Brivio4-1/+115
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used to specify the length of each field in a set concatenation. This allows set implementations to support concatenation of multiple ranged items, as they can divide the input key into matching data for every single field. Such set implementations would be selected as they specify support for NFT_SET_INTERVAL and allow desc->field_count to be greater than one. Explicitly disallow this for nft_set_rbtree. In order to specify the interval for a set entry, userspace would include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass range endpoints as two separate keys, represented by attributes NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END. While at it, export the number of 32-bit registers available for packet matching, as nftables will need this to know the maximum number of field lengths that can be specified. For example, "packets with an IPv4 address between 192.0.2.0 and 192.0.2.42, with destination port between 22 and 25", can be expressed as two concatenated elements: NFTA_SET_ELEM_KEY: 192.0.2.0 . 22 NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25 and NFTA_SET_DESC_CONCAT attribute would contain: NFTA_LIST_ELEM NFTA_SET_FIELD_LEN: 4 NFTA_LIST_ELEM NFTA_SET_FIELD_LEN: 2 v4: No changes v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY v2: No changes Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attributePablo Neira Ayuso4-24/+79
Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the interval between kernel and userspace. This patch also adds the NFT_SET_EXT_KEY_END extension to store the closing element value in this interval. v4: No changes v3: New patch [sbrivio: refactor error paths and labels; add corresponding nft_set_ext_type for new key; rebase] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-27netfilter: nf_tables: add nft_setelem_parse_key()Pablo Neira Ayuso1-46/+45
Add helper function to parse the set element key netlink attribute. v4: No changes v3: New patch [sbrivio: refactor error paths and labels; use NFT_DATA_VALUE_MAXLEN instead of sizeof(*key) in helper, value can be longer than that; rebase] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-26iwlegacy: ensure loop counter addr does not wrap and cause an infinite loopColin Ian King1-1/+1
The loop counter addr is a u16 where as the upper limit of the loop is an int. In the unlikely event that the il->cfg->eeprom_size is greater than 64K then we end up with an infinite loop since addr will wrap around an never reach upper loop limit. Fix this by making addr an int. Addresses-Coverity: ("Infinite loop") Fixes: be663ab67077 ("iwlwifi: split the drivers for agn and legacy devices 3945/4965") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: btcoex: fix spelling mistake "initilized" -> "initialized"Colin Ian King3-3/+3
There is a spelling mistake in one of the fields in the btc_coexist struct, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8723ae: remove unused variablesYueHaibing1-112/+0
drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:16:18: warning: ofdmswing_table defined but not used [-Wunused-const-variable=] drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:56:17: warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=] drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:92:17: warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=] These variable is never used, so remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8192ee: remove unused variablesYueHaibing1-118/+0
drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:15:18: warning: ofdmswing_table defined but not used [-Wunused-const-variable=] drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:61:17: warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=] drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:97:17: warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=] These variable is never used, so remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8821ae: remove unused variablesYueHaibing1-118/+0
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:142:17: warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=] drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:178:17: warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=] drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:96:18: warning: ofdmswing_table defined but not used [-Wunused-const-variable=] These variable is never used, so remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8188ee: remove redundant assignment to variable condColin Ian King1-1/+1
Variable cond is being assigned with a value that is never read, it is assigned a new value later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26qtnfmac: add support for TWT responder and spatial reuseMikhail Karpenko2-1/+53
Add support for 11ax features: TWT responder and spatial reuse. Add separate structure for spatial reuse parameters and pass this structure to firmware along with other parameters in start_ap command. Pass TWT responder value to firmware. Bump qlink protocol version. Signed-off-by: Mikhail Karpenko <mkarpenko@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26qtnfmac: add support for STA HE ratesSergey Matyukevich2-0/+3
Add HE rates into STA info. Report HE Rx/Tx MCS if STA supports them. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26qtnfmac: control qtnfmac wireless interfaces bridgingSergey Matyukevich1-18/+34
Bridging qtnfmac interfaces is possible only if the following two conditions are fulfilled: - firmware advertises proper support with QLINK_HW_CAPAB_HW_BRIDGE - kernel is built with CONFIG_NET_SWITCHDEV support Otherwise adding qtnfmac wireless interfaces into the same bridge should not be allowed since packets flooded by kernel may break internal forwarding rules between interfaces. This patch disables adding qtnfmac wireless interfaces into the same bridge if no support is provided either by card or by kernel. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26qtnfmac: add module param to configure DFS offloadSergey Matyukevich6-7/+22
Firmware may support DFS offload. However the final decision on whether to use it or not should be up to the user. So even if firmware supports DFS offload, it should be enabled only if user explicitly requests it. For this purpose introduce kernel param dfs_offload which is disabled by default. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26qtnfmac: cleanup slave_radar access functionSergey Matyukevich3-7/+7
Currently this parameter is global, it is not specific to mac. So this function does not need any input parameters. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26brcmfmac: Remove always false 'idx < 0' statementyuehaibing1-1/+1
idx is declared as u32, it will never less than 0. Signed-off-by: yuehaibing <yuehaibing@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtw88: use shorter delay time to poll PS stateYan-Hsuan Chuang1-2/+2
When TX packet arrives, driver should leave deep PS state to make sure the DMA is working. After requested to leave deep PS state, driver needs to poll the PS state to check if the mode has been changed successfully. The driver used to check the state of the hardware every 20 msecs, which means upon the first failure of state check, the CPU is delayed 20 msecs for next check. This is harmful for some time-sensitive applications such as media players. So, use shorter delay time each check from 20 msecs to 100 usecs. The state should be changed in several tries. But we still need to reserve ~15 msecs in total in case of the state just took too long to be changed successfully. If the states of driver and the hardware is not synchronized, the power state could be locked forever, which mean we could never enter/leave the PS state. Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Reviewed-by: Chris Chiu <chiu@endlessm.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtw88: fix potential NULL skb access in TX ISRYan-Hsuan Chuang1-0/+5
Sometimes the TX queue may be empty and we could possible dequeue a NULL pointer, crash the kernel. If the skb is NULL then there is nothing to do, just leave the ISR. And the TX queue should not be empty here, so print an error to see if there is anything wrong for DMA ring. Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26brcmfmac: add initial support for monitor modeRafał Miłecki6-13/+174
Report monitor interface availability using cfg80211 and support it in the add_virtual_intf() and del_virtual_intf() callbacks. This new feature is conditional and depends on firmware flagging monitor packets. Receiving monitor frames is already handled by the brcmf_netif_mon_rx(). Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26brcmfmac: simplify building interface combinationsRafał Miłecki1-29/+14
Move similar/duplicated code out of combination specific code blocks. This simplifies code a bit and allows adding more combinations later. A list of combinations remains unchanged. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26brcmfmac: sdio: Fix OOB interrupt initialization on brcm43362Jean-Philippe Brucker1-6/+6
Commit 262f2b53f679 ("brcmfmac: call brcmf_attach() just before calling brcmf_bus_started()") changed the initialization order of the brcmfmac SDIO driver. Unfortunately since brcmf_sdiod_intr_register() is now called before the sdiodev->bus_if initialization, it reads the wrong chip ID and fails to initialize the GPIO on brcm43362. Thus the chip cannot send interrupts and fails to probe: [ 12.517023] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout [ 12.531214] ieee80211 phy0: brcmf_bus_started: failed: -110 [ 12.536976] ieee80211 phy0: brcmf_attach: dongle is not responding: err=-110 [ 12.566467] brcmfmac: brcmf_sdio_firmware_callback: brcmf_attach failed Initialize the bus interface earlier to ensure that brcmf_sdiod_intr_register() properly sets up the OOB interrupt. BugLink: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908438 Fixes: 262f2b53f679 ("brcmfmac: call brcmf_attach() just before calling brcmf_bus_started()") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26brcmfmac: use true,false for bool variablezhengbin1-1/+1
Fixes coccicheck warning: drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c:911:2-24: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26cw1200: use true,false for bool variablezhengbin1-1/+1
Fixes coccicheck warning: drivers/net/wireless/st/cw1200/txrx.c:718:6-16: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtw88: use true,false for bool variablezhengbin1-1/+1
Fixes coccicheck warning: drivers/net/wireless/realtek/rtw88/phy.c:1437:1-24: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8821ae: Make functions static & rm sw.hAmadeusz Sławiński2-16/+3
Some of functions which were exposed in sw.h, are only used in sw.c, so just make them static. This makes sw.h unnecessary, so remove it. Signed-off-by: Amadeusz Sławiński <amade@asmblr.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8723be: Make functions static & rm sw.hAmadeusz Sławiński2-17/+3
Some of functions which were exposed in sw.h, are only used in sw.c, so just make them static. This makes sw.h unnecessary, so remove it. Signed-off-by: Amadeusz Sławiński <amade@asmblr.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2020-01-26rtlwifi: rtl8723ae: Make functions static & rm sw.hAmadeusz Sławiński2-17/+3
Some of functions which were exposed in sw.h, are only used in sw.c, so just make them static. This makes sw.h unnecessary, so remove it. Signed-off-by: Amadeusz Sławiński <amade@asmblr.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>