summaryrefslogtreecommitdiffstats
path: root/drivers/net/geneve.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin2024-09-031-1/+1
| | | | | | | | | | NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* geneve: Fix incorrect inner network header offset when innerprotoinherit is setGal Pressman2024-06-101-4/+6
| | | | | | | | | | | | | | | | | When innerprotoinherit is set, the tunneled packets do not have an inner Ethernet header. Change 'maclen' to not always assume the header length is ETH_HLEN, as there might not be a MAC header. This resolves issues with drivers (e.g. mlx5, in mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset to be correct, and use it for TX offloads. Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet2024-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-04-111-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset") b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") 194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 958f56e48385 ("net/mlx5e: Un-expose functions in en.h") 49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * geneve: fix header validation in geneve[6]_xmit_skbEric Dumazet2024-04-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot is able to trigger an uninit-value in geneve_xmit() [1] Problem : While most ip tunnel helpers (like ip_tunnel_get_dsfield()) uses skb_protocol(skb, true), pskb_inet_may_pull() is only using skb->protocol. If anything else than ETH_P_IPV6 or ETH_P_IP is found in skb->protocol, pskb_inet_may_pull() does nothing at all. If a vlan tag was provided by the caller (af_packet in the syzbot case), the network header might not point to the correct location, and skb linear part could be smaller than expected. Add skb_vlan_inet_prepare() to perform a complete mac validation. Use this in geneve for the moment, I suspect we need to adopt this more broadly. v4 - Jakub reported v3 broke l2_tos_ttl_inherit.sh selftest - Only call __vlan_get_protocol() for vlan types. Link: https://lore.kernel.org/netdev/20240404100035.3270a7d5@kernel.org/ v2,v3 - Addressed Sabrina comments on v1 and v2 Link: https://lore.kernel.org/netdev/Zg1l9L2BNoZWZDZG@hog/ [1] BUG: KMSAN: uninit-value in geneve_xmit_skb drivers/net/geneve.c:910 [inline] BUG: KMSAN: uninit-value in geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030 geneve_xmit_skb drivers/net/geneve.c:910 [inline] geneve_xmit+0x302d/0x5420 drivers/net/geneve.c:1030 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547 __dev_queue_xmit+0x348d/0x52c0 net/core/dev.c:4335 dev_queue_xmit include/linux/netdevice.h:3091 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8bb0/0x9ef0 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795 packet_alloc_skb net/packet/af_packet.c:2930 [inline] packet_snd net/packet/af_packet.c:3024 [inline] packet_sendmsg+0x722d/0x9ef0 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2199 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 0 PID: 5033 Comm: syz-executor346 Not tainted 6.9.0-rc1-syzkaller-00005-g928a87efa423 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Fixes: d13f048dd40e ("net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb") Reported-by: syzbot+9ee20ec1de7b3168db09@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/000000000000d19c3a06152f9ee4@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin2024-04-011-17/+27
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108) [*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-03-071-2/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * geneve: make sure to pull inner header in geneve_rx()Eric Dumazet2024-03-041-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot triggered a bug in geneve_rx() [1] Issue is similar to the one I fixed in commit 8d975c15c0cd ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()") We have to save skb->network_header in a temporary variable in order to be able to recompute the network_header pointer after a pskb_inet_may_pull() call. pskb_inet_may_pull() makes sure the needed headers are in skb->head. [1] BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] BUG: KMSAN: uninit-value in geneve_rx drivers/net/geneve.c:279 [inline] BUG: KMSAN: uninit-value in geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391 IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] geneve_rx drivers/net/geneve.c:279 [inline] geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391 udp_queue_rcv_one_skb+0x1d39/0x1f20 net/ipv4/udp.c:2108 udp_queue_rcv_skb+0x6ae/0x6e0 net/ipv4/udp.c:2186 udp_unicast_rcv_skb+0x184/0x4b0 net/ipv4/udp.c:2346 __udp4_lib_rcv+0x1c6b/0x3010 net/ipv4/udp.c:2422 udp_rcv+0x7d/0xa0 net/ipv4/udp.c:2604 ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5534 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648 process_backlog+0x480/0x8b0 net/core/dev.c:5976 __napi_poll+0xe3/0x980 net/core/dev.c:6576 napi_poll net/core/dev.c:6645 [inline] net_rx_action+0x8b8/0x1870 net/core/dev.c:6778 __do_softirq+0x1b7/0x7c5 kernel/softirq.c:553 do_softirq+0x9a/0xf0 kernel/softirq.c:454 __local_bh_enable_ip+0x9b/0xa0 kernel/softirq.c:381 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline] __dev_queue_xmit+0x2768/0x51c0 net/core/dev.c:4378 dev_queue_xmit include/linux/netdevice.h:3171 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3081 [inline] packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x735/0xa10 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: slab_post_alloc_hook mm/slub.c:3819 [inline] slab_alloc_node mm/slub.c:3860 [inline] kmem_cache_alloc_node+0x5cb/0xbc0 mm/slub.c:3903 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560 __alloc_skb+0x352/0x790 net/core/skbuff.c:651 alloc_skb include/linux/skbuff.h:1296 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6394 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2783 packet_alloc_skb net/packet/af_packet.c:2930 [inline] packet_snd net/packet/af_packet.c:3024 [inline] packet_sendmsg+0x70c2/0x9f10 net/packet/af_packet.c:3113 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x735/0xa10 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Reported-and-tested-by: syzbot+6a1423ff3f97159aae64@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: geneve: Remove generic .ndo_get_stats64Breno Leitao2024-03-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240305172911.502058-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: geneve: Leverage core stats allocatorBreno Leitao2024-03-071-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the geneve driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240305172911.502058-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: gro: rename skb_gro_header_hard()Eric Dumazet2024-03-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_gro_header_hard() is renamed to skb_gro_may_pull() to match the convention used by common helpers like pskb_may_pull(). This means the condition is inverted: if (skb_gro_header_hard(skb, hlen)) slow_path(); becomes: if (!skb_gro_may_pull(skb, hlen)) slow_path(); Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | net: geneve: constify the struct device_type usageRicardo B. Marliere2024-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the geneve_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add netdev_lockdep_set_classes() to virtual driversEric Dumazet2024-02-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on a syzbot report, it appears many virtual drivers do not yet use netdev_lockdep_set_classes(), triggerring lockdep false positives. WARNING: possible recursive locking detected 6.8.0-rc4-next-20240212-syzkaller #0 Not tainted syz-executor.0/19016 is trying to acquire lock: ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 but task is already holding lock: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(_xmit_ETHER#2); lock(_xmit_ETHER#2); *** DEADLOCK *** May be due to missing lock nesting notation 9 locks held by syz-executor.0/19016: #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x82c/0x1040 net/core/rtnetlink.c:6603 #1: ffffc90000a08c00 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x600 kernel/time/timer.c:1697 #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 stack backtrace: CPU: 1 PID: 19016 Comm: syz-executor.0 Not tainted 6.8.0-rc4-next-20240212-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_deadlock kernel/locking/lockdep.c:3062 [inline] validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __netif_tx_lock include/linux/netdevice.h:4452 [inline] sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 iptunnel_xmit+0x540/0x9b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x20ee/0x2960 net/ipv4/ip_tunnel.c:831 erspan_xmit+0x9de/0x1460 net/ipv4/ip_gre.c:720 __netdev_start_xmit include/linux/netdevice.h:4989 [inline] netdev_start_xmit include/linux/netdevice.h:5003 [inline] xmit_one net/core/dev.c:3555 [inline] dev_hard_start_xmit+0x242/0x770 net/core/dev.c:3571 sch_direct_xmit+0x2b6/0x5f0 net/sched/sch_generic.c:342 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 igmpv3_send_cr net/ipv4/igmp.c:723 [inline] igmp_ifc_timer_expire+0xb71/0xd90 net/ipv4/igmp.c:813 call_timer_fn+0x17e/0x600 kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1751 [inline] __run_timers+0x621/0x830 kernel/time/timer.c:2038 run_timer_softirq+0x67/0xf0 kernel/time/timer.c:2051 __do_softirq+0x2bc/0x943 kernel/softirq.c:554 invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1076 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1076 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0010:resched_offsets_ok kernel/sched/core.c:10127 [inline] RIP: 0010:__might_resched+0x16f/0x780 kernel/sched/core.c:10142 Code: 00 4c 89 e8 48 c1 e8 03 48 ba 00 00 00 00 00 fc ff df 48 89 44 24 38 0f b6 04 10 84 c0 0f 85 87 04 00 00 41 8b 45 00 c1 e0 08 <01> d8 44 39 e0 0f 85 d6 00 00 00 44 89 64 24 1c 48 8d bc 24 a0 00 RSP: 0018:ffffc9000ee069e0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8880296a9e00 RDX: dffffc0000000000 RSI: ffff8880296a9e00 RDI: ffffffff8bfe8fa0 RBP: ffffc9000ee06b00 R08: ffffffff82326877 R09: 1ffff11002b5ad1b R10: dffffc0000000000 R11: ffffed1002b5ad1c R12: 0000000000000000 R13: ffff8880296aa23c R14: 000000000000062a R15: 1ffff92001dc0d44 down_write+0x19/0x50 kernel/locking/rwsem.c:1578 kernfs_activate fs/kernfs/dir.c:1403 [inline] kernfs_add_one+0x4af/0x8b0 fs/kernfs/dir.c:819 __kernfs_create_file+0x22e/0x2e0 fs/kernfs/file.c:1056 sysfs_add_file_mode_ns+0x24a/0x310 fs/sysfs/file.c:307 create_files fs/sysfs/group.c:64 [inline] internal_create_group+0x4f4/0xf20 fs/sysfs/group.c:152 internal_create_groups fs/sysfs/group.c:192 [inline] sysfs_create_groups+0x56/0x120 fs/sysfs/group.c:218 create_dir lib/kobject.c:78 [inline] kobject_add_internal+0x472/0x8d0 lib/kobject.c:240 kobject_add_varg lib/kobject.c:374 [inline] kobject_init_and_add+0x124/0x190 lib/kobject.c:457 netdev_queue_add_kobject net/core/net-sysfs.c:1706 [inline] netdev_queue_update_kobjects+0x1f3/0x480 net/core/net-sysfs.c:1758 register_queue_kobjects net/core/net-sysfs.c:1819 [inline] netdev_register_kobject+0x265/0x310 net/core/net-sysfs.c:2059 register_netdevice+0x1191/0x19c0 net/core/dev.c:10298 bond_newlink+0x3b/0x90 drivers/net/bonding/bond_netlink.c:576 rtnl_newlink_create net/core/rtnetlink.c:3506 [inline] __rtnl_newlink net/core/rtnetlink.c:3726 [inline] rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3739 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6606 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3c/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 __sys_sendto+0x3a4/0x4f0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2199 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc3fa87fa9c Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240212140700.2795436-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | geneve: use exit_batch_rtnl() methodEric Dumazet2024-02-081-13/+10
|/ | | | | | | | | | | | | | | | | | | | exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair, and one unregister_netdevice_many() call. Note: it should be possible to remove the synchronize_net() call from geneve_sock_release() in a future patch. v4: move WARN_ON_ONCE(!list_empty(&gn->sock_list)) into geneve_exit_net(), after devices have been unregistered. (Antoine Tenart feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20240206144313.2050392-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* geneve: use DEV_STATS_INC()Eric Dumazet2024-01-061-12/+12
| | | | | | | | | | | geneve updates dev->stats fields locklessly. Adopt DEV_STATS_INC() to avoid races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240104163633.2070538-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* geneve: use generic function for tunnel IPv6 route lookupBeniamino Galvani2023-10-231-65/+31
| | | | | | | | | | | | | | The route lookup can be done now via generic function udp_tunnel6_dst_lookup() to replace the custom implementation in geneve_get_v6_dst(). This is similar to what already done for IPv4 in commit daa2ba7ed1d1 ("geneve: use generic function for tunnel IPv4 route lookup"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: use generic function for tunnel IPv4 route lookupBeniamino Galvani2023-10-161-66/+32
| | | | | | | | | | The route lookup can be done now via generic function udp_tunnel_dst_lookup() to replace the custom implementation in geneve_get_v4_rt(). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: add dsfield helper functionBeniamino Galvani2023-10-161-11/+18
| | | | | | | | | | Add a helper function to compute the tos/dsfield. In this way, we can factor out some duplicate code. Also, the helper will be called from more places in the next commit. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: geneve: accept every ethertypeJosef Miegl2023-03-211-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Geneve encapsulation, as defined in RFC 8926, has a Protocol Type field, which states the Ethertype of the payload appearing after the Geneve header. Commit 435fe1c0c1f7 ("net: geneve: support IPv4/IPv6 as inner protocol") introduced a new IFLA_GENEVE_INNER_PROTO_INHERIT flag that allowed the use of other Ethertypes than Ethernet. However, it did not get rid of a restriction that prohibits receiving payloads other than Ethernet, instead the commit white-listed additional Ethertypes, IPv4 and IPv6. This patch removes this restriction, making it possible to receive any Ethertype as a payload, if the IFLA_GENEVE_INNER_PROTO_INHERIT flag is set. The restriction was set in place back in commit 0b5e8b8eeae4 ("net: Add Geneve tunneling protocol driver"), which implemented a protocol layer driver for Geneve to be used with Open vSwitch. The relevant discussion about introducing the Ethertype white-list can be found here: https://lore.kernel.org/netdev/CAEP_g=_1q3ACX5NTHxLDnysL+dTMUVzdLpgw1apLKEdDSWPztw@mail.gmail.com/ <quote> >> + if (unlikely(geneveh->proto_type != htons(ETH_P_TEB))) > > Why? I thought the point of geneve carrying protocol field was to > allow protocols other than Ethernet... is this temporary maybe? Yes, it is temporary. Currently OVS only handles Ethernet packets but this restriction can be lifted once we have a consumer that is capable of handling other protocols. </quote> This white-list was then ported to a generic Geneve netdevice in commit 371bd1061d29 ("geneve: Consolidate Geneve functionality in single module."). Preserving the Ethertype white-list at this point made sense, as the Geneve device could send out only Ethernet payloads anyways. However, now that the Geneve netdevice supports encapsulating other payloads with IFLA_GENEVE_INNER_PROTO_INHERIT and we have a consumer capable of other protocols, it seems appropriate to lift the restriction and allow any Geneve payload to be received. Signed-off-by: Josef Miegl <josef@miegl.cz> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20230319220954.21834-1-josef@miegl.cz Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: geneve: set IFF_POINTOPOINT with IFLA_GENEVE_INNER_PROTO_INHERITJosef Miegl2023-03-151-1/+1
| | | | | | | | | The GENEVE tunnel used with IFLA_GENEVE_INNER_PROTO_INHERIT is point-to-point, so set IFF_POINTOPOINT to reflect that. Signed-off-by: Josef Miegl <josef@miegl.cz> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rtnetlink: pass netlink message header and portid to rtnl_configure_link()Hangbin Liu2022-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add the parameters so we can use them in the last call rtnl_notify(), and notify the userspace about the new link info if NLM_F_ECHO flag is set. - rtnl_configure_link() - __dev_notify_flags() - rtmsg_ifinfo() - rtmsg_ifinfo_event() - rtmsg_ifinfo_build_skb() - rtmsg_ifinfo_send() - rtnl_notify() Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: move from strlcpy with unused retval to strscpyWolfram Sang2022-08-311-2/+2
| | | | | | | | | | | | Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: gro: skb_gro_header helper functionRichard Gobert2022-08-251-6/+3
| | | | | | | | | | | | | | | | | | Introduce a simple helper function to replace a common pattern. When accessing the GRO header, we fetch the pointer from frag0, then test its validity and fetch it from the skb when necessary. This leads to the pattern skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow recurring many times throughout GRO code. This patch replaces these patterns with a single inlined function call, improving code readability. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220823071034.GA56142@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* geneve: do not use RT_TOS for IPv6 flowlabelMatthias May2022-08-101-2/+1
| | | | | | | | | | | | | | | | | | | According to Guillaume Nault RT_TOS should never be used for IPv6. Quote: RT_TOS() is an old macro used to interprete IPv4 TOS as described in the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4 code, although, given the current state of the code, most of the existing calls have no consequence. But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS" field to be interpreted the RFC 1349 way. There's no historical compatibility to worry about. Fixes: 3a56f86f1be6 ("geneve: handle ipv6 priority like ipv4 tos") Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* geneve: fix TOS inheriting for ipv4Matthias May2022-08-101-4/+8
| | | | | | | | | | | | | | | | | | The current code retrieves the TOS field after the lookup on the ipv4 routing table. The routing process currently only allows routing based on the original 3 TOS bits, and not on the full 6 DSCP bits. As a result the retrieved TOS is cut to the 3 bits. However for inheriting purposes the full 6 bits should be used. Extract the full 6 bits before the route lookup and use that instead of the cut off 3 TOS bits. Fixes: e305ac6cf5a1 ("geneve: Add support to collect tunnel metadata.") Signed-off-by: Matthias May <matthias.may@westermo.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20220805190006.8078-1-matthias.may@westermo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* geneve: Use ip_tunnel_key flow flags in route lookupsPaul Chaignon2022-07-261-0/+1
| | | | | | | | | | | | Use the new ip_tunnel_key field with the flow flags in the IPv4 route lookups for the encapsulated packet. This will be used by the bpf_skb_set_tunnel_key helper in the subsequent commit. Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/fcc2e0eea01e8ea465a180126366ec20596ba530.1658759380.git.paul@isovalent.com
* geneve: avoid indirect calls in GRO path, when possiblePaolo Abeni2022-04-151-2/+8
| | | | | | | | | | In the most common setups, the geneve tunnels use an inner ethernet encapsulation. In the GRO path, when such condition is true, we can call directly the relevant GRO helper and avoid a few indirect calls. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: geneve: add missing netlink policy and size for ↵Eyal Birger2022-03-231-0/+3
| | | | | | | | | | | | IFLA_GENEVE_INNER_PROTO_INHERIT Add missing netlink attribute policy and size calculation. Also enable strict validation from this new attribute onwards. Fixes: 435fe1c0c1f7 ("net: geneve: support IPv4/IPv6 as inner protocol") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20220322043954.3042468-1-eyal.birger@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: geneve: support IPv4/IPv6 as inner protocolEyal Birger2022-03-171-19/+63
| | | | | | | | | | | | | | | This patch adds support for encapsulating IPv4/IPv6 within GENEVE. In order to use this, a new IFLA_GENEVE_INNER_PROTO_INHERIT flag needs to be provided at device creation. This property cannot be changed for the time being. In case IP traffic is received on a non-tun device the drop count is increased. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20220316061557.431872-1-eyal.birger@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: dev: Makes sure netif_rx() can be invoked in any context.Sebastian Andrzej Siewior2022-02-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave suggested a while ago (eleven years by now) "Let's make netif_rx() work in all contexts and get rid of netif_rx_ni()". Eric agreed and pointed out that modern devices should use netif_receive_skb() to avoid the overhead. In the meantime someone added another variant, netif_rx_any_context(), which behaves as suggested. netif_rx() must be invoked with disabled bottom halves to ensure that pending softirqs, which were raised within the function, are handled. netif_rx_ni() can be invoked only from process context (bottom halves must be enabled) because the function handles pending softirqs without checking if bottom halves were disabled or not. netif_rx_any_context() invokes on the former functions by checking in_interrupts(). netif_rx() could be taught to handle both cases (disabled and enabled bottom halves) by simply disabling bottom halves while invoking netif_rx_internal(). The local_bh_enable() invocation will then invoke pending softirqs only if the BH-disable counter drops to zero. Eric is concerned about the overhead of BH-disable+enable especially in regard to the loopback driver. As critical as this driver is, it will receive a shortcut to avoid the additional overhead which is not needed. Add a local_bh_disable() section in netif_rx() to ensure softirqs are handled if needed. Provide __netif_rx() which does not disable BH and has a lockdep assert to ensure that interrupts are disabled. Use this shortcut in the loopback driver and in drivers/net/*.c. Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they can be removed once they are no more users left. Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* gro: remove rcu_read_lock/rcu_read_unlock from gro_complete handlersEric Dumazet2021-11-251-3/+0
| | | | | | | | | | All gro_complete() handlers are called from napi_gro_complete() while rcu_read_lock() has been called. There is no point stacking more rcu_read_lock() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* gro: remove rcu_read_lock/rcu_read_unlock from gro_receive handlersEric Dumazet2021-11-251-4/+1
| | | | | | | | | | All gro_receive() handlers are called from dev_gro_receive() while rcu_read_lock() has been called. There is no point stacking more rcu_read_lock() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: move gro definitions to include/net/gro.hEric Dumazet2021-11-161-0/+1
| | | | | | | | include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2021-04-261-2/+2
|\
| * net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skbPhillip Potter2021-04-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the header size check in geneve6_xmit_skb and geneve_xmit_skb to use pskb_inet_may_pull rather than pskb_network_may_pull. This fixes two kernel selftest failures introduced by the commit introducing the checks: IPv4 over geneve6: PMTU exceptions IPv4 over geneve6: PMTU exceptions - nexthop objects It does this by correctly accounting for the fact that IPv4 packets may transit over geneve IPv6 tunnels (and vice versa), and still fixes the uninit-value bug fixed by the original commit. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 6628ddfec758 ("net: geneve: check skb is large enough for IPv4/IPv6 header") Suggested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-04-171-0/+6
|\| | | | | | | | | | | | | | | | | drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: geneve: check skb is large enough for IPv4/IPv6 headerPhillip Potter2021-04-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Check within geneve_xmit_skb/geneve6_xmit_skb that sk_buff structure is large enough to include IPv4 or IPv6 header, and reject if not. The geneve_xmit_skb portion and overall idea was contributed by Eric Dumazet. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=abe95dc3e3e9667fc23b8d81f29ecad95c6f106f Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+2e406a9ac75bb71d4b7a@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-04-101-4/+20
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS - keep Chandrasekar drivers/net/ethernet/mellanox/mlx5/core/en_main.c - simple fix + trust the code re-added to param.c in -next is fine include/linux/bpf.h - trivial include/linux/ethtool.h - trivial, fix kdoc while at it include/linux/skmsg.h - move to relevant place in tcp.c, comment re-wrapped net/core/skmsg.c - add the sk = sk // sk = NULL around calls net/tipc/crypto.c - trivial Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * geneve: do not modify the shared tunnel info when PMTU triggers an ICMP replyAntoine Tenart2021-03-261-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the interface is part of a bridge or an Open vSwitch port and a packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When using the external mode (collect metadata) the source and destination addresses are reversed, so that Open vSwitch can match the packet against an existing (reverse) flow. But inverting the source and destination addresses in the shared ip_tunnel_info will make following packets of the flow to use a wrong destination address (packets will be tunnelled to itself), if the flow isn't updated. Which happens with Open vSwitch, until the flow times out. Fixes this by uncloning the skb's ip_tunnel_info before inverting its source and destination addresses, so that the modification will only be made for the PTMU packet, not the following ones. Fixes: c1a800e88dbf ("geneve: Support for PMTU discovery on directly bridged links") Tested-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | geneve: allow UDP L4 GRO passthrouPaolo Abeni2021-03-311-0/+1
|/ | | | | | | | Similar to the previous commit, let even geneve passthrou the L4 GRO packets Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* geneve: add NETIF_F_FRAGLIST flag for dev featuresXin Long2021-01-191-2/+3
| | | | | | | | | | | | | Some protocol HW GSO requires fraglist supported by the device, like SCTP. Without NETIF_F_FRAGLIST set in the dev features of geneve, it would have to do SW GSO before the packets enter the driver, even when the geneve dev and lower dev (like veth) both have the feature of NETIF_F_GSO_SCTP. So this patch is to add it for geneve. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* udp_tunnel: remove REGISTER/UNREGISTER handling from tunnel driversJakub Kicinski2021-01-071-10/+4
| | | | | | | | | | udp_tunnel_nic handles REGISTER and UNREGISTER event, now that all drivers use that infra we can drop the event handling in the tunnel drivers. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2020-12-121-16/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Revert "geneve: pull IP header before ECN decapsulation"Jakub Kicinski2020-12-091-16/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation"). Eric says: "network header should have been pulled already before hitting geneve_rx()". Let's revert the syzbot fix since it's causing more harm than good, and revisit. Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 4179b00c04d1 ("geneve: pull IP header before ECN decapsulation") Link: https://bugzilla.kernel.org/show_bug.cgi?id=210569 Link: https://lore.kernel.org/netdev/CANn89iJVWfb=2i7oU1=D55rOyQnBbbikf+Mc6XHMkY7YX-yGEw@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2020-12-041-4/+16
|\| | | | | | | | | | | | | Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * geneve: pull IP header before ECN decapsulationEric Dumazet2020-12-021-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume IP header is already pulled. geneve does not ensure this yet. Fixing this generically in IP_ECN_decapsulate() and IP6_ECN_decapsulate() is not possible, since callers pass a pointer that might be freed by pskb_may_pull() syzbot reported : BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline] BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260 CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197 __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline] INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260 geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306 geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377 udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093 udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167 udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline] __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394 udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564 ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204 ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:449 [inline] ip_rcv_finish net/ipv4/ip_input.c:428 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539 __netif_receive_skb_one_core net/core/dev.c:5315 [inline] __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429 process_backlog+0x523/0xc10 net/core/dev.c:6319 napi_poll+0x420/0x1010 net/core/dev.c:6763 net_rx_action+0x35c/0xd40 net/core/dev.c:6833 __do_softirq+0x1a9/0x6fa kernel/softirq.c:298 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77 do_softirq kernel/softirq.c:343 [inline] __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195 local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32 rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline] __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167 dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173 packet_snd net/packet/af_packet.c:2992 [inline] packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] __sys_sendto+0x9dc/0xc80 net/socket.c:1992 __do_sys_sendto net/socket.c:2004 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:2000 __x64_sys_sendto+0x6e/0x90 net/socket.c:2000 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: don't include ethtool.h from netdevice.hJakub Kicinski2020-11-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | linux/netdevice.h is included in very many places, touching any of its dependecies causes large incremental builds. Drop the linux/ethtool.h include, linux/netdevice.h just needs a forward declaration of struct ethtool_ops. Fix all the places which made use of this implicit include. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2020-11-201-2/+1
|\| | | | | | | Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * ip_tunnels: Set tunnel option flag when tunnel metadata is presentYi-Hung Wei2020-11-141-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we may set the tunnel option flag when the size of metadata is zero. For example, we set TUNNEL_GENEVE_OPT in the receive function no matter the geneve option is present or not. As this may result in issues on the tunnel flags consumers, this patch fixes the issue. Related discussion: * https://lore.kernel.org/netdev/1604448694-19351-1-git-send-email-yihung.wei@gmail.com/T/#u Fixes: 256c87c17c53 ("net: check tunnel option type in tunnel flags") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Link: https://lore.kernel.org/r/1605053800-74072-1-git-send-email-yihung.wei@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: switch to dev_get_tstats64Heiner Kallweit2020-11-101-1/+1
|/ | | | | | | | Replace ip_tunnel_get_stats64() with the new identical core function dev_get_tstats64(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>