summaryrefslogtreecommitdiffstats
path: root/net/ipv4 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* net: Make locking in sock_bindtoindex optionalFerenc Fejes2020-06-011-1/+1
| | | | | | | | | | | | | The sock_bindtoindex intended for kernel wide usage however it will lock the socket regardless of the context. This modification relax this behavior optionally: locking the socket will be optional by calling the sock_bindtoindex with lock_sk = true. The modification applied to all users of the sock_bindtoindex. Signed-off-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/bee6355da40d9e991b2f2d12b67d55ebb5f5b207.1590871065.git.fejes@inf.elte.hu
* ipv4: nexthop: Fix deadcode issue by performing a proper NULL checkPatrick Eigensatz2020-06-011-2/+2
| | | | | | | | | | | | | | After allocating the spare nexthop group it should be tested for kzalloc() returning NULL, instead the already used nexthop group (which cannot be NULL at this point) had been tested so far. Additionally, if kzalloc() fails, return ERR_PTR(-ENOMEM) instead of NULL. Coverity-id: 1463885 Reported-by: Coverity <scan-admin@coverity.com> Signed-off-by: Patrick Eigensatz <patrickeigensatz@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2020-06-017-81/+152
|\ | | | | | | | | | | | | | | | | | | | | | | | | xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
| * devinet: fix memleak in inetdev_init()Yang Yingliang2020-05-311-0/+1
| | | | | | | | | | | | | | | | | | | | When devinet_sysctl_register() failed, the memory allocated in neigh_parms_alloc() should be freed. Fixes: 20e61da7ffcf ("ipv4: fail early when creating netdev named all or default") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2020-05-292-13/+40
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-05-29 1) Several fixes for ESP gro/gso in transport and beet mode when IPv6 extension headers are present. From Xin Long. 2) Fix a wrong comment on XFRMA_OFFLOAD_DEV. From Antony Antony. 3) Fix sk_destruct callback handling on ESP in TCP encapsulation. From Sabrina Dubroca. 4) Fix a use after free in xfrm_output_gso when used with vxlan. From Xin Long. 5) Fix secpath handling of VTI when used wiuth IPCOMP. From Xin Long. 6) Fix an oops when deleting a x-netns xfrm interface. From Nicolas Dichtel. 7) Fix a possible warning on policy updates. We had a case where it was possible to add two policies with the same lookup keys. From Xin Long. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * esp4: improve xfrm4_beet_gso_segment() to be more readableXin Long2020-05-181-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | This patch is to improve the code to make xfrm4_beet_gso_segment() more readable, and keep consistent with xfrm6_beet_gso_segment(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * ip_vti: receive ipip packet by calling ip_tunnel_rcvXin Long2020-04-231-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In Commit dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel"), it tries to receive IPIP packets in vti by calling xfrm_input(). This case happens when a small packet or frag sent by peer is too small to get compressed. However, xfrm_input() will still get to the IPCOMP path where skb sec_path is set, but never dropped while it should have been done in vti_ipcomp4_protocol.cb_handler(vti_rcv_cb), as it's not an ipcomp4 packet. This will cause that the packet can never pass xfrm4_policy_check() in the upper protocol rcv functions. So this patch is to call ip_tunnel_rcv() to process IPIP packets instead. Fixes: dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * esp4: support ipv6 nexthdrs process for beet gso segmentXin Long2020-04-211-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For beet mode, when it's ipv6 inner address with nexthdrs set, the packet format might be: ---------------------------------------------------- | outer | | dest | | | ESP | ESP | | IP hdr | ESP | opts.| TCP | Data | Trailer | ICV | ---------------------------------------------------- Before doing gso segment in xfrm4_beet_gso_segment(), the same thing is needed as it does in xfrm6_beet_gso_segment() in last patch 'esp6: support ipv6 nexthdrs process for beet gso segment'. v1->v2: - remove skb_transport_offset(), as it will always return 0 in xfrm6_beet_gso_segment(), thank Sabrina's check. Fixes: 384a46ea7bdc ("esp4: add gso_segment for esp4 beet mode") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * xfrm: remove the xfrm_state_put call becofe going to out_resetXin Long2020-04-201-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This xfrm_state_put call in esp4/6_gro_receive() will cause double put for state, as in out_reset path secpath_reset() will put all states set in skb sec_path. So fix it by simply remove the xfrm_state_put call. Fixes: 6ed69184ed9c ("xfrm: Reset secpath in xfrm failure") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | ipv4: nexthop version of fib_info_nh_uses_devDavid Ahern2020-05-271-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the last path, need to fix fib_info_nh_uses_dev for external nexthops to avoid referencing multiple nh_grp structs. Move the device check in fib_info_nh_uses_dev to a helper and create a nexthop version that is called if the fib_info uses an external nexthop. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: Refactor nhc evaluation in fib_table_lookupDavid Ahern2020-05-271-15/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | FIB lookups can return an entry that references an external nexthop. While walking the nexthop struct we do not want to make multiple calls into the nexthop code which can result in 2 different structs getting accessed - one returning the number of paths the rest of the loop seeing a different nh_grp struct. If the nexthop group shrunk, the result is an attempt to access a fib_nh_common that does not exist for the new nh_grp struct but did for the old one. To fix that move the device evaluation code to a helper that can be used for inline fib_nh path as well as external nexthops. Update the existing check for fi->nh in fib_table_lookup to call a new helper, nexthop_get_nhc_lookup, which walks the external nexthop with a single rcu dereference. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthops: don't modify published nexthop groupsNikolay Aleksandrov2020-05-271-33/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must avoid modifying published nexthop groups while they might be in use, otherwise we might see NULL ptr dereferences. In order to do that we allocate 2 nexthoup group structures upon nexthop creation and swap between them when we have to delete an entry. The reason is that we can't fail nexthop group removal, so we can't handle allocation failure thus we move the extra allocation on creation where we can safely fail and return ENOMEM. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entryDavid Ahern2020-05-271-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move nh_grp dereference and check for removing nexthop group due to all members gone into remove_nh_grp_entry. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_conntrack_pptp: prevent buffer overflows in debug codePablo Neira Ayuso2020-05-251-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan Carpenter says: "Smatch complains that the value for "cmd" comes from the network and can't be trusted." Add pptp_msg_name() helper function that checks for the array boundary. Fixes: f09943fefe6b ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | tcp: tcp_init_buffer_space can be staticFlorian Westphal2020-05-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of commit 98fa6271cfcb ("tcp: refactor setting the initial congestion window") this is called only from tcp_input.c, so it can be static. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2020-05-295-91/+12
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-05-29 1) Add IPv6 encapsulation support for ESP over UDP and TCP. From Sabrina Dubroca. 2) Remove unneeded reference when initializing xfrm interfaces. From Nicolas Dichtel. 3) Remove some indirect calls from the state_afinfo. From Florian Westphal. Please note that this pull request has two merge conflicts between commit: 0c922a4850eb ("xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish") from Linus' tree and commit: 2ab6096db2f1 ("xfrm: remove output_finish indirection from xfrm_state_afinfo") from the ipsec-next tree. and between commit: 3986912f6a9a ("ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl") from the net-next tree and commit: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP") from the ipsec-next tree. Both conflicts can be resolved as done in linux-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | xfrm: fix unused variable warning if CONFIG_NETFILTER=nFlorian Westphal2020-05-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After recent change 'x' is only used when CONFIG_NETFILTER is set: net/ipv4/xfrm4_output.c: In function '__xfrm4_output': net/ipv4/xfrm4_output.c:19:21: warning: unused variable 'x' [-Wunused-variable] 19 | struct xfrm_state *x = skb_dst(skb)->xfrm; Expand the CONFIG_NETFILTER scope to avoid this. Fixes: 2ab6096db2f1 ("xfrm: remove output_finish indirection from xfrm_state_afinfo") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: remove output_finish indirection from xfrm_state_afinfoFlorian Westphal2020-05-062-23/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are only two implementaions, one for ipv4 and one for ipv6. Both are almost identical, they clear skb->cb[], set the TRANSFORMED flag in IP(6)CB and then call the common xfrm_output() function. By placing the IPCB handling into the common function, we avoid the need for the output_finish indirection as the output functions can simply use xfrm_output(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: move xfrm4_extract_header to common helperFlorian Westphal2020-05-061-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function only initializes the XFRM CB in the skb. After previous patch xfrm4_extract_header is only called from net/xfrm/xfrm_{input,output}.c. Because of IPV6=m linker errors the ipv6 equivalent (xfrm6_extract_header) was already placed in xfrm_inout.h because we can't call functions residing in a module from the core. So do the same for the ipv4 helper and place it next to the ipv6 one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: state: remove extract_input indirection from xfrm_state_afinfoFlorian Westphal2020-05-062-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to keep CONFIG_IPV6=m working, xfrm6_extract_header needs to be duplicated. It will be removed again in a followup change when the remaining caller is moved to net/xfrm as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: avoid extract_output indirection for ipv4Florian Westphal2020-05-062-41/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can use a direct call for ipv4, so move the needed functions to net/xfrm/xfrm_output.c and call them directly. For ipv6 the indirection can be avoided as well but it will need a bit more work -- to ease review it will be done in another patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: add IPv6 support for espintcpSabrina Dubroca2020-04-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extends espintcp to support IPv6, building on the existing code and the new UDPv6 encapsulation support. Most of the code is either reused directly (stream parser, ULP) or very similar to the IPv4 variant (net/ipv6/esp6.c changes). The separation of config options for IPv4 and IPv6 espintcp requires a bit of Kconfig gymnastics to enable the core code. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | xfrm: add support for UDPv6 encapsulation of ESPSabrina Dubroca2020-04-281-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for encapsulation of ESP over UDPv6. The code is very similar to the IPv4 encapsulation implementation, and allows to easily add espintcp on IPv6 as a follow-up. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | | ipv4: add ip_sock_set_pktinfoChristoph Hellwig2020-05-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the IP_PKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | ipv4: add ip_sock_set_mtu_discoverChristoph Hellwig2020-05-281-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the IP_MTU_DISCOVER sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> [rxrpc bits] Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | ipv4: add ip_sock_set_recverrChristoph Hellwig2020-05-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the IP_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | ipv4: add ip_sock_set_freebindChristoph Hellwig2020-05-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the IP_FREEBIND sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | ipv4: add ip_sock_set_tosChristoph Hellwig2020-05-281-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the IP_TOS sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_keepcntChristoph Hellwig2020-05-281-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_keepintvlChristoph Hellwig2020-05-281-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_keepidleChristoph Hellwig2020-05-281-15/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_KEEP_IDLE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_user_timeoutChristoph Hellwig2020-05-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_syncntChristoph Hellwig2020-05-281-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_SYNCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_quickackChristoph Hellwig2020-05-281-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_QUICKACK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_nodelayChristoph Hellwig2020-05-281-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_sock_set_corkChristoph Hellwig2020-05-281-19/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the TCP_CORK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: add sock_bindtoindexChristoph Hellwig2020-05-281-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: ipv6: support RFC 6069 (TCP-LD)Eric Dumazet2020-05-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: rename tcp_v4_err() skb parameterEric Dumazet2020-05-271-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This essentially reverts 4d1a2d9ec1c1 ("Revert Backoff [v3]: Rename skb to icmp_skb in tcp_v4_err()") Now we have tcp_ld_RTO_revert() helper, we can use the usual name for sk_buff parameter, so that tcp_v4_err() and tcp_v6_err() use similar names. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcp_ld_RTO_revert() helperEric Dumazet2020-05-271-40/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 6069 logic has been implemented for IPv4 only so far, right in the middle of tcp_v4_err() and was error prone. Move this code to one helper, to make tcp_v4_err() more readable and to eventually expand RFC 6069 to IPv6 in the future. Also perform sock_owned_by_user() check a bit sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | nexthop: Fix type of event_type in call_nexthop_notifiersNathan Chancellor2020-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang warns: net/ipv4/nexthop.c:841:30: warning: implicit conversion from enumeration type 'enum nexthop_event_type' to different enumeration type 'enum fib_event_type' [-Wenum-conversion] call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh); ~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~ 1 warning generated. Use the right type for event_type so that clang does not warn. Fixes: 8590ceedb701 ("nexthop: add support for notifiers") Link: https://github.com/ClangBuiltLinux/linux/issues/1038 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: tcp_v4_err() icmp skb is named icmp_skbEric Dumazet2020-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I missed the fact that tcp_v4_err() differs from tcp_v6_err(). After commit 4d1a2d9ec1c1 ("Rename skb to icmp_skb in tcp_v4_err()") the skb argument has been renamed to icmp_skb only in one function. I will in a future patch reconciliate these functions to avoid this kind of confusion. Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: allow traceroute -Mtcp for unpriv usersEric Dumazet2020-05-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unpriv users can use traceroute over plain UDP sockets, but not TCP ones. $ traceroute -Mtcp 8.8.8.8 You do not have enough privileges to use this traceroute method. $ traceroute -n -Mudp 8.8.8.8 traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets 1 192.168.86.1 3.631 ms 3.512 ms 3.405 ms 2 10.1.10.1 4.183 ms 4.125 ms 4.072 ms 3 96.120.88.125 20.621 ms 19.462 ms 20.553 ms 4 96.110.177.65 24.271 ms 25.351 ms 25.250 ms 5 69.139.199.197 44.492 ms 43.075 ms 44.346 ms 6 68.86.143.93 27.969 ms 25.184 ms 25.092 ms 7 96.112.146.18 25.323 ms 96.112.146.22 25.583 ms 96.112.146.26 24.502 ms 8 72.14.239.204 24.405 ms 74.125.37.224 16.326 ms 17.194 ms 9 209.85.251.9 18.154 ms 209.85.247.55 14.449 ms 209.85.251.9 26.296 ms^C We can easily support traceroute over TCP, by queueing an error message into socket error queue. Note that applications need to set IP_RECVERR/IPV6_RECVERR option to enable this feature, and that the error message is only queued while in SYN_SNT state. socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0 setsockopt(3, SOL_IPV6, IPV6_UNICAST_HOPS, [5], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host) recvmsg(3, {msg_name={sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, msg_namelen=1024->28, msg_iov=[{iov_base="`\r\337\320\0004\6\1&\7\370\260\200\231\16\27\0\0\0\0\0\0\0\0 \2\n\5f\10\2\227"..., iov_len=1024}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1590340680, tv_usec=272424}}, {cmsg_len=60, cmsg_level=SOL_IPV6, cmsg_type=IPV6_RECVERR}], msg_controllen=96, msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 144 Suggested-by: Maciej Żenczykowski <maze@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | ipv4: potential underflow in compat_ip_setsockopt()Dan Carpenter2020-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value of "n" is capped at 0x1ffffff but it checked for negative values. I don't think this causes a problem but I'm not certain and it's harmless to prevent it. Fixes: 2e04172875c9 ("ipv4: do compat setsockopt for MCAST_MSFILTER directly") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2020-05-246-32/+35
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: don't return invalid table id error when we fall back to PF_UNSPECSabrina Dubroca2020-05-222-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case we can't find a ->dumpit callback for the requested (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're in the same situation as if userspace had requested a PF_UNSPEC dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all the registered RTM_GETROUTE handlers. The requested table id may or may not exist for all of those families. commit ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") fixed the problem when userspace explicitly requests a PF_UNSPEC dump, but missed the fallback case. For example, when we pass ipv6.disable=1 to a kernel with CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y, the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in rtnl_dump_all, and listing IPv6 routes will unexpectedly print: # ip -6 r Error: ipv4: MR table does not exist. Dump terminated commit ae677bbb4441 introduced the dump_all_families variable, which gets set when userspace requests a PF_UNSPEC dump. However, we can't simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the fallback case to get dump_all_families == true, because some messages types (for example RTM_GETRULE and RTM_GETNEIGH) only register the PF_UNSPEC handler and use the family to filter in the kernel what is dumped to userspace. We would then export more entries, that userspace would have to filter. iproute does that, but other programs may not. Instead, this patch removes dump_all_families and updates the RTM_GETROUTE handlers to check if the family that is being dumped is their own. When it's not, which covers both the intentional PF_UNSPEC dumps (as dump_all_families did) and the fallback case, ignore the missing table id error. Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: ipip: fix wrong address family in init error pathVadim Fedorenko2020-05-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error with MPLS support the code is misusing AF_INET instead of AF_MPLS. Fixes: 1b69e7e6c4da ("ipip: support MPLS over IPv4") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: nlmsg_cancel() if put fails for nhmsgStephen Worley2020-05-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes data remnant seen when we fail to reserve space for a nexthop group during a larger dump. If we fail the reservation, we goto nla_put_failure and cancel the message. Reproduce with the following iproute2 commands: ===================== ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add dummy3 type dummy ip link add dummy4 type dummy ip link add dummy5 type dummy ip link add dummy6 type dummy ip link add dummy7 type dummy ip link add dummy8 type dummy ip link add dummy9 type dummy ip link add dummy10 type dummy ip link add dummy11 type dummy ip link add dummy12 type dummy ip link add dummy13 type dummy ip link add dummy14 type dummy ip link add dummy15 type dummy ip link add dummy16 type dummy ip link add dummy17 type dummy ip link add dummy18 type dummy ip link add dummy19 type dummy ip link add dummy20 type dummy ip link add dummy21 type dummy ip link add dummy22 type dummy ip link add dummy23 type dummy ip link add dummy24 type dummy ip link add dummy25 type dummy ip link add dummy26 type dummy ip link add dummy27 type dummy ip link add dummy28 type dummy ip link add dummy29 type dummy ip link add dummy30 type dummy ip link add dummy31 type dummy ip link add dummy32 type dummy ip link set dummy1 up ip link set dummy2 up ip link set dummy3 up ip link set dummy4 up ip link set dummy5 up ip link set dummy6 up ip link set dummy7 up ip link set dummy8 up ip link set dummy9 up ip link set dummy10 up ip link set dummy11 up ip link set dummy12 up ip link set dummy13 up ip link set dummy14 up ip link set dummy15 up ip link set dummy16 up ip link set dummy17 up ip link set dummy18 up ip link set dummy19 up ip link set dummy20 up ip link set dummy21 up ip link set dummy22 up ip link set dummy23 up ip link set dummy24 up ip link set dummy25 up ip link set dummy26 up ip link set dummy27 up ip link set dummy28 up ip link set dummy29 up ip link set dummy30 up ip link set dummy31 up ip link set dummy32 up ip link set dummy33 up ip link set dummy34 up ip link set vrf-red up ip link set vrf-blue up ip link set dummyVRFred up ip link set dummyVRFblue up ip ro add 1.1.1.1/32 dev dummy1 ip ro add 1.1.1.2/32 dev dummy2 ip ro add 1.1.1.3/32 dev dummy3 ip ro add 1.1.1.4/32 dev dummy4 ip ro add 1.1.1.5/32 dev dummy5 ip ro add 1.1.1.6/32 dev dummy6 ip ro add 1.1.1.7/32 dev dummy7 ip ro add 1.1.1.8/32 dev dummy8 ip ro add 1.1.1.9/32 dev dummy9 ip ro add 1.1.1.10/32 dev dummy10 ip ro add 1.1.1.11/32 dev dummy11 ip ro add 1.1.1.12/32 dev dummy12 ip ro add 1.1.1.13/32 dev dummy13 ip ro add 1.1.1.14/32 dev dummy14 ip ro add 1.1.1.15/32 dev dummy15 ip ro add 1.1.1.16/32 dev dummy16 ip ro add 1.1.1.17/32 dev dummy17 ip ro add 1.1.1.18/32 dev dummy18 ip ro add 1.1.1.19/32 dev dummy19 ip ro add 1.1.1.20/32 dev dummy20 ip ro add 1.1.1.21/32 dev dummy21 ip ro add 1.1.1.22/32 dev dummy22 ip ro add 1.1.1.23/32 dev dummy23 ip ro add 1.1.1.24/32 dev dummy24 ip ro add 1.1.1.25/32 dev dummy25 ip ro add 1.1.1.26/32 dev dummy26 ip ro add 1.1.1.27/32 dev dummy27 ip ro add 1.1.1.28/32 dev dummy28 ip ro add 1.1.1.29/32 dev dummy29 ip ro add 1.1.1.30/32 dev dummy30 ip ro add 1.1.1.31/32 dev dummy31 ip ro add 1.1.1.32/32 dev dummy32 ip next add id 1 via 1.1.1.1 dev dummy1 ip next add id 2 via 1.1.1.2 dev dummy2 ip next add id 3 via 1.1.1.3 dev dummy3 ip next add id 4 via 1.1.1.4 dev dummy4 ip next add id 5 via 1.1.1.5 dev dummy5 ip next add id 6 via 1.1.1.6 dev dummy6 ip next add id 7 via 1.1.1.7 dev dummy7 ip next add id 8 via 1.1.1.8 dev dummy8 ip next add id 9 via 1.1.1.9 dev dummy9 ip next add id 10 via 1.1.1.10 dev dummy10 ip next add id 11 via 1.1.1.11 dev dummy11 ip next add id 12 via 1.1.1.12 dev dummy12 ip next add id 13 via 1.1.1.13 dev dummy13 ip next add id 14 via 1.1.1.14 dev dummy14 ip next add id 15 via 1.1.1.15 dev dummy15 ip next add id 16 via 1.1.1.16 dev dummy16 ip next add id 17 via 1.1.1.17 dev dummy17 ip next add id 18 via 1.1.1.18 dev dummy18 ip next add id 19 via 1.1.1.19 dev dummy19 ip next add id 20 via 1.1.1.20 dev dummy20 ip next add id 21 via 1.1.1.21 dev dummy21 ip next add id 22 via 1.1.1.22 dev dummy22 ip next add id 23 via 1.1.1.23 dev dummy23 ip next add id 24 via 1.1.1.24 dev dummy24 ip next add id 25 via 1.1.1.25 dev dummy25 ip next add id 26 via 1.1.1.26 dev dummy26 ip next add id 27 via 1.1.1.27 dev dummy27 ip next add id 28 via 1.1.1.28 dev dummy28 ip next add id 29 via 1.1.1.29 dev dummy29 ip next add id 30 via 1.1.1.30 dev dummy30 ip next add id 31 via 1.1.1.31 dev dummy31 ip next add id 32 via 1.1.1.32 dev dummy32 i=100 while [ $i -le 200 ] do ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19 echo $i ((i++)) done ip next add id 999 group 1/2/3/4/5/6 ip next ls ======================== Fixes: ab84be7e54fc ("net: Initial nexthop code") Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: inet_csk: Fix so_reuseport bind-address cache in tb->fast*Martin KaFai Lau2020-05-201-19/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") added a bind-address cache in tb->fast*. The tb->fast* caches the address of a sk which has successfully been binded with SO_REUSEPORT ON. The idea is to avoid the expensive conflict search in inet_csk_bind_conflict(). There is an issue with wildcard matching where sk_reuseport_match() should have returned false but it is currently returning true. It ends up hiding bind conflict. For example, bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */ bind("[::2]:443"); /* with SO_REUSEPORT. Succeed. */ bind("[::]:443"); /* with SO_REUSEPORT. Still Succeed where it shouldn't */ The last bind("[::]:443") with SO_REUSEPORT on should have failed because it should have a conflict with the very first bind("[::1]:443") which has SO_REUSEPORT off. However, the address "[::2]" is cached in tb->fast* in the second bind. In the last bind, the sk_reuseport_match() returns true because the binding sk's wildcard addr "[::]" matches with the "[::2]" cached in tb->fast*. The correct bind conflict is reported by removing the second bind such that tb->fast* cache is not involved and forces the bind("[::]:443") to go through the inet_csk_bind_conflict(): bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */ bind("[::]:443"); /* with SO_REUSEPORT. -EADDRINUSE */ The expected behavior for sk_reuseport_match() is, it should only allow the "cached" tb->fast* address to be used as a wildcard match but not the address of the binding sk. To do that, the current "bool match_wildcard" arg is split into "bool match_sk1_wildcard" and "bool match_sk2_wildcard". This change only affects the sk_reuseport_match() which is only used by inet_csk (e.g. TCP). The other use cases are calling inet_rcv_saddr_equal() and this patch makes it pass the same "match_wildcard" arg twice to the "ipv[46]_rcv_saddr_equal(..., match_wildcard, match_wildcard)". Cc: Josef Bacik <jbacik@fb.com> Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"Yuqi Jin2020-05-171-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit adb03115f459 ("net: get rid of an signed integer overflow in ip_idents_reserve()") used atomic_cmpxchg to replace "atomic_add_return" inside the function "ip_idents_reserve". The reason was to avoid UBSAN warning. However, this change has caused performance degrade and in GCC-8, fno-strict-overflow is now mapped to -fwrapv -fwrapv-pointer and signed integer overflow is now undefined by default at all optimization levels[1]. Moreover, it was a bug in UBSAN vs -fwrapv /-fno-strict-overflow, so Let's revert it safely. [1] https://gcc.gnu.org/gcc-8/changes.html Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiong Wang <jiongwang@huawei.com> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>