linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	bpf: Add redirect_peer helper	Daniel Borkmann	2020-10-11	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an efficient ingress to ingress netns switch that can be used out of tc BPF programs in order to redirect traffic from host ns ingress into a container veth device ingress without having to go via CPU backlog queue [0]. For local containers this can also be utilized and path via CPU backlog queue only needs to be taken once, not twice. On a high level this borrows from ipvlan which does similar switch in __netif_receive_skb_core() and then iterates via another_round. This helps to reduce latency for mentioned use cases. Pod to remote pod with redirect(), TCP_RR [1]: # percpu_netperf 10.217.1.33 RT_LATENCY: 122.450 (per CPU: 122.666 122.401 122.333 122.401 ) MEAN_LATENCY: 121.210 (per CPU: 121.100 121.260 121.320 121.160 ) STDDEV_LATENCY: 120.040 (per CPU: 119.420 119.910 125.460 115.370 ) MIN_LATENCY: 46.500 (per CPU: 47.000 47.000 47.000 45.000 ) P50_LATENCY: 118.500 (per CPU: 118.000 119.000 118.000 119.000 ) P90_LATENCY: 127.500 (per CPU: 127.000 128.000 127.000 128.000 ) P99_LATENCY: 130.750 (per CPU: 131.000 131.000 129.000 132.000 ) TRANSACTION_RATE: 32666.400 (per CPU: 8152.200 8169.842 8174.439 8169.897 ) Pod to remote pod with redirect_peer(), TCP_RR: # percpu_netperf 10.217.1.33 RT_LATENCY: 44.449 (per CPU: 43.767 43.127 45.279 45.622 ) MEAN_LATENCY: 45.065 (per CPU: 44.030 45.530 45.190 45.510 ) STDDEV_LATENCY: 84.823 (per CPU: 66.770 97.290 84.380 90.850 ) MIN_LATENCY: 33.500 (per CPU: 33.000 33.000 34.000 34.000 ) P50_LATENCY: 43.250 (per CPU: 43.000 43.000 43.000 44.000 ) P90_LATENCY: 46.750 (per CPU: 46.000 47.000 47.000 47.000 ) P99_LATENCY: 52.750 (per CPU: 51.000 54.000 53.000 53.000 ) TRANSACTION_RATE: 90039.500 (per CPU: 22848.186 23187.089 22085.077 21919.130 ) [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
*	net: remove napi_hash_del() from driver-facing API	Jakub Kicinski	2020-09-10	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table. Restructure the API and have drivers call a new function - __netif_napi_del() if they want to take care of RCU waits. Note that only core was checking the return status from napi_hash_del() so the new helper does not report if the NAPI was actually deleted. Some notes on driver oddness: - veth observed the grace period before calling netif_napi_del() but that should not matter - myri10ge observed normal RCU flavor - bnx2x and enic did not actually observe the grace period (unless they did so implicitly) - virtio_net and enic only unhashed Rx NAPIs The last two points seem to indicate that the calls to napi_hash_del() were a left over rather than an optimization. Regardless, it's easy enough to correct them. This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	2020-09-05	1	-4/+4
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We got slightly different patches removing a double word in a comment in net/ipv4/raw.c - picked the version from net. Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached values instead of VNIC login response buffer (following what commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login response buffer") did). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
\| *	treewide: Use fallthrough pseudo-keyword	Gustavo A. R. Silva	2020-08-24	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
* \|	net-veth: Add type safety to veth_xdp_to_ptr() and veth_ptr_to_xdp()	Maciej Żenczykowski	2020-08-19	1	-3/+3
\|/ \| \| \| \| \| \| \| \| \|	This reduces likelihood of incorrect use. Test: builds Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200819020027.4072288-1-zenczykowski@gmail.com
*	bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands	Andrii Nakryiko	2020-07-26	1	-15/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in kernel, along the xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
*	xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame	Lorenzo Bianconi	2020-06-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	In order to use standard 'xdp' prefix, rename convert_to_xdp_frame utility routine in xdp_convert_buff_to_frame and replace all the occurrences Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
*	xdp: Introduce xdp_convert_frame_to_buff utility routine	Lorenzo Bianconi	2020-06-02	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Introduce xdp_convert_frame_to_buff utility routine to initialize xdp_buff fields from xdp_frames ones. Rely on xdp_convert_frame_to_buff in veth xdp code. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/87acf133073c4b2d4cbb8097e8c2480c0a0fac32.1590698295.git.lorenzo@kernel.org
*	veth: Xdp using frame_sz in veth driver	Jesper Dangaard Brouer	2020-05-15	1	-9/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The veth driver can run XDP in "native" mode in it's own NAPI handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") packets can come in two forms either xdp_frame or skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb(). For packets to arrive in xdp_frame format, they will have been redirected from an XDP native driver. In case of XDP_PASS or no XDP-prog attached, the veth driver will allocate and create an SKB. The current code in veth_xdp_rcv_one() xdp_frame case, had to guess the frame truesize of the incoming xdp_frame, when using veth_build_skb(). With xdp_frame->frame_sz this is not longer necessary. Calculating the frame_sz in veth_xdp_rcv_skb() skb case, is done similar to the XDP-generic handling code in net/core/dev.c. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://lore.kernel.org/bpf/158945338840.97035.935897116345700902.stgit@firesoul
*	veth: Adjust hard_start offset on redirect XDP frames	Jesper Dangaard Brouer	2020-05-15	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When native XDP redirect into a veth device, the frame arrives in the xdp_frame structure. It is then processed in veth_xdp_rcv_one(), which can run a new XDP bpf_prog on the packet. Doing so requires converting xdp_frame to xdp_buff, but the tricky part is that xdp_frame memory area is located in the top (data_hard_start) memory area that xdp_buff will point into. The current code tried to protect the xdp_frame area, by assigning xdp_buff.data_hard_start past this memory. This results in 32 bytes less headroom to expand into via BPF-helper bpf_xdp_adjust_head(). This protect step is actually not needed, because BPF-helper bpf_xdp_adjust_head() already reserve this area, and don't allow BPF-prog to expand into it. Thus, it is safe to point data_hard_start directly at xdp_frame memory area. Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring") Reported-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945338331.97035.5923525383710752178.stgit@firesoul
*	veth: rely on peer veth_rq for ndo_xdp_xmit accounting	Lorenzo Bianconi	2020-03-27	1	-47/+82
\| \| \| \| \| \| \| \| \| \|	Rely on 'remote' veth_rq to account ndo_xdp_xmit ethtool counters. Move XDP_TX accounting to veth_xdp_flush_bq routine. Remove 'rx' prefix in rx xdp ethool counters Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: rely on veth_rq in veth_xdp_flush_bq signature	Lorenzo Bianconi	2020-03-27	1	-15/+15
\| \| \| \| \| \| \| \| \| \|	Substitute net_device point with veth_rq one in veth_xdp_flush_bq, veth_xdp_flush and veth_xdp_tx signature. This is a preliminary patch to account xdp_xmit counter on 'receiving' veth_rq Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: remove atomic64_add from veth_xdp_xmit hotpath	Lorenzo Bianconi	2020-03-20	1	-13/+17
\| \| \| \| \| \| \| \|	Remove atomic64_add from veth_xdp_xmit hotpath and rely on xdp_xmit_err/xdp_tx_err counters Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: introduce more xdp counters	Lorenzo Bianconi	2020-03-20	1	-7/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce xdp_xmit counter in order to distinguish between XDP_TX and ndo_xdp_xmit stats. Introduce the following ethtool counters: - rx_xdp_tx - rx_xdp_tx_errors - tx_xdp_xmit - tx_xdp_xmit_errors - rx_xdp_redirect Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: distinguish between rx_drops and xdp_drops	Lorenzo Bianconi	2020-03-20	1	-5/+7
\| \| \| \| \| \| \| \|	Distinguish between rx_drops and xdp_drops since the latter is already reported in rx_packets. Report xdp_drops in ethtool statistics Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: introduce more specialized counters in veth_stats	Lorenzo Bianconi	2020-03-20	1	-32/+40
\| \| \| \| \| \| \| \| \| \| \|	Introduce xdp_tx, xdp_redirect and rx_drops counters in veth_stats data structure. Move stats accounting in veth_poll. Remove xdp_xmit variable in veth_xdp_rcv_one/veth_xdp_rcv_skb and rely on veth_stats counters. This is a preliminary patch to align veth xdp statistics to mlx, intel and marvell xdp implementation Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: move xdp stats in a dedicated structure	Lorenzo Bianconi	2020-03-20	1	-13/+17
\| \| \| \| \| \| \| \|	Move xdp stats in veth_stats data structure. This is a preliminary patch to align xdp statistics to mlx5, ixgbe and mvneta drivers Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: ignore peer tx_dropped when counting local rx_dropped	Jiang Lidong	2020-03-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When local NET_RX backlog is full due to traffic overrun, peer veth tx_dropped counter increases. At that time, list local veth stats, rx_dropped has double value of peer tx_dropped, even bigger than transmit packets by peer. In NET_RX softirq process, if any packet drop case happens, it increases dev's rx_dropped counter and returns NET_RX_DROP. At veth tx side, it records any error returned from peer netif_rx into local dev tx_dropped counter. In veth get stats process, it puts local dev rx_dropped and peer dev tx_dropped into together as local rx_drpped value. So that it shows double value of real dropped packets number in this case. This patch ignores peer tx_dropped when counting local rx_dropped, since peer tx_dropped is duplicated to local rx_dropped at most cases. Signed-off-by: Jiang Lidong <jianglidong3@jd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bpf, xdp: Remove no longer required rcu_read_{un}lock()	John Fastabend	2020-01-27	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we depend on rcu_call() and synchronize_rcu() to also wait for preempt_disabled region to complete the rcu read critical section in __dev_map_flush() is no longer required. Except in a few special cases in drivers that need it for other reasons. These originally ensured the map reference was safe while a map was also being free'd. And additionally that bpf program updates via ndo_bpf did not happen while flush updates were in flight. But flush by new rules can only be called from preempt-disabled NAPI context. The synchronize_rcu from the map free path and the rcu_call from the delete path will ensure the reference there is safe. So lets remove the rcu_read_lock and rcu_read_unlock pair to avoid any confusion around how this is being protected. If the rcu_read_lock was required it would mean errors in the above logic and the original patch would also be wrong. Now that we have done above we put the rcu_read_lock in the driver code where it is needed in a driver dependent way. I think this helps readability of the code so we know where and why we are taking read locks. Most drivers will not need rcu_read_locks here and further XDP drivers already have rcu_read_locks in their code paths for reading xdp programs on RX side so this makes it symmetric where we don't have half of rcu critical sections define in driver and the other half in devmap. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com
*	xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths	Toke Høiland-Jørgensen	2020-01-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp(). Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exit (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). Since this leaves less reason to have the non-map redirect code in a separate function, so we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them. With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase). Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
*	veth: use standard dev_lstats_add() and dev_lstats_read()	Eric Dumazet	2019-11-08	1	-32/+11
\| \| \| \| \| \| \|	This cleanup will ease u64_stats_t adoption in a single location. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Support bulk XDP_TX	Toshiaki Makita	2019-06-25	1	-12/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the heavy cost of indirect call but it also reduces lock acquisition on the destination device that needs locks like veth and tun. XDP_TX does not use indirect calls but drivers which require locks can benefit from the bulk transmit for XDP_TX as well. This patch introduces bulk transmit mechanism in veth using bulk queue on stack, and improves XDP_TX performance by about 9%. Here are single-core/single-flow XDP_TX test results. CPU consumptions are taken from "perf report --no-child". - Before: 7.26 Mpps _raw_spin_lock 7.83% veth_xdp_xmit 12.23% - After: 7.94 Mpps _raw_spin_lock 1.08% veth_xdp_xmit 6.10% v2: - Use stack for bulk queue instead of a global variable. Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: use xdp_release_frame for XDP_PASS	Jesper Dangaard Brouer	2019-06-19	1	-0/+1
\| \| \| \| \| \| \| \|	Like cpumap use xdp_release_frame() when an xdp_frame got converted into an SKB and send towars the network stack. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	treewide: Add SPDX license identifier for more missed files	Thomas Gleixner	2019-05-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*	net: veth: use generic helper to report timestamping info	Julian Wiedmann	2019-04-13	1	-13/+1
\| \| \| \| \| \| \| \|	For reporting the common set of SW timestamping capabilities, use ethtool_op_get_ts_info() instead of re-implementing it. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Fix -Wformat-truncation	Florian Fainelli	2019-02-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide a precision hint to snprintf() in order to eliminate a -Wformat-truncation warning provided below. A maximum of 11 characters is allowed to reach a maximum of 32 - 1 characters given a possible maximum value of queues using up to UINT_MAX which occupies 10 characters. Incidentally 11 is the number of characters for "xdp_packets" which is the largest string we append. drivers/net/veth.c: In function 'veth_get_strings': drivers/net/veth.c:118:47: warning: '%s' directive output may be truncated writing up to 31 bytes into a region of size between 12 and 21 [-Wformat-truncation=] snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s", ^~ drivers/net/veth.c:118:5: note: 'snprintf' output between 12 and 52 bytes into a destination of size 32 snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ i, veth_rq_stats_desc[j].desc); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Mark expected switch fall-throughs	Gustavo A. R. Silva	2019-02-08	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Add extack argument to rtnl_create_link	David Ahern	2018-11-07	1	-1/+1
\| \| \| \| \| \| \| \|	Add extack arg to rtnl_create_link and add messages for invalid number of Tx or Rx queues. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Add ethtool statistics support for XDP	Toshiaki Makita	2018-10-16	1	-2/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Expose per-queue stats for ethtool -S. As there are only rx queues, and rx queues are used only when XDP is used, per-queue counters are only rx XDP ones. Example: $ ethtool -S veth0 NIC statistics: peer_ifindex: 11 rx_queue_0_xdp_packets: 28601434 rx_queue_0_xdp_bytes: 1716086040 rx_queue_0_xdp_drops: 28601434 rx_queue_1_xdp_packets: 17873050 rx_queue_1_xdp_bytes: 1072383000 rx_queue_1_xdp_drops: 17873050 Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Account for XDP packet statistics on rx side	Toshiaki Makita	2018-10-16	1	-18/+79
\| \| \| \| \| \| \| \| \| \| \| \|	On XDP path veth has napi handler so we can collect statistics on per-queue basis for XDP. By this change now we can collect XDP_DROP drop count as well as packets and bytes coming through ndo_xdp_xmit. Packet counters shown by "ip -s link", sysfs stats or /proc/net/dev is now correct for XDP. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Account for packet drops in ndo_xdp_xmit	Toshiaki Makita	2018-10-16	1	-8/+22
\| \| \| \| \| \| \| \| \| \| \|	Use existing atomic drop counter. Since drop path is really an exceptional case here, I'm thinking atomic ops would not hurt the performance. XDP packets and bytes are not counted in ndo_xdp_xmit, but will be accounted on rx side by the following commit. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: rename pcpu_vstats as pcpu_lstats	Li RongQing	2018-09-19	1	-14/+8
\| \| \| \| \| \| \| \| \| \|	struct pcpu_vstats and pcpu_lstats have same members and usage, and pcpu_lstats is used in many files, so rename pcpu_vstats as pcpu_lstats to reduce duplicate definition Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net	David S. Miller	2018-09-18	1	-2/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	veth: Orphan skb before GRO	Toshiaki Makita	2018-09-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GRO expects skbs not to be owned by sockets, but when XDP is enabled veth passed skbs owned by sockets. It caused corrupted sk_wmem_alloc. Paolo Abeni reported the following splat: [ 362.098904] refcount_t overflow at skb_set_owner_w+0x5e/0xa0 in iperf3[1644], uid/euid: 0/0 [ 362.108239] WARNING: CPU: 0 PID: 1644 at kernel/panic.c:648 refcount_error_report+0xa0/0xa4 [ 362.117547] Modules linked in: tcp_diag inet_diag veth intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif iTCO_wdt sg ipmi_si iTCO_vendor_support ipmi_devintf mxm_wmi ipmi_msghandler pcspkr dcdbas mei_me wmi mei lpc_ich acpi_power_meter pcc_cpufreq xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe igb ttm ahci mdio libahci ptp crc32c_intel drm pps_core libata i2c_algo_bit dca dm_mirror dm_region_hash dm_log dm_mod [ 362.176622] CPU: 0 PID: 1644 Comm: iperf3 Not tainted 4.19.0-rc2.vanilla+ #2025 [ 362.184777] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 [ 362.193124] RIP: 0010:refcount_error_report+0xa0/0xa4 [ 362.198758] Code: 08 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 80 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 18 4d e7 9d 31 c0 e8 30 fa ff ff <0f> 0b eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc [ 362.219711] RSP: 0018:ffff9ee6ff603c20 EFLAGS: 00010282 [ 362.225538] RAX: 0000000000000000 RBX: ffffffff9de83e10 RCX: 0000000000000000 [ 362.233497] RDX: 0000000000000001 RSI: ffff9ee6ff6167d8 RDI: ffff9ee6ff6167d8 [ 362.241457] RBP: ffff9ee6ff603d78 R08: 0000000000000490 R09: 0000000000000004 [ 362.249416] R10: 0000000000000000 R11: ffff9ee6ff603990 R12: ffff9ee664b94500 [ 362.257377] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff9de615f9 [ 362.265337] FS: 00007f1d22d28740(0000) GS:ffff9ee6ff600000(0000) knlGS:0000000000000000 [ 362.274363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 362.280773] CR2: 00007f1d222f35d0 CR3: 0000001fddfec003 CR4: 00000000001606f0 [ 362.288733] Call Trace: [ 362.291459] <IRQ> [ 362.293702] ex_handler_refcount+0x4e/0x80 [ 362.298269] fixup_exception+0x35/0x40 [ 362.302451] do_trap+0x109/0x150 [ 362.306048] do_error_trap+0xd5/0x130 [ 362.315766] invalid_op+0x14/0x20 [ 362.319460] RIP: 0010:skb_set_owner_w+0x5e/0xa0 [ 362.324512] Code: ef ff ff 74 49 48 c7 43 60 20 7b 4a 9d 8b 85 f4 01 00 00 85 c0 75 16 8b 83 e0 00 00 00 f0 01 85 44 01 00 00 0f 88 d8 23 16 00 <5b> 5d c3 80 8b 91 00 00 00 01 8b 85 f4 01 00 00 89 83 a4 00 00 00 [ 362.345465] RSP: 0018:ffff9ee6ff603e20 EFLAGS: 00010a86 [ 362.351291] RAX: 0000000000001100 RBX: ffff9ee65deec700 RCX: ffff9ee65e829244 [ 362.359250] RDX: 0000000000000100 RSI: ffff9ee65e829100 RDI: ffff9ee65deec700 [ 362.367210] RBP: ffff9ee65e829100 R08: 000000000002a380 R09: 0000000000000000 [ 362.375169] R10: 0000000000000002 R11: fffff1a4bf77bb00 R12: ffffc0754661d000 [ 362.383130] R13: ffff9ee65deec200 R14: ffff9ee65f597000 R15: 00000000000000aa [ 362.391092] veth_xdp_rcv+0x4e4/0x890 [veth] [ 362.399357] veth_poll+0x4d/0x17a [veth] [ 362.403731] net_rx_action+0x2af/0x3f0 [ 362.407912] __do_softirq+0xdd/0x29e [ 362.411897] do_softirq_own_stack+0x2a/0x40 [ 362.416561] </IRQ> [ 362.418899] do_softirq+0x4b/0x70 [ 362.422594] __local_bh_enable_ip+0x50/0x60 [ 362.427258] ip_finish_output2+0x16a/0x390 [ 362.431824] ip_output+0x71/0xe0 [ 362.440670] __tcp_transmit_skb+0x583/0xab0 [ 362.445333] tcp_write_xmit+0x247/0xfb0 [ 362.449609] __tcp_push_pending_frames+0x2d/0xd0 [ 362.454760] tcp_sendmsg_locked+0x857/0xd30 [ 362.459424] tcp_sendmsg+0x27/0x40 [ 362.463216] sock_sendmsg+0x36/0x50 [ 362.467104] sock_write_iter+0x87/0x100 [ 362.471382] __vfs_write+0x112/0x1a0 [ 362.475369] vfs_write+0xad/0x1a0 [ 362.479062] ksys_write+0x52/0xc0 [ 362.482759] do_syscall_64+0x5b/0x180 [ 362.486841] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 362.492473] RIP: 0033:0x7f1d22293238 [ 362.496458] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 c5 54 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55 [ 362.517409] RSP: 002b:00007ffebaef8008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 362.525855] RAX: ffffffffffffffda RBX: 0000000000002800 RCX: 00007f1d22293238 [ 362.533816] RDX: 0000000000002800 RSI: 00007f1d22d36000 RDI: 0000000000000005 [ 362.541775] RBP: 00007f1d22d36000 R08: 00000002db777a30 R09: 0000562b70712b20 [ 362.549734] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005 [ 362.557693] R13: 0000000000002800 R14: 00007ffebaef8060 R15: 0000562b70712260 In order to avoid this, orphan the skb before entering GRO. Fixes: 948d4f214fde ("veth: Add driver XDP") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	veth: add software timestamping	Michael Walle	2018-09-01	1	-0/+15
\|/ \| \| \| \| \| \| \| \| \| \|	Provide a software TX timestamp as well as the ethtool query interface and report the software timestamp capabilities. Tested with "ethtool -T" and two linuxptp instances each bound to a tunnel endpoint. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Free queues on link delete	Toshiaki Makita	2018-08-16	1	-37/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	David Ahern reported memory leak in veth. ======================================================================= $ cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8800354d5c00 (size 1024): comm "ip", pid 836, jiffies 4294722952 (age 25.904s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<(____ptrval____)>] kmemleak_alloc+0x70/0x94 [<(____ptrval____)>] slab_post_alloc_hook+0x42/0x52 [<(____ptrval____)>] __kmalloc+0x101/0x142 [<(____ptrval____)>] kmalloc_array.constprop.20+0x1e/0x26 [veth] [<(____ptrval____)>] veth_newlink+0x147/0x3ac [veth] ... unreferenced object 0xffff88002e009c00 (size 1024): comm "ip", pid 836, jiffies 4294722958 (age 25.898s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<(____ptrval____)>] kmemleak_alloc+0x70/0x94 [<(____ptrval____)>] slab_post_alloc_hook+0x42/0x52 [<(____ptrval____)>] __kmalloc+0x101/0x142 [<(____ptrval____)>] kmalloc_array.constprop.20+0x1e/0x26 [veth] [<(____ptrval____)>] veth_newlink+0x219/0x3ac [veth] ======================================================================= veth_rq allocated in veth_newlink() was not freed on dellink. We need to free up them after veth_close() so that any packets will not reference the queues afterwards. Thus free them in veth_dev_free() in the same way as freeing stats structure (vstats). Also move queues allocation to veth_dev_init() to be in line with stats allocation. Fixes: 638264dc90227 ("veth: Support per queue XDP ring") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	veth: Support per queue XDP ring	Toshiaki Makita	2018-08-10	1	-90/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move XDP and napi related fields from veth_priv to newly created veth_rq structure. When xdp_frames are enqueued from ndo_xdp_xmit and XDP_TX, rxq is selected by current cpu. When skbs are enqueued from the peer device, rxq is one to one mapping of its peer txq. This way we have a restriction that the number of rxqs must not less than the number of peer txqs, but leave the possibility to achieve bulk skb xmit in the future because txq lock would make it possible to remove rxq ptr_ring lock. v3: - Add extack messages. - Fix array overrun in veth_xmit. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: Add XDP TX and REDIRECT	Toshiaki Makita	2018-08-10	1	-9/+110
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows further redirection of xdp_frames like NIC -> veth--veth -> veth--veth (XDP) (XDP) (XDP) The intermediate XDP, redirecting packets from NIC to the other veth, reuses xdp_mem_info from NIC so that page recycling of the NIC works on the destination veth's XDP. In this way return_frame is not fully guarded by NAPI, since another NAPI handler on another cpu may use the same xdp_mem_info concurrently. Thus disable napi_direct by xdp_set_return_frame_no_direct() during the NAPI context. v8: - Don't use xdp_frame pointer address for data_hard_start of xdp_buff. v4: - Use xdp_[set\|clear]_return_frame_no_direct() instead of a flag in xdp_mem_info. v3: - Fix double free when veth_xdp_tx() returns a positive value. - Convert xdp_xmit and xdp_redir variables into flags. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: Add ndo_xdp_xmit	Toshiaki Makita	2018-08-10	1	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows NIC's XDP to redirect packets to veth. The destination veth device enqueues redirected packets to the napi ring of its peer, then they are processed by XDP on its peer veth device. This can be thought as calling another XDP program by XDP program using REDIRECT, when the peer enables driver XDP. Note that when the peer veth device does not set driver xdp, redirected packets will be dropped because the peer is not ready for NAPI. v4: - Don't use xdp_ok_fwd_dev() because checking IFF_UP is not necessary. Add comments about it and check only MTU. v2: - Drop the part converting xdp_frame into skb when XDP is not enabled. - Implement bulk interface of ndo_xdp_xmit. - Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: Handle xdp_frames in xdp napi ring	Toshiaki Makita	2018-08-10	1	-5/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is preparation for XDP TX and ndo_xdp_xmit. This allows napi handler to handle xdp_frames through xdp ring as well as sk_buff. v8: - Don't use xdp_frame pointer address to calculate skb->head and headroom. v7: - Use xdp_scrub_frame() instead of memset(). v3: - Revert v2 change around rings and use a flag to differentiate skb and xdp_frame, since bulk skb xmit makes little performance difference for now. v2: - Use another ring instead of using flag to differentiate skb and xdp_frame. This approach makes bulk skb transmit possible in veth_xmit later. - Clear xdp_frame feilds in skb->head. - Implement adjust_tail. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: Avoid drops by oversized packets when XDP is enabled	Toshiaki Makita	2018-08-10	1	-2/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Oversized packets including GSO packets can be dropped if XDP is enabled on receiver side, so don't send such packets from peer. Drop TSO and SCTP fragmentation features so that veth devices themselves segment packets with XDP enabled. Also cap MTU accordingly. v4: - Don't auto-adjust MTU but cap max MTU. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: Add driver XDP	Toshiaki Makita	2018-08-10	1	-7/+367
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the basic implementation of veth driver XDP. Incoming packets are sent from the peer veth device in the form of skb, so this is generally doing the same thing as generic XDP. This itself is not so useful, but a starting point to implement other useful veth XDP features like TX and REDIRECT. This introduces NAPI when XDP is enabled, because XDP is now heavily relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function enqueues packets to the ring and peer NAPI handler drains the ring. Currently only one ring is allocated for each veth device, so it does not scale on multiqueue env. This can be resolved by allocating rings on the per-queue basis later. Note that NAPI is not used but netif_rx is used when XDP is not loaded, so this does not change the default behaviour. v6: - Check skb->len only when allocation is needed. - Add __GFP_NOWARN to alloc_page() as it can be triggered by external events. v3: - Fix race on closing the device. - Add extack messages in ndo_bpf. v2: - Squashed with the patch adding NAPI. - Implement adjust_tail. - Don't acquire consumer lock because it is guarded by NAPI. - Make poll_controller noop since it is unnecessary. - Register rxq_info on enabling XDP rather than on opening the device. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
*	veth: set peer GSO values	Stephen Hemminger	2017-12-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When new veth is created, and GSO values have been configured on one device, clone those values to the peer. For example: # ip link add dev vm1 gso_max_size 65530 type veth peer name vm2 This should create vm1 <--> vm2 with both having GSO maximum size set to 65530. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2017-06-30	1	-2/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	veth: Be more robust on network device creation when no attributes	Serhey Popovych	2017-06-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are number of problems with configuration peer network device in absence of IFLA_VETH_PEER attributes where attributes for main network device shared with peer. First it is not feasible to configure both network devices with same MAC address since this makes communication in such configuration problematic. This case can be reproduced with following sequence: # ip link add address 02:11:22:33:44:55 type veth # ip li sh ... 26: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc \ noop state DOWN mode DEFAULT qlen 1000 link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff 27: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc \ noop state DOWN mode DEFAULT qlen 1000 link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff Second it is not possible to register both main and peer network devices with same name, that happens when name for main interface is given with IFLA_IFNAME and same attribute reused for peer. This case can be reproduced with following sequence: # ip link add dev veth1a type veth RTNETLINK answers: File exists To fix both of the cases check if corresponding netlink attributes are taken from peer_tb when valid or name based on rtnl ops kind and random address is used. Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: add netlink_ext_ack argument to rtnl_link_ops.validate	Matthias Schiffer	2017-06-27	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: add netlink_ext_ack argument to rtnl_link_ops.newlink	Matthias Schiffer	2017-06-27	1	-1/+2
\|/ \| \| \| \| \| \| \|	Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Fix inconsistent teardown and release of private netdev state.	David S. Miller	2017-06-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
*	netlink: pass extended ACK struct to parsing functions	Johannes Berg	2017-04-13	1	-1/+2
\| \| \| \| \| \| \| \| \|	Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: veth: use new api ethtool_{get\|set}_link_ksettings	Philippe Reynes	2017-03-29	1	-12/+7
\| \| \| \| \| \| \| \| \|	The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>