linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	bpf: Revert bpf_overrid_function() helper changes.	David S. Miller	2017-11-11	21	-204/+11
\| \| \| \| \| \|	NACK'd by x86 maintainer. Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'bpf-Fix-bugs-in-sock_ops-samples'	David S. Miller	2017-11-11	6	-29/+41
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lawrence Brakmo says: ==================== bpf: Fix bugs in sock_ops samples The programs were returning -1 in some cases when they should only return 0 or 1. Changes in the verifier now catch this issue and the programs fail to load. This is now fixed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Fix tcp_clamp_kern.c sample program	Lawrence Brakmo	2017-11-11	1	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The program was returning -1 in some cases which is not allowed by the verifier any longer. Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Fix tcp_iw_kern.c sample program	Lawrence Brakmo	2017-11-11	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The program was returning -1 in some cases which is not allowed by the verifier any longer. Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Fix tcp_cong_kern.c sample program	Lawrence Brakmo	2017-11-11	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The program was returning -1 in some cases which is not allowed by the verifier any longer. Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Fix tcp_bufs_kern.c sample program	Lawrence Brakmo	2017-11-11	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The program was returning -1 in some cases which is not allowed by the verifier any longer. Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Fix tcp_rwnd_kern.c sample program	Lawrence Brakmo	2017-11-11	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The program was returning -1 in some cases which is not allowed by the verifier any longer. Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: Fix tcp_synrto_kern.c sample program	Lawrence Brakmo	2017-11-11	1	-2/+4
\|/ \| \| \| \| \| \| \| \|	The program was returning -1 in some cases which is not allowed by the verifier any longer. Fixes: 390ee7e29fc8 ("bpf: enforce return code for cgroup-bpf programs") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tipc: improve link resiliency when rps is activated	Jon Maloy	2017-11-11	5	-32/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the TIPC RPS dissector is based only on the incoming packets' source node address, hence steering all traffic from a node to the same core. We have seen that this makes the links vulnerable to starvation and unnecessary resets when we turn down the link tolerance to very low values. To reduce the risk of this happening, we exempt probe and probe replies packets from the convergence to one core per source node. Instead, we do the opposite, - we try to diverge those packets across as many cores as possible, by randomizing the flow selector key. To make such packets identifiable to the dissector, we add a new 'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is set both for PROBE and PROBE_REPLY messages, and only for those. It should be noted that these packets are not part of any flow anyway, and only constitute a minuscule fraction of all packets sent across a link. Hence, there is no risk that this will affect overall performance. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'macb-next'	David S. Miller	2017-11-11	1	-0/+9
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Michael Grzeschik says: ==================== net: macb: add error handling on probe and This series adds more error handling to the macb driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: macb: add of_node_put to error paths	Michael Grzeschik	2017-11-11	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We add the call of_node_put(bp->phy_node) to all associated error paths for memory clean up. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: macb: add of_phy_deregister_fixed_link to error paths	Michael Grzeschik	2017-11-11	1	-0/+7
\|/ \| \| \| \| \| \| \| \|	We add the call of_phy_deregister_fixed_link to all associated error paths for memory clean up. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: mvpp2: fix GOP statistics loop start and stop conditions	Miquel Raynal	2017-11-11	1	-32/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GOP statistics from all ports of one instance of the driver are gathered with one work recalled in loop in a workqueue. The loop is started when a port is up, and stopped when a port is down. This last condition is obviously wrong. Fix this by having a work per port. This way, starting and stoping it when the port is up or down will be fine, while minimizing unnecessary CPU usage. Fixes: 118d6298f6f0 ("net: mvpp2: add ethtool GOP statistics") Reported-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'hns3-bug-fixes'	David S. Miller	2017-11-11	2	-13/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lipeng says: ==================== net: hns3: Bug fixes & Code improvements in HNS3 driver This patch-set introduces some bug fixes and code improvements. As [patch 1/2] depends on the patch {5392902 net: hns3: Consistently using GENMASK in hns3 driver}, which exists in net-next, not exists in net, so push this serise to nex-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: cleanup mac auto-negotiation state query in hclge_update_speed_duplex	Fuyun Liang	2017-11-11	1	-12/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When checking whether auto-negotiation is on, driver only needs to check the value of mac.autoneg(SW) directly, and does not need to query it from hardware. Because this value is always synchronized with the auto-negotiation state of hardware. This patch removes mac auto-negotiation state query in hclge_update_speed_duplex(). Fixes: 46a3df9f9718 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support) Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Lipeng <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: fix a bug when getting phy address from NCL_config file	Fuyun Liang	2017-11-11	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Driver gets phy address from NCL_config file and uses the phy address to initialize phydev. There are 5 bits for phy address. And C22 phy address has 5 bits. So 0-31 are all valid address for phy. If there is no phy, it will crash. Because driver always get a valid phy address. This patch fixes the phy address to 8 bits, and use 0xff to indicate invalid phy address. Fixes: 46a3df9f9718 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support) Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Lipeng <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: netlink: Update attr validation to require exact length for some types	David Ahern	2017-11-11	1	-3/+16
\| \| \| \| \| \| \| \| \| \|	Attributes using NLA_U* and NLA_S* (where * is 8, 16,32 and 64) are expected to be an exact length. Split these data types from nla_attr_minlen into nla_attr_len and update validate_nla to require the attribute to have exact length for them. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: ipv6: sysctl to specify IPv6 ND traffic class	Maciej Żenczykowski	2017-11-11	5	-1/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a per-device sysctl to specify the default traffic class to use for kernel originated IPv6 Neighbour Discovery packets. Currently this includes: - Router Solicitation (ICMPv6 type 133) ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr() - Neighbour Solicitation (ICMPv6 type 135) ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr() - Neighbour Advertisement (ICMPv6 type 136) ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr() - Redirect (ICMPv6 type 137) ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr() and if the kernel ever gets around to generating RA's, it would presumably also include: - Router Advertisement (ICMPv6 type 134) (radvd daemon could pick up on the kernel setting and use it) Interface drivers may examine the Traffic Class value and translate the DiffServ Code Point into a link-layer appropriate traffic prioritization scheme. An example of mapping IETF DSCP values to IEEE 802.11 User Priority values can be found here: https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11 The expected primary use case is to properly prioritize ND over wifi. Testing: jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass 0 jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass -bash: echo: write error: Invalid argument jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass -bash: echo: write error: Invalid argument jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass 255 jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass 34 jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1 tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24) jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is jzem22.pgc, Flags [solicited] (based on original change written by Erik Kline, with minor changes) v2: fix 'suspicious rcu_dereference_check() usage' by explicitly grabbing the rcu_read_lock. Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/ncsi: Don't return error on normal response	Samuel Mendoza-Jonas	2017-11-11	1	-17/+14
\| \| \| \| \| \| \| \| \| \| \|	Several response handlers return EBUSY if the data corresponding to the command/response pair is already set. There is no reason to return an error here; the channel is advertising something as enabled because we told it to enable it, and it's possible that the feature has been enabled previously. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/ncsi: Improve general state logging	Samuel Mendoza-Jonas	2017-11-11	3	-21/+80
\| \| \| \| \| \| \| \| \| \|	The NCSI driver is mostly silent which becomes a headache when trying to determine what has occurred on the NCSI connection. This adds additional logging in a few key areas such as state transitions and calling out certain errors more visibly. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'bpftool-show-filenames-of-pinned-objects'	David S. Miller	2017-11-11	7	-6/+187
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prashant Bhole says: ==================== tools: bpftool: show filenames of pinned objects This patchset adds support to show pinned objects in object details. Patch1 adds a funtionality to open a path in bpf-fs regardless of its object type. Patch2 adds actual functionality by scanning the bpf-fs once and adding object information in hash table, with object id as a key. One object may be associated with multiple paths because an object can be pinned multiple times Patch3 adds command line option to enable this functionality. Making it optional because scanning bpf-fs can be costly. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
\| *	tools: bpftool: optionally show filenames of pinned objects	Prashant Bhole	2017-11-11	6	-8/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Making it optional to show file names of pinned objects because it scans complete bpf-fs filesystem which is costly. Added option -f\|--bpffs. Documentation updated. Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tools: bpftool: show filenames of pinned objects	Prashant Bhole	2017-11-11	5	-0/+152
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added support to show filenames of pinned objects. For example: root@test# ./bpftool prog 3: tracepoint name tracepoint__irq tag f677a7dd722299a3 loaded_at Oct 26/11:39 uid 0 xlated 160B not jited memlock 4096B map_ids 4 pinned /sys/fs/bpf/softirq_prog 4: tracepoint name tracepoint__irq tag ea5dc530d00b92b6 loaded_at Oct 26/11:39 uid 0 xlated 392B not jited memlock 4096B map_ids 4,6 root@test# ./bpftool --json --pretty prog [{ "id": 3, "type": "tracepoint", "name": "tracepoint__irq", "tag": "f677a7dd722299a3", "loaded_at": "Oct 26/11:39", "uid": 0, "bytes_xlated": 160, "jited": false, "bytes_memlock": 4096, "map_ids": [4 ], "pinned": ["/sys/fs/bpf/softirq_prog" ] },{ "id": 4, "type": "tracepoint", "name": "tracepoint__irq", "tag": "ea5dc530d00b92b6", "loaded_at": "Oct 26/11:39", "uid": 0, "bytes_xlated": 392, "jited": false, "bytes_memlock": 4096, "map_ids": [4,6 ], "pinned": [] } ] root@test# ./bpftool map 4: hash name start flags 0x0 key 4B value 16B max_entries 10240 memlock 1003520B pinned /sys/fs/bpf/softirq_map1 5: hash name iptr flags 0x0 key 4B value 8B max_entries 10240 memlock 921600B root@test# ./bpftool --json --pretty map [{ "id": 4, "type": "hash", "name": "start", "flags": 0, "bytes_key": 4, "bytes_value": 16, "max_entries": 10240, "bytes_memlock": 1003520, "pinned": ["/sys/fs/bpf/softirq_map1" ] },{ "id": 5, "type": "hash", "name": "iptr", "flags": 0, "bytes_key": 4, "bytes_value": 8, "max_entries": 10240, "bytes_memlock": 921600, "pinned": [] } ] Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tools: bpftool: open pinned object without type check	Prashant Bhole	2017-11-11	2	-2/+14
\|/ \| \| \| \| \| \| \|	This was needed for opening any file in bpf-fs without knowing its object type Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'BPF-directed-error-injection'	David S. Miller	2017-11-11	21	-11/+204
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Josef Bacik says: ==================== Add the ability to do BPF directed error injection I'm sending this through Dave since it'll conflict with other BPF changes in his tree, but since it touches tracing as well Dave would like a review from somebody on the tracing side. v4->v5: - disallow kprobe_override programs from being put in the prog map array so we don't tail call into something we didn't check. This allows us to make the normal path still fast without a bunch of percpu operations. v3->v4: - fix a build error found by kbuild test bot (I didn't wait long enough apparently.) - Added a warning message as per Daniels suggestion. v2->v3: - added a ->kprobe_override flag to bpf_prog. - added some sanity checks to disallow attaching bpf progs that have ->kprobe_override set that aren't for ftrace kprobes. - added the trace_kprobe_ftrace helper to check if the trace_event_call is a ftrace kprobe. - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this value in the kprobe path, and thus only write to it if we're overriding or clearing the override. v1->v2: - moved things around to make sure that bpf_override_return could really only be used for an ftrace kprobe. - killed the special return values from trace_call_bpf. - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if it was being called from an ftrace kprobe context. - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state. - updated the test as per Alexei's review. - Original message - A lot of our error paths are not well tested because we have no good way of injecting errors generically. Some subystems (block, memory) have ways to inject errors, but they are random so it's hard to get reproduceable results. With BPF we can add determinism to our error injection. We can use kprobes and other things to verify we are injecting errors at the exact case we are trying to test. This patch gives us the tool to actual do the error injection part. It is very simple, we just set the return value of the pt_regs we're given to whatever we provide, and then override the PC with a dummy function that simply returns. Right now this only works on x86, but it would be simple enough to expand to other architectures. Thanks, ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	samples/bpf: add a test for bpf_override_return	Josef Bacik	2017-11-11	6	-2/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a basic test for bpf_override_return to verify it works. We override the main function for mounting a btrfs fs so it'll return -ENOMEM and then make sure that trying to mount a btrfs fs will fail. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bpf: add a bpf_override_function helper	Josef Bacik	2017-11-11	15	-9/+133
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Error injection is sloppy and very ad-hoc. BPF could fill this niche perfectly with it's kprobe functionality. We could make sure errors are only triggered in specific call chains that we care about with very specific situations. Accomplish this with the bpf_override_funciton helper. This will modify the probe'd callers return value to the specified value and set the PC to an override function that simply returns, bypassing the originally probed function. This gives us a nice clean way to implement systematic error injection for all of our code paths. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: fix incorrect comment with regard to VLAN packet handling	Girish Moodalbail	2017-11-10	1	-8/+0
\| \| \| \| \| \| \| \| \| \|	The commit bcc6d4790361 ("net: vlan: make non-hw-accel rx path similar to hw-accel") unified accel and non-accel path for VLAN RX. With that fix we do not register any packet_type handler for VLANs anymore, so fix the incorrect comment. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'act_vlan-rcu'	David S. Miller	2017-11-10	3	-38/+94
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Manish Kurup says: ==================== net_sched actions: act_vlan now uses RCU This commit consists of 3 patches: patch1 (1/3): The VLAN action maintains one set of stats across all cores, and uses a spinlock to synchronize updates to it from the same. Changed this to use a per-CPU stats context instead. This change will result in better performance. patch2 (2/3): Modified netronome nfp flower action to use VLAN helper functions instead of accessing/referencing TC act_vlan private structures directly. patch3 (3/3): Using a spinlock in the VLAN action causes performance issues when the VLAN action is used on multiple cores. Rewrote the VLAN action to use RCU read locking for reads and updates instead. All functions now use an RCU dereferenced pointer to access the VLAN action context. Modified helper functions used by other modules, to use the RCU as opposed to directly accessing the structure. As part of this review, there were some changes suggested by reviewers. I have incorporated all the changes that were requested. Here're the changes: v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal v2: Fixed indentation, extra line nits - Jamal, Jiri v2: Moved rcu_head to the end of the struct - Jiri v2: Re-formatted locals to reverse-christmas-tree - Jiri v2: Removed mismatched spin_lock() - Cong v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should suffice - Cong, Jiri v4: Modified the nfp flower action code to use the VLAN helper functions instead of referencing the structure directly. Isolated this into a separate patch - Pieter Jansen v5: Got rid of the unlikely() for the allocation case - Simon Horman v6: Had forgotten cleanup functions for RCU alloc, added them - Dave Miller v7: Re-formatted more locals to reverse-christmas-tree - Pieter V v8: Reverted reverse-christmas-tree(v7), not required when dependencies make it difficult to implement - Alexander D v9: Cover letter subject change - Jamal ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	act_vlan: VLAN action rewrite to use RCU lock/unlock and update	Manish Kurup	2017-11-10	2	-33/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using a spinlock in the VLAN action causes performance issues when the VLAN action is used on multiple cores. Rewrote the VLAN action to use RCU read locking for reads and updates instead. All functions now use an RCU dereferenced pointer to access the VLAN action context. Modified helper functions used by other modules, to use the RCU as opposed to directly accessing the structure. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Manish Kurup <manish.kurup@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	nfp flower action: Modified to use VLAN helper functions	Manish Kurup	2017-11-10	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Modified netronome nfp flower action to use VLAN helper functions instead of accessing/referencing TC act_vlan private structures directly. Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Manish Kurup <manish.kurup@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	act_vlan: Change stats update to use per-core stats	Manish Kurup	2017-11-10	1	-4/+6
\|/ \| \| \| \| \| \| \| \| \| \| \|	The VLAN action maintains one set of stats across all cores, and uses a spinlock to synchronize updates to it from the same. Changed this to use a per-CPU stats context instead. This change will result in better performance. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Manish Kurup <manish.kurup@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sfc: don't warn on successful change of MAC	Robert Stonehouse	2017-11-10	1	-1/+1
\| \| \| \| \| \|	Fixes: 535a61777f44e ("sfc: suppress handled MCDI failures when changing the MAC address") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: vxge: remove redundant assignments and pointers	Colin Ian King	2017-11-10	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several pointers that are being assigned but never read so remove these as they are redundant. Also remove an assignment to function_mode that is never read. Cleans up several clang warnings: vxge-main.c:1139:2: warning: Value stored to 'hldev' is never read vxge-main.c:1294:2: warning: Value stored to 'hldev' is never read vxge-main.c:2188:2: warning: Value stored to 'dev' is never read vxge-main.c:2188:2: warning: Value stored to 'dev' is never read vxge-main.c:2723:2: warning: Value stored to 'function_mode' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'ip_gre-flags-update'	David S. Miller	2017-11-10	1	-5/+53
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Xin Long says: ==================== ip_gre: add support for i/o_flags update ip_gre is using as many ip_tunnel apis as possible, newlink works fine as gre would do it's own part in .ndo_init. But when changing link, ip_tunnel_changelink doesn't even update i/o_flags, and also the update of these flags would cause some other gre's properties need to be updated or recalculated. These two patch are to add i/o_flags update and then do adjustment on some gre's properties according to the new i/o_flags. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ip_gre: add the support for i/o_flags update via ioctl	Xin Long	2017-11-10	1	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As patch 'ip_gre: add the support for i/o_flags update via netlink' did for netlink, we also need to do the same job for these update via ioctl. This patch is to update i/o_flags and call ipgre_link_update to recalculate these gre properties after ip_tunnel_ioctl does the common update. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ip_gre: add the support for i/o_flags update via netlink	Xin Long	2017-11-10	1	-2/+37
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now ip_gre is using ip_tunnel_changelink to update it's properties, but ip_tunnel_changelink in ip_tunnel doesn't update i/o_flags as a common function. o_flags updates would cause that tunnel->tun_hlen / hlen and dev->mtu / needed_headroom need to be recalculated, and dev->(hw_)features need to be updated as well. Therefore, we can't just add the update into ip_tunnel_update called in ip_tunnel_changelink, and it's also better not to touch ip_tunnel codes. This patch updates i/o_flags and calls ipgre_link_update to recalculate these gre properties after ip_tunnel_changelink does the common update. Note that since ipgre_link_update doesn't know the lower dev, it will update gre->hlen, dev->mtu and dev->needed_headroom with the value of 'new tun_hlen - old tun_hlen'. In this way, we can avoid many redundant codes, unlike ip6_gre. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'tcp-ns-rmem-wmem'	David S. Miller	2017-11-10	11	-48/+76
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eric Dumazet says: ==================== net: Namespace-ify sysctl_tcp_rmem and sysctl_tcp_wmem We need to get per netns sysctl for sysctl_[proto]_rmem and sysctl_[proto]_wmem This patch series adds the basic infrastructure allowing per proto conversion, and takes care of TCP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tcp: Namespace-ify sysctl_tcp_rmem and sysctl_tcp_wmem	Eric Dumazet	2017-11-10	8	-43/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note that when a new netns is created, it inherits its sysctl_tcp_rmem and sysctl_tcp_wmem from initial netns. This change is needed so that we can refine TCP rcvbuf autotuning, to take RTT into consideration. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: allow per netns sysctl_rmem and sysctl_wmem for protos	Eric Dumazet	2017-11-10	3	-5/+29
\|/ \| \| \| \| \| \| \| \| \| \| \|	As we want to gradually implement per netns sysctl_rmem and sysctl_wmem on per protocol basis, add two new fields in struct proto, and two new helpers : sk_get_wmem0() and sk_get_rmem0() First user will be TCP. Then UDP and SCTP can be easily converted, while DECNET probably wont get this support. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: Don't add vlans when vlan filtering is disabled	Andrew Lunn	2017-11-10	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The software bridge can be build with vlan filtering support included. However, by default it is turned off. In its turned off state, it still passes VLANs via switchev, even though they are not to be used. Don't pass these VLANs to the hardware. Only do so when vlan filtering is enabled. This fixes at least one corner case. There are still issues in other corners, such as when vlan_filtering is later enabled. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge tag 'mlx5-updates-2017-11-09' of ↵	David S. Miller	2017-11-10	9	-52/+214
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-11-09 This series introduces vlan offloads related improvements for mlx5 ethernet netdev driver, from Gal Pressman. - Add support for 802.1ad vlan filter - Add support for 802.1ad vlan insertion - Add vlan offloads statistics to ethtool (inserted/stripped vlans) - CHECKSUM_COMPLETE support for vlan traffic when vlan stripping is off! (Finally) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packets	Gal Pressman	2017-11-09	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the VLAN tag is present in the packet buffer (i.e VLAN stripping disabled, QinQ) the driver will currently report CHECKSUM_UNNECESSARY. Instead of using CHECKSUM_COMPLETE offload for packets with first ethertype of IPv4/6, use it for packets with last ethertype of IPv4/6 to cover the former cases as well. The checksum field present in the CQE is calculated from the IP header until the end of the packet. When the first ethertype is different than IPv4/6 (for ex. 802.1Q VLAN) a checksum of the VLAN header/s should be added. The small header/s checksum calculation will allow us to use CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY. Testing bandwidth of one and 8 TCP streams to a single RQ, LRO and VLAN stripping offloads disabled: CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] Before: +--------------+--------------------+---------------------+----------------------+ \| Traffic type \| 1 Stream BW [Mbps] \| 8 Streams BW [Mbps] \| Checksum offload \| +--------------+--------------------+---------------------+----------------------+ \| Untagged \| 28,247.35 \| 24,716.88 \| CHECKSUM_COMPLETE \| \| VLAN \| 27,516.69 \| 23,752.26 \| CHECKSUM_UNNECESSARY \| \| QinQ \| 6,961.30 \| 20,667.04 \| CHECKSUM_UNNECESSARY \| +--------------+--------------------+---------------------+----------------------+ Now: +--------------+--------------------+---------------------+-------------------+ \| Traffic type \| 1 Stream BW [Mbps] \| 8 Streams BW [Mbps] \| Checksum offload \| +--------------+--------------------+---------------------+-------------------+ \| Untagged \| 28,521.28 \| 24,926.32 \| CHECKSUM_COMPLETE \| \| VLAN \| 27,389.37 \| 23,715.34 \| CHECKSUM_COMPLETE \| \| QinQ \| 6,901.77 \| 20,845.73 \| CHECKSUM_COMPLETE \| +--------------+--------------------+---------------------+-------------------+ No performance degradation observed. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Add VLAN offloads statistics	Gal Pressman	2017-11-09	5	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following counters are now exposed through ethtool -S: rx[i]_removed_vlan_packets (per channel) rx_removed_vlan_packets tx[i]_added_vlan_packets (per channel) tx_added_vlan_packets rx_removed_vlan_packets: The number of packets that had their outer VLAN header stripped to the CQE by the hardware. tx_added_vlan_packets: The number of packets that had their outer VLAN header inserted by the hardware. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Add 802.1ad VLAN insertion support	Gal Pressman	2017-11-09	3	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Report VLAN insertion support for S-tagged packets and add support by choosing the correct VLAN type in the WQE. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Add 802.1ad VLAN filter steering rules	Gal Pressman	2017-11-09	3	-13/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a user chooses to use 802.1ad VLAN the proper steering rules will be added to the VLAN flow table (matching the specific S-tag VID). Due to current hardware limitation, when using 802.1ad, we must disable C-tag VLAN stripping on the RQs. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Declare bitmap using kernel macro	Gal Pressman	2017-11-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace explicit declaration of bitmap with DECLARE_BITMAP kernel macro. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net: Introduce netdev_*_once functions	Gal Pressman	2017-11-09	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend the net device error logging with netdev__once macros. netdev__once are the equivalents of the dev_*_once macros which are useful for messages that should only be logged once. Also add netdev_WARN_ONCE, which is the "once" extension for the already existing netdev_WARN macro. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Add rollback on add VLAN failure	Gal Pressman	2017-11-09	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When add VLAN rule fails the active vlan bit should be cleared. Fixes: afb736e9330a ("net/mlx5: Ethernet resource handling files") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Rename VLAN related variables and functions	Gal Pressman	2017-11-09	3	-37/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rename VLAN related symbols to better reflect the fact that they are associated to C-tag VLAN. Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>