linux - linux

	Commit message (Collapse)	Author	Files	Lines
2018-05-17	tcp: don't mark recently sent packets lost on RTO	Yuchung Cheng	1	-4/+8
	An RTO event indicates the head has not been acked for a long time after its last (re)transmission. But the other packets are not necessarily lost if they have been only sent recently (for example due to application limit). This patch would prohibit marking packets sent within an RTT to be lost on RTO event, using similar logic in TCP RACK detection. Normally the head (SND.UNA) would be marked lost since RTO should fire strictly after the head was sent. An exception is when the most recent RACK RTT measurement is larger than the (previous) RTO. To address this exception the head is always marked lost. Congestion control interaction: since we may not mark every packet lost, the congestion window may be more than 1 (inflight plus 1). But only one packet will be retransmitted after RTO, since tcp_retransmit_timer() calls tcp_retransmit_skb(...,segs=1). The connection still performs slow start from one packet (with Cubic congestion control). This commit was tested in an A/B test with Google web servers, and showed a reduction of 2% in (spurious) retransmits post timeout (SlowStartRetrans), and correspondingly reduced DSACKs (DSACKIgnoredOld) by 7%. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: new helper tcp_rack_skb_timeout	Yuchung Cheng	3	-7/+14
	Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack to prepare the final RTO change. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: separate loss marking and state update on RTO	Yuchung Cheng	1	-2/+2
	Previously when TCP times out, it first updates cwnd and ssthresh, marks packets lost, and then updates congestion state again. This was fine because everything not yet delivered is marked lost, so the inflight is always 0 and cwnd can be safely set to 1 to retransmit one packet on timeout. But the inflight may not always be 0 on timeout if TCP changes to mark packets lost based on packet sent time. Therefore we must first mark the packet lost, then set the cwnd based on the (updated) inflight. This is not a pure refactor. Congestion control may potentially break if it uses (not yet updated) inflight to compute ssthresh. Fortunately all existing congestion control modules does not do that. Also it changes the inflight when CA_LOSS_EVENT is called, and only westwood processes such an event but does not use inflight. This change has two other minor side benefits: 1) consistent with Fast Recovery s.t. the inflight is updated first before tcp_enter_recovery flips state to CA_Recovery. 2) avoid intertwining loss marking with state update, making the code more readable. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: new helper tcp_timeout_mark_lost	Yuchung Cheng	1	-21/+29
	Refactor using a new helper, tcp_timeout_mark_loss(), that marks packets lost upon RTO. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: account lost retransmit after timeout	Yuchung Cheng	3	-17/+6
	The previous approach for the lost and retransmit bits was to wipe the slate clean: zero all the lost and retransmit bits, correspondingly zero the lost_out and retrans_out counters, and then add back the lost bits (and correspondingly increment lost_out). The new approach is to treat this very much like marking packets lost in fast recovery. We don’t wipe the slate clean. We just say that for all packets that were not yet marked sacked or lost, we now mark them as lost in exactly the same way we do for fast recovery. This fixes the lost retransmit accounting at RTO time and greatly simplifies the RTO code by sharing much of the logic with Fast Recovery. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: simpler NewReno implementation	Yuchung Cheng	3	-8/+39
	This is a rewrite of NewReno loss recovery implementation that is simpler and standalone for readability and better performance by using less states. Note that NewReno refers to RFC6582 as a modification to the fast recovery algorithm. It is used only if the connection does not support SACK in Linux. It should not to be confused with the Reno (AIMD) congestion control. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: disable RFC6675 loss detection	Yuchung Cheng	2	-5/+10
	This patch disables RFC6675 loss detection and make sysctl net.ipv4.tcp_recovery = 1 controls a binary choice between RACK (1) or RFC6675 (0). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	tcp: support DUPACK threshold in RACK	Yuchung Cheng	3	-13/+29
	This patch adds support for the classic DUPACK threshold rule (#DupThresh) in RACK. When the number of packets SACKed is greater or equal to the threshold, RACK sets the reordering window to zero which would immediately mark all the unsacked packets below the highest SACKed sequence lost. Since this approach is known to not work well with reordering, RACK only uses it if no reordering has been observed. The DUPACK threshold rule is a particularly useful extension to the fast recoveries triggered by RACK reordering timer. For example data-center transfers where the RTT is much smaller than a timer tick, or high RTT path where the default RTT/4 may take too long. Note that this patch differs slightly from RFC6675. RFC6675 considers a packet lost when at least #DupThresh higher-sequence packets are SACKed. With RACK, for connections that have seen reordering, RACK continues to use a dynamically-adaptive time-based reordering window to detect losses. But for connections on which we have not yet seen reordering, this patch considers a packet lost when at least one higher sequence packet is SACKed and the total number of SACKed packets is at least DupThresh. For example, suppose a connection has not seen reordering, and sends 10 packets, and packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2 lost. RACK considers packets 1, 2, 4, 6 lost. There is some small risk of spurious retransmits here due to reordering. However, this is mostly limited to the first flight of a connection on which the sender receives SACKs from reordering. And RFC 6675 and FACK loss detection have a similar risk on the first flight with reordering (it's just that the risk of spurious retransmits from reordering was slightly narrower for those older algorithms due to the margin of 3*MSS). Also the minimum reordering window is reduced from 1 msec to 0 to recover quicker on short RTT transfers. Therefore RACK is more aggressive in marking packets lost during recovery to reduce the reordering window timeouts. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	net: ethernet: ti: cpsw: disable mq feature for "AM33xx ES1.0" devices	Ivan Khoronzhuk	1	-49/+60
	The early versions of am33xx devices, related to ES1.0 SoC revision have errata limiting mq support. That's the same errata as commit 7da1160002f1 ("drivers: net: cpsw: add am335x errata workarround for interrutps") AM33xx Errata [1] Advisory 1.0.9 http://www.ti.com/lit/er/sprz360f/sprz360f.pdf After additional investigation were found that drivers w/a is propagated on all AM33xx SoCs and on DM814x. But the errata exists only for ES1.0 of AM33xx family, limiting mq support for revisions after ES1.0. So, disable mq support only for related SoCs and use separate polls for revisions allowing mq. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	pfifo_fast: drop unneeded additional lock on dequeue	Paolo Abeni	2	-2/+7
	After the previous patch, for NOLOCK qdiscs, q->seqlock is always held when the dequeue() is invoked, we can drop any additional locking to protect such operation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	sched: replace __QDISC_STATE_RUNNING bit with a spin lock	Paolo Abeni	2	-5/+16
	So that we can use lockdep on it. The newly introduced sequence lock has the same scope of busylock, so it shares the same lockdep annotation, but it's only used for NOLOCK qdiscs. With this changeset we acquire such lock in the control path around flushing operation (qdisc reset), to allow more NOLOCK qdisc perf improvement in the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17	bpf: sockmap, on update propagate errors back to userspace	John Fastabend	1	-1/+1
	When an error happens in the update sockmap element logic also pass the err up to the user. Fixes: e5cd3abcb31a ("bpf: sockmap, refactor sockmap routines to work with hashmap") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-17	bpf: fix sock hashmap kmalloc warning	Yonghong Song	1	-0/+6
	syzbot reported a kernel warning below: WARNING: CPU: 0 PID: 4499 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 4499 Comm: syz-executor050 Not tainted 4.17.0-rc3+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 panic+0x22f/0x4de kernel/panic.c:184 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:kmalloc_slab+0x56/0x70 mm/slab_common.c:996 RSP: 0018:ffff8801d907fc58 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8801aeecb280 RCX: ffffffff8185ebd7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffe1 RBP: ffff8801d907fc58 R08: ffff8801adb5e1c0 R09: ffffed0035a84700 R10: ffffed0035a84700 R11: ffff8801ad423803 R12: ffff8801aeecb280 R13: 00000000fffffff4 R14: ffff8801ad891a00 R15: 00000000014200c0 __do_kmalloc mm/slab.c:3713 [inline] __kmalloc+0x25/0x760 mm/slab.c:3727 kmalloc include/linux/slab.h:517 [inline] map_get_next_key+0x24a/0x640 kernel/bpf/syscall.c:858 __do_sys_bpf kernel/bpf/syscall.c:2131 [inline] __se_sys_bpf kernel/bpf/syscall.c:2096 [inline] __x64_sys_bpf+0x354/0x4f0 kernel/bpf/syscall.c:2096 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe The test case is against sock hashmap with a key size 0xffffffe1. Such a large key size will cause the below code in function sock_hash_alloc() overflowing and produces a smaller elem_size, hence map creation will be successful. htab->elem_size = sizeof(struct htab_elem) + round_up(htab->map.key_size, 8); Later, when map_get_next_key is called and kernel tries to allocate the key unsuccessfully, it will issue the above warning. Similar to hashtab, ensure the key size is at most MAX_BPF_STACK for a successful map creation. Fixes: 81110384441a ("bpf: sockmap, add hash map support") Reported-by: syzbot+e4566d29080e7f3460ff@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-17	libbpf: add ifindex to enable offload support	David Beckett	4	-3/+20
	BPF programs currently can only be offloaded using iproute2. This patch will allow programs to be offloaded using libbpf calls. Signed-off-by: David Beckett <david.beckett@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-17	bpf: add __printf verification to bpf_verifier_vlog	Mathieu Malaterre	1	-2/+2
	__printf is useful to verify format and arguments. ‘bpf_verifier_vlog’ function is used twice in verifier.c in both cases the caller function already uses the __printf gcc attribute. Remove the following warning, triggered with W=1: kernel/bpf/verifier.c:176:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-16	samples/bpf: Decrement ttl in fib forwarding example	David Ahern	1	-8/+31
	Only consider forwarding packets if ttl in received packet is > 1 and decrement ttl before handing off to bpf_redirect_map. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-16	bpf: bpftool, support for sockhash	John Fastabend	1	-0/+1
	This adds the SOCKHASH map type to bpftools so that we get correct pretty printing. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-16	bpf: selftest additions for SOCKHASH	John Fastabend	7	-349/+453
	This runs existing SOCKMAP tests with SOCKHASH map type. To do this we push programs into include file and build two BPF programs. One for SOCKHASH and one for SOCKMAP. We then run the entire test suite with each type. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-16	cxgb4: update LE-TCAM collection for T6	Rahul Lakkireddy	4	-8/+37
	For T6, clip table is separated from main TCAM. So, update LE-TCAM collection logic to collect clip table TCAM as well. IPv6 takes 4 entries in clip table TCAM compared to 2 entries in main TCAM. Also, in case of errors, keep LE-TCAM collected so far and set the status to partial dump. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	qed: Fix LL2 race during connection terminate	Michal Kalderon	1	-10/+14
	Stress on qedi/qedr load unload lead to list_del corruption. This is due to ll2 connection terminate freeing resources without verifying that no more ll2 processing will occur. This patch unregisters the ll2 status block before terminating the connection to assure this race does not occur. Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling") Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	qed: Fix possibility of list corruption during rmmod flows	Michal Kalderon	1	-1/+10
	The ll2 flows of flushing the txq/rxq need to be synchronized with the regular fp processing. Caused list corruption during load/unload stress tests. Fixes: 0a7fb11c23c0f ("qed: Add Light L2 support") Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	qed: LL2 flush isles when connection is closed	Michal Kalderon	1	-0/+26
	Driver should free all pending isles once it gets a FLUSH cqe from FW. Part of iSCSI out of order flow. Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling") Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: ethoc: Remove useless test before clk_disable_unprepare	YueHaibing	1	-4/+2
	clk_disable_unprepare() already checks that the clock pointer is valid. No need to test it before calling it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: stmmac: Remove useless test before clk_disable_unprepare	YueHaibing	1	-17/+7
	clk_disable_unprepare() already checks that the clock pointer is valid. No need to test it before calling it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: qcom/emac: Encapsulate sgmii ops under one structure	Hemanth Puranik	4	-71/+103
	This patch introduces ops structure for sgmii, This by ensures that we do not need dummy functions in case of emulation platforms. Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org> Acked-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: qualcomm: rmnet: Remove redundant command check	Subash Abhinov Kasiviswanathan	1	-11/+3
	The command packet size is already checked once in rmnet_map_deaggregate() for the header, packet and trailer size, so this additional check is not needed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: qualcomm: rmnet: Add support for ethtool private stats	Subash Abhinov Kasiviswanathan	3	-16/+112
	Add ethtool private stats handler to debug the handling of packets with checksum offload header / trailer. This allows to keep track of the number of packets for which hardware computes the checksum and counts and reasons where checksum computation was skipped in hardware and was done in the network stack. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: qualcomm: rmnet: Capture all drops in transmit path	Subash Abhinov Kasiviswanathan	1	-11/+10
	Packets in transmit path could potentially be dropped if there were errors while adding the MAP header or the checksum header. Increment the tx_drops stats in these cases. Additionally, refactor the code to free the packet and increment the tx_drops stat under a single label. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	drivers: net: Remove device_node checks with of_mdiobus_register()	Florian Fainelli	11	-61/+20
	A number of drivers have the following pattern: if (np) of_mdiobus_register() else mdiobus_register() which the implementation of of_mdiobus_register() now takes care of. Remove that pattern in drivers that strictly adhere to it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Fugang Duan <fugang.duan@nxp.com> Reviewed-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	of: mdio: Fall back to mdiobus_register() with NULL device_node	Florian Fainelli	1	-0/+3
	When the device_node specified is NULL, fall back to mdiobus_register(). We have a number of drivers having a similar pattern which is: if (np) of_mdiobus_register() else mdiobus_register() so incorporate that behavior within the core of_mdiobus_register() function. This is also consistent with the stub version that we defined when CONFIG_OF=n. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value	Grygorii Strashko	1	-1/+7
	This fixes klockworks warnings: Pointer 'dev' returned from call to function 'bus_find_device' at line 179 may be NULL and will be dereferenced at line 181. cpsw-phy-sel.c:179: 'dev' is assigned the return value from function 'bus_find_device'. bus.c:342: 'bus_find_device' explicitly returns a NULL value. cpsw-phy-sel.c:181: 'dev' is dereferenced by passing argument 1 to function 'dev_get_drvdata'. device.h:1024: 'dev' is passed to function 'dev_get_drvdata'. device.h:1026: 'dev' is explicitly dereferenced. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> [nsekhar@ti.com: add an error message, fix return path] Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	Revert "bonding: allow carrier and link status to determine link state"	Debabrata Banerjee	3	-14/+9
	This reverts commit 1386c36b30388f46a95100924bfcae75160db715. We don't want to encourage drivers to not report carrier status correctly, therefore remove this commit. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	tc-testing: updated mirred and vlan with more tests	Roman Mashak	2	-20/+324
	Added extra test cases for different control actions (reclassify, pipe etc.), cookies, max values & exceeding maximum, and replace existing actions unit tests. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	tc-testing: fixed copy-pasting error in police tests	Roman Mashak	1	-2/+2
	Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	sched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers	Paolo Abeni	3	-24/+19
	Currently NOLOCK qdiscs pay a measurable overhead to atomically manipulate the __QDISC_STATE_RUNNING. Such bit is flipped twice per packet in the uncontended scenario with packet rate below the line rate: on packed dequeue and on the next, failing dequeue attempt. This changeset moves the bit manipulation into the qdisc_run_{begin,end} helpers, so that the bit is now flipped only once per packet, with measurable performance improvement in the uncontended scenario. This also allows simplifying the qdisc teardown code path - since qdisc_is_running() is now effective for each qdisc type - and avoid a possible race between qdisc_run() and dev_deactivate_many(), as now the some_qdisc_is_busy() can properly detect NOLOCK qdiscs being busy dequeuing packets. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	bonding: allow carrier and link status to determine link state	Debabrata Banerjee	3	-9/+14
	In a mixed environment it may be difficult to tell if your hardware support carrier, if it does not it can always report true. With a new use_carrier option of 2, we can check both carrier and link status sequentially, instead of one or the other Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	bonding: allow use of tx hashing in balance-alb	Debabrata Banerjee	4	-15/+43
	The rx load balancing provided by balance-alb is not mutually exclusive with using hashing for tx selection, and should provide a decent speed increase because this eliminates spinlocks and cache contention. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	bonding: use common mac addr checks	Debabrata Banerjee	1	-19/+9
	Replace homegrown mac addr checks with faster defs from etherdevice.h Note that this will also prevent any rlb arp updates for multicast addresses, however this should have been forbidden anyway. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	bonding: don't queue up extraneous rlb updates	Debabrata Banerjee	1	-6/+12
	arps for incomplete entries can't be sent anyway. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: check for pending termination	Karsten Graul	3	-3/+7
	Avoid to run the processing in smc_lgr_terminate() more than once, remember when the link group termination is triggered. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: drop messages when link state is inactive	Karsten Graul	1	-0/+2
	Drop incoming messages when the link is flagged as inactive. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: set link inactive before calling smc_lgr_free()	Karsten Graul	2	-1/+5
	Before smc_lgr_free() is called the link must be set inactive by calling smc_llc_link_inactive(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: handle all error codes from smc_conn_create()	Karsten Graul	1	-0/+2
	Always set a reason_code when smc_conn_create() returns an error code. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: use a workqueue to defer llc send	Karsten Graul	4	-43/+104
	SMC handles deferred work in tasklets. As tasklets cannot sleep this can result in rare EBUSY conditions, so defer this work in a work queue. The high level api functions do not defer work because they can sleep until the llc send is actually completed. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: move link llc initialization to llc layer	Karsten Graul	3	-6/+12
	Move the llc layer specific initialization and cleanup out of smc_core.c into smc_llc.c (smc_llc_link_init and smc_llc_link_clear). Move all initialization of a link into the new init function. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: simplify test_link function usage	Karsten Graul	2	-9/+5
	Make smc_llc_send_test_link() static and remove it from the header file. And to send a test_link response set the response flag and send the message back as-is, without using smc_llc_send_test_link(). And because smc_llc_send_test_link() must no longer send responses, remove the response flag handling from the function. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: remove unnecessary cast	Karsten Graul	1	-3/+3
	Remove an unneeded (void *) cast from the calls to smc_llc_send_message(). No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: register new rmbs with the peer	Karsten Graul	5	-8/+64
	Register new rmb buffers with the remote peer by exchanging a confirm_rkey llc message. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net/smc: no tx work trigger for fallback sockets	Ursula Braun	1	-2/+2
	If TCP_NODELAY is set or TCP_CORK is reset, setsockopt triggers the tx worker. This does not make sense, if the SMC socket switched to the TCP fallback when the connection is created. This patch adds the additional check for the fallback case. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16	net: hns3: Fixes the missing PCI iounmap for various legs	Fuyun Liang	1	-0/+2
	We call pcim_iomap in hclge_pci_init, pcim_iounmap should be called in error handle of hclge_init_ae_dev. We call pcim_iomap in hclge_pci_init, but do not call pcim_iounmap in hclge_pci_uninit. When we remove the hclge.ko and insert it again, a problem that pci can not map will happen. pcim_iounmap need to be called in hclge_pci_uninit. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>