netem: Introduce skb_orphan_partial() helper

Commit 547669d483e578 ("tcp: xps: fix reordering issues") added unexpected reorders in case netem is used in a MQ setup for high performance test bed. ETH=eth0 tc qd del dev $ETH root 2>/dev/null tc qd add dev $ETH root handle 1: mq for i in `seq 1 32` do tc qd add dev $ETH parent 1:$i netem delay 100ms done As all tcp packets are orphaned by netem, TCP stack believes it can set skb->ooo_okay on all packets. In order to allow producers to send more packets, we want to keep sk_wmem_alloc from reaching sk_sndbuf limit. We can do that by accounting one byte per skb in netem queues, so that TCP stack is not fooled too much. Tested: With above MQ/netem setup, scaling number of concurrent flows gives linear results and no reorders/retransmits lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done n:1 198.46 n:10 2002.69 n:20 4000.98 n:30 6006.35 n:40 8020.93 n:50 10032.3 n:60 12081.9 n:70 13971.3 n:80 16009.7 n:90 17117.3 n:100 17425.5 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
author: Eric Dumazet <edumazet@google.com> 2013-07-31 02:55:08 +0200
committer: David S. Miller <davem@davemloft.net> 2013-07-31 23:59:49 +0200
commit: f2f872f9272a79a1048877ea14c15576f46c225e (patch)
tree: 22af2e425ac96068c2d383c052e1d287db867211 /net/sched
parent: net: split rt_genid for ipv4 and ipv6 (diff)
download: linux-f2f872f9272a79a1048877ea14c15576f46c225e.tar.xz
linux-f2f872f9272a79a1048877ea14c15576f46c225e.zip
1 files changed, 1 insertions, 4 deletions
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 82f6016d89ab..a6d788d45216 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -412,12 +412,9 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	/* If a delay is expected, orphan the skb. (orphaning usually takes
 	 * place at TX completion time, so _before_ the link transit delay)
-	 * Ideally, this orphaning should be done after the rate limiting
-	 * module, because this breaks TCP Small Queue, and other mechanisms
-	 * based on socket sk_wmem_alloc.
 	 */
 	if (q->latency || q->jitter)
-		skb_orphan(skb);
+		skb_orphan_partial(skb);
 
 	/*
 	 * If we need to duplicate packet, then re-insert at top of the
author	Eric Dumazet <edumazet@google.com>	2013-07-31 02:55:08 +0200
committer	David S. Miller <davem@davemloft.net>	2013-07-31 23:59:49 +0200
commit	f2f872f9272a79a1048877ea14c15576f46c225e (patch)
tree	22af2e425ac96068c2d383c052e1d287db867211 /net/sched
parent	net: split rt_genid for ipv4 and ipv6 (diff)
download	linux-f2f872f9272a79a1048877ea14c15576f46c225e.tar.xz linux-f2f872f9272a79a1048877ea14c15576f46c225e.zip