tcp: fix retrans_stamp advancing in error cases

It can happen, that tcp_retransmit_skb fails due to some error. In such cases we might end up into a state where tp->retrans_out is zero but that's only because we removed the TCPCB_SACKED_RETRANS bit from a segment but couldn't retransmit it because of the error that happened. Therefore some assumptions that retrans_out checks are based do not necessarily hold, as there still can be an old retransmission but that is only visible in TCPCB_EVER_RETRANS bit. As retransmission happen in sequential order (except for some very rare corner cases), it's enough to check the head skb for that bit. Main reason for all this complexity is the fact that connection dying time now depends on the validity of the retrans_stamp, in particular, that successive retransmissions of a segment must not advance retrans_stamp under any conditions. It seems after quick thinking that this has relatively low impact as eventually TCP will go into CA_Loss and either use the existing check for !retrans_stamp case or send a retransmission successfully, setting a new base time for the dying timer (can happen only once). At worst, the dying time will be approximately the double of the intented time. In addition, tcp_packet_delayed() will return wrong result (has some cc aspects but due to rarity of these errors, it's hardly an issue). One of retrans_stamp clearing happens indirectly through first going into CA_Open state and then a later ACK lets the clearing to happen. Thus tcp_try_keep_open has to be modified too. Thanks to Damian Lukowski <damian@tvk.rwth-aachen.de> for hinting that this possibility exists (though the particular case discussed didn't after all have it happening but was just a debug patch artifact). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> 2009-12-09 05:54:11 +0100
committer: David S. Miller <davem@davemloft.net> 2009-12-09 05:56:12 +0100
commit: 77722b177a1606669c0b95dde03347e37d13b8fe (patch)
tree: 73c646300d1e7909ec163a371b6d7d7bf9d2f7f5
parent: tcp: Stalling connections: Move timeout calculation routine (diff)
download: linux-77722b177a1606669c0b95dde03347e37d13b8fe.tar.xz
linux-77722b177a1606669c0b95dde03347e37d13b8fe.zip
1 files changed, 32 insertions, 3 deletions
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 57ae96a04220..12cab7d74dba 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2717,6 +2717,35 @@ static void tcp_try_undo_dsack(struct sock *sk)
 	}
 }
 
+/* We can clear retrans_stamp when there are no retransmissions in the
+ * window. It would seem that it is trivially available for us in
+ * tp->retrans_out, however, that kind of assumptions doesn't consider
+ * what will happen if errors occur when sending retransmission for the
+ * second time. ...It could the that such segment has only
+ * TCPCB_EVER_RETRANS set at the present time. It seems that checking
+ * the head skb is enough except for some reneging corner cases that
+ * are not worth the effort.
+ *
+ * Main reason for all this complexity is the fact that connection dying
+ * time now depends on the validity of the retrans_stamp, in particular,
+ * that successive retransmissions of a segment must not advance
+ * retrans_stamp under any conditions.
+ */
+static int tcp_any_retrans_done(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+
+	if (tp->retrans_out)
+		return 1;
+
+	skb = tcp_write_queue_head(sk);
+	if (unlikely(skb && TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS))
+		return 1;
+
+	return 0;
+}
+
 /* Undo during fast recovery after partial ACK. */
 
 static int tcp_try_undo_partial(struct sock *sk, int acked)
@@ -2729,7 +2758,7 @@ static int tcp_try_undo_partial(struct sock *sk, int acked)
 		/* Plain luck! Hole if filled with delayed
 		 * packet, rather than with a retransmit.
 		 */
-		if (tp->retrans_out == 0)
+		if (!tcp_any_retrans_done(sk))
 			tp->retrans_stamp = 0;
 
 		tcp_update_reordering(sk, tcp_fackets_out(tp) + acked, 1);
@@ -2788,7 +2817,7 @@ static void tcp_try_keep_open(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	int state = TCP_CA_Open;
 
-	if (tcp_left_out(tp) || tp->retrans_out || tp->undo_marker)
+	if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker)
 		state = TCP_CA_Disorder;
 
 	if (inet_csk(sk)->icsk_ca_state != state) {
@@ -2803,7 +2832,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)
 
 	tcp_verify_left_out(tp);
 
-	if (!tp->frto_counter && tp->retrans_out == 0)
+	if (!tp->frto_counter && !tcp_any_retrans_done(sk))
 		tp->retrans_stamp = 0;
 
 	if (flag & FLAG_ECE)
author	Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>	2009-12-09 05:54:11 +0100
committer	David S. Miller <davem@davemloft.net>	2009-12-09 05:56:12 +0100
commit	77722b177a1606669c0b95dde03347e37d13b8fe (patch)
tree	73c646300d1e7909ec163a371b6d7d7bf9d2f7f5
parent	tcp: Stalling connections: Move timeout calculation routine (diff)
download	linux-77722b177a1606669c0b95dde03347e37d13b8fe.tar.xz linux-77722b177a1606669c0b95dde03347e37d13b8fe.zip