Internet Engineering Task Force (IETF)                     A. Zimmermann
Request for Comments: 6069                                  A. Hannemann
Category: Experimental                            RWTH Aachen University
ISSN: 2070-1721                                            December 2010


   Making TCP More Robust to Long Connectivity Disruptions (TCP-LCD)

Abstract

   Disruptions in end-to-end path connectivity that last longer than one
   retransmission timeout cause suboptimal TCP performance.  The reason
   for this performance degradation is that TCP interprets segment loss
   induced by long connectivity disruptions as a sign of congestion,
   resulting in repeated retransmission timer backoffs.  This, in turn,
   delays the detection of restored connectivity, since TCP waits for
   the next retransmission timeout before it attempts a retransmission.

   This document proposes an algorithm to make TCP more robust to long
   connectivity disruptions (TCP-LCD).  It describes how standard ICMP
   messages can be exploited during timeout-based loss recovery to
   disambiguate true congestion loss from non-congestion loss caused by
   connectivity disruptions.  Moreover, it specifies a reversion
   strategy for the retransmission timer that enables more prompt
   detection of whether connectivity to a previously disconnected peer
   node has been restored.  TCP-LCD is a TCP sender-only modification
   that effectively improves TCP performance in the case of
   connectivity disruptions.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for examination, experimental implementation, and
   evaluation.

   This document defines an Experimental Protocol for the Internet
   community.  This document is a product of the Internet Engineering
   Task Force (IETF).  It represents the consensus of the IETF
   community.  It has received public review and has been approved for
   publication by the Internet Engineering Steering Group (IESG).  Not
   all documents approved by the IESG are a candidate for any level of
   Internet Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6069.




Zimmermann & Hannemann        Experimental                      [Page 1]


RFC 6069             Making TCP More Robust to LCDs        December 2010


Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Connectivity Disruption Indication
   4. Connectivity Disruption Reaction
      4.1. Basic Idea
      4.2. Algorithm Details
   5. Discussion of TCP-LCD
      5.1. Retransmission Ambiguity
      5.2. Wrapped Sequence Numbers
      5.3. Packet Duplication
      5.4. Probing Frequency
      5.5. Reaction during Connection Establishment
      5.6. Reaction in Steady-State
   6. Dissolving Ambiguity Issues Using the TCP Timestamps Option
   7. Interoperability Issues
      7.1. Detection of TCP Connection Failures
      7.2. Explicit Congestion Notification (ECN)
      7.3. TCP-LCD and IP Tunnels
   8. Related Work
   9. Security Considerations
   10. Acknowledgments
   11. References
      11.1. Normative References
      11.2. Informative References

1.  Introduction

   Connectivity disruptions can occur in many different situations.  The
   frequency of connectivity disruptions depends on the properties of
   the end-to-end path between the communicating hosts.  While
   connectivity disruptions can occur in traditional wired networks
   (e.g., due to an unplugged network cable), they are significantly
   more likely in wireless (multi-hop) networks, where end-host
   mobility, network topology changes, and wireless interference are
   crucial factors.  For the Transmission Control Protocol (TCP)
   [RFC0793], such disruptions can significantly reduce performance
   compared to a permanently connected path [SESB05].  This is because
   TCP, which was originally designed to operate in fixed and wired
   networks, generally assumes that the end-to-end path connectivity is
   relatively stable over the connection's lifetime.

   Depending on their duration, connectivity disruptions can be
   classified into two groups [TCP-RLCI]: "short" and "long".  A
   connectivity disruption is "short" if connectivity returns before the
   retransmission timer fires for the first time.  In this case, TCP
   recovers lost data segments through Fast Retransmit and lost
   acknowledgments (ACKs) through successfully delivered later ACKs.
   A connectivity disruption is "long" for a given TCP connection if the
   retransmission timer fires at least once before connectivity resumes.
   Whether path characteristics, such as the round-trip time (RTT) or
   the available bandwidth, have changed when connectivity resumes after
   a disruption is another important aspect for TCP's retransmission
   scheme [TCP-RLCI].

   The algorithm specified in this document improves TCP's behavior in
   the case of "long connectivity disruptions".  In particular, it
   focuses on the period prior to the re-establishment of connectivity
   to a previously disconnected peer node.  This document does not
   modify TCP's behavior or its congestion control mechanisms [RFC5681]
   after connectivity has been restored.

   When a long connectivity disruption occurs on a TCP connection, the
   TCP sender eventually stops receiving acknowledgments.
   After the retransmission timer expires, the TCP sender enters the
   timeout-based loss recovery and declares the oldest outstanding
   segment (SND.UNA) as lost.  Since TCP tightly couples reliability and
   congestion control, the retransmission of SND.UNA is triggered
   together with the reduction of the transmission rate.  This is based
   on the assumption that segment loss is an indication of congestion
   [RFC5681].  As long as the connectivity disruption persists, TCP will
   repeat this procedure until the oldest outstanding segment has
   successfully been acknowledged or until the connection has timed out.
   TCP implementations that follow the recommended retransmission
   timeout (RTO) management of RFC 2988 [RFC2988] double the RTO after
   each retransmission attempt.  However, the RTO growth may be bounded
   by an upper limit, the maximum RTO, which is at least 60 s, but may
   be longer: Linux, for example, uses 120 s.  If connectivity is
   restored between two retransmission attempts, TCP still has to wait
   until the retransmission timer expires before resuming transmission,
   since it simply does not have any means to know if the connectivity
   has been re-established.  Therefore, depending on when connectivity
   becomes available again, up to one maximum RTO of potential
   transmission time can be wasted.
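   The following Python sketch illustrates this arithmetic.  It assumes
   the original segment is sent at t = 0, the RTO is doubled after every
   timeout as in RFC 2988, and the bound is a 60 s maximum RTO; the
   function names are hypothetical and do not come from any TCP
   implementation.

```python
def retransmission_schedule(initial_rto, max_rto, horizon):
    """Send times (s) of successive retransmissions of SND.UNA.

    The original segment is assumed to be sent at t = 0.  Each timeout
    doubles the RTO, bounded by max_rto.  Only times <= horizon are
    returned.
    """
    times = []
    t, rto = 0.0, initial_rto
    while True:
        t += rto                      # timer fires rto seconds after the last send
        if t > horizon:
            return times
        times.append(t)
        rto = min(2 * rto, max_rto)   # exponential backoff with an upper limit


def idle_time(restored_at, initial_rto, max_rto):
    """Seconds between restored connectivity and the next retransmission.

    Assumes 0 < initial_rto <= max_rto, so consecutive retransmissions are
    at most max_rto apart and one must fall inside that window.
    """
    window_end = restored_at + max_rto
    for t in retransmission_schedule(initial_rto, max_rto, window_end):
        if t >= restored_at:
            return t - restored_at
    raise ValueError("initial_rto must not exceed max_rto")


# Connectivity returns at t = 64 s, just after the retransmission at 63 s;
# the next attempt only happens at 123 s.
print(retransmission_schedule(1.0, 60.0, 130.0))  # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0, 123.0]
print(idle_time(64.0, 1.0, 60.0))                 # 59.0
```

   Whatever the restoration time, the idle period never exceeds the
   maximum RTO, which is the bound stated above.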

   This retransmission behavior is not efficient, especially in
   scenarios with long connectivity disruptions.  In the ideal case, TCP
   would attempt a retransmission as soon as connectivity to its peer
   has been re-established.  In this document, we specify a TCP sender-
   only modification to provide robustness to long connectivity
   disruptions (TCP-LCD).  The memo describes how the standard Internet
   Control Message Protocol (ICMP) can be exploited during timeout-based
   loss recovery to identify non-congestion loss caused by long
   connectivity disruptions.  TCP-LCD's reversion strategy of the
   retransmission timer enables higher-frequency retransmissions and
   thereby prompt detection of when connectivity to a previously
   disconnected peer node has been restored.  If no congestion is
   present, TCP-LCD approaches the ideal behavior.

   Experimental results of a Linux implementation of TCP-LCD have been
   presented in [ZimHan09].  The implementation has been incorporated
   into mainline Linux and is already used within the Internet.  Thus
   far, no negative experiences have been reported that could be
   attributed to the algorithm.  However, we consider TCP-LCD
   experimental until more real-life results have been obtained.
   Nevertheless, we encourage implementation of TCP-LCD in other
   operating systems to enable broader testing and experimentation.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The reader should be familiar with the algorithm and terminology from
   [RFC2988], which defines the standard algorithm that Transmission
   Control Protocol (TCP) senders are required to use to compute and
   manage their retransmission timer.  In this document, the terms
   "retransmission timer" and "retransmission timeout" are used as
   defined in [RFC2988].  The retransmission timer ensures data delivery
   in the absence of any feedback from the receiver.  The duration of
   this timer is referred to as retransmission timeout (RTO).

   As defined in [RFC0793], the term "acceptable acknowledgment (ACK)"
   refers to a TCP segment that acknowledges previously unacknowledged
   data.  The TCP sender state variable "SND.UNA" and the current
   segment variable "SEG.SEQ" are used as defined in [RFC0793].  SND.UNA
   holds the segment sequence number of the earliest segment that has
   not been acknowledged by the TCP receiver (the oldest outstanding
   segment).  SEG.SEQ is the segment sequence number of a given segment.
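   [RFC0793] makes "acceptable" precise as SND.UNA < SEG.ACK =< SND.NXT,
   with all comparisons done in modulo 2^32 sequence-number arithmetic
   (SND.NXT and SEG.ACK are the RFC 793 variables for the next sequence
   number to be sent and the acknowledgment number of a segment).  A
   minimal Python sketch of that test follows; the helper names are
   illustrative only.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def seq_lt(a, b):
    """True if sequence number a precedes b (modulo 2**32)."""
    diff = (b - a) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


def seq_leq(a, b):
    return a == b or seq_lt(a, b)


def is_acceptable_ack(snd_una, seg_ack, snd_nxt):
    """RFC 793 acceptable ACK test: SND.UNA < SEG.ACK =< SND.NXT."""
    return seq_lt(snd_una, seg_ack) and seq_leq(seg_ack, snd_nxt)


print(is_acceptable_ack(100, 150, 200))          # True
print(is_acceptable_ack(100, 100, 200))          # False: nothing new acknowledged
print(is_acceptable_ack(0xFFFFFFF0, 0x5, 0x10))  # True: valid across the wrap
```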

   For the purposes of this specification, we define the term "timeout-
   based loss recovery", which refers to the state that a TCP sender
   enters upon the first timeout of the oldest outstanding segment
   (SND.UNA) and leaves upon the arrival of the *first* acceptable ACK.
   It is important to note that other documents use a different
   interpretation of the term "timeout-based loss recovery".  For
   example, the NewReno modification to TCP's Fast Recovery algorithm
   [RFC3782] extends the period that a TCP sender remains in timeout-
   based loss recovery compared to the one defined in this document.
   This is because [RFC3782] attempts to avoid unnecessary multiple Fast
   Retransmits that can occur after an RTO.

3.  Connectivity Disruption Indication

   If the queue of an intermediate router that is experiencing a link
   outage can buffer all incoming packets, a connectivity disruption
   will only cause a variation in delay, which is handled well by TCP
   implementations using either Eifel [RFC3522], [RFC4015] or Forward
   RTO-Recovery (F-RTO) [RFC5682].  However, if the link outage lasts
   for too long, the router experiencing the link outage is forced to
   drop packets, and finally may remove the corresponding next hop from
   its routing table.  Means to detect such link outages include
   reacting to failed Address Resolution Protocol (ARP) [RFC0826]
   queries, sensing failed links, and the like.  However, such detection
   is solely the responsibility of the respective router.

      Note: The focus of this memo is on how ICMP messages may be
      exploited to improve TCP's performance; how different physical and
      link-layer mechanisms below the network layer may trigger ICMP
      destination unreachable messages is out of scope of this memo.

   Provided that no other route to the specific destination exists, an
   Internet Protocol version 4 (IPv4) [RFC0791] router will notify the
   corresponding sending host about the dropped packets via ICMP
   destination unreachable messages of code 0 (net unreachable) or
   code 1 (host unreachable) [RFC1812].  Therefore, the sending host can
   use the ICMP destination unreachable messages of these codes as an
   indication of a connectivity disruption, since the reception of these
   messages provides evidence that packets were dropped due to a link
   outage.

   For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
   the ICMP destination unreachable message of code 0 (net unreachable)
   and of code 1 (host unreachable) is the ICMPv6 destination
   unreachable message of code 0 (no route to destination) [RFC4443].
   As with IPv4, a router should generate an ICMPv6 destination
   unreachable message of code 0 in response to a packet that cannot be
   delivered to its destination address because it lacks a matching
   entry in its routing table.

   Note that there are also other ICMP and ICMPv6 destination
   unreachable messages with different codes.  Some of them are
   candidates for connectivity disruption indications, too, but need
   further investigation (for example, ICMP destination unreachable
   messages with code 5 (source route failed), code 11 (net unreachable
   for TOS (Type of Service)), or code 12 (host unreachable for TOS)
   [RFC1812]).  On the other hand, codes that flag hard errors are of no
   use for this scheme, since TCP should abort the connection when those
   are received [RFC1122].

   For the sake of simplicity, we will use, unless explicitly qualified
   with ICMPv4 or ICMPv6, the term "ICMP unreachable message" as a
   synonym for ICMP destination unreachable messages of code 0 or code 1
   and ICMPv6 destination unreachable messages of code 0.  This implies
   that all keywords from [RFC2119] that deal with the handling of
   received ICMP messages apply in the same way to ICMPv6 messages.
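   The classification in this section can be written as a small lookup.
   In the Python sketch below, the enum and function names are
   hypothetical; ICMPv4 codes 5, 11, and 12 are only flagged as
   candidates, matching the text, and every other type or code
   (including hard errors) falls into the last category.

```python
from enum import Enum

ICMPV4_DEST_UNREACH = 3  # ICMPv4 Destination Unreachable type (RFC 792)
ICMPV6_DEST_UNREACH = 1  # ICMPv6 Destination Unreachable type (RFC 4443)


class Indication(Enum):
    UNREACHABLE = "ICMP unreachable message (used by TCP-LCD)"
    CANDIDATE = "possible indication, needs further investigation"
    NONE = "not a connectivity disruption indication"


def classify(ip_version, icmp_type, code):
    """Map an ICMP/ICMPv6 type and code onto the categories above."""
    if ip_version == 4 and icmp_type == ICMPV4_DEST_UNREACH:
        if code in (0, 1):        # net unreachable, host unreachable
            return Indication.UNREACHABLE
        if code in (5, 11, 12):   # source route failed, net/host unreachable for TOS
            return Indication.CANDIDATE
    elif ip_version == 6 and icmp_type == ICMPV6_DEST_UNREACH and code == 0:
        return Indication.UNREACHABLE  # no route to destination
    return Indication.NONE


print(classify(4, 3, 1))  # Indication.UNREACHABLE
print(classify(6, 1, 3))  # Indication.NONE
```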

   The accurate interpretation of ICMP unreachable messages as a
   connectivity disruption indication is complicated by the following
   two peculiarities of ICMP messages.  First, they do not necessarily
   operate on the same timescale as the packets (i.e., TCP segments)
   that elicited them.  When a router drops a packet due to a missing
   route, it will not necessarily send an ICMP unreachable message
   immediately, but may queue it for later delivery instead.  Second,
   ICMP messages are subject to rate limiting.  For example, when a
   router drops a whole window of data due to a link outage, it is
   unlikely to send as many ICMP unreachable messages as it dropped TCP
   segments.  Depending on the load of the router, it may not even send
   any ICMP unreachable messages at all.  Both peculiarities originate
   from [RFC1812] for ICMPv4 and [RFC4443] for ICMPv6.

   Fortunately, according to [RFC0792], ICMPv4 unreachable messages have
   to contain, in their body, the entire IPv4 header [RFC0791] of the
   datagram eliciting the ICMPv4 unreachable message, plus the first
   64 bits of the payload of that datagram.  This allows the sending
   host to match the ICMPv4 error message to the transport connection
   that elicited it.  RFC 1812 [RFC1812] augments these requirements and
   states that ICMPv4 messages should contain as much of the original
   datagram as possible without the length of the ICMPv4 datagram
   exceeding 576 bytes.  Therefore, in the case of TCP, at least the
   source port number, the destination port number, and the 32-bit TCP
   sequence number are included.  This allows the originating TCP to
   demultiplex the received ICMPv4 message and to identify the affected
   connection.  Moreover, it can identify which segment of the
   respective connection triggered the ICMPv4 unreachable message,
   unless there are several segments in flight with the same sequence
   number (see Section 5.1).
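   As an illustration of this demultiplexing, the Python sketch below
   extracts the quoted addresses, ports, and sequence number from an
   ICMPv4 error message.  It assumes the input starts at the ICMP
   header, does not verify checksums, and uses hypothetical names
   (parse_icmpv4_quote, QuotedSegment); the example message is
   synthetic.  Because the quoted datagram is one the host itself sent,
   its source address and port identify the local end of the
   connection.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

IPPROTO_TCP = 6


class QuotedSegment(NamedTuple):
    src: ipaddress.IPv4Address   # local address of the sending TCP
    dst: ipaddress.IPv4Address   # remote peer
    sport: int
    dport: int
    seq: int                     # SEG.SEQ of the segment that elicited the error


def parse_icmpv4_quote(icmp: bytes) -> Optional[QuotedSegment]:
    """Extract the TCP identifiers quoted in an ICMPv4 error message.

    `icmp` starts at the ICMP header (type, code, checksum, 4 unused bytes),
    followed by the quoted IPv4 header and at least 64 bits of its payload.
    """
    quoted = icmp[8:]
    if len(quoted) < 20 or quoted[0] >> 4 != 4:
        return None
    ihl = (quoted[0] & 0x0F) * 4          # IPv4 header length in bytes
    if quoted[9] != IPPROTO_TCP or ihl < 20 or len(quoted) < ihl + 8:
        return None
    # The first 64 bits of the TCP header: ports and sequence number.
    sport, dport, seq = struct.unpack("!HHI", quoted[ihl:ihl + 8])
    return QuotedSegment(ipaddress.IPv4Address(quoted[12:16]),
                         ipaddress.IPv4Address(quoted[16:20]),
                         sport, dport, seq)


# Synthetic ICMPv4 host unreachable (type 3, code 1) quoting a TCP segment.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, IPPROTO_TCP, 0,
                  ipaddress.IPv4Address("192.0.2.1").packed,
                  ipaddress.IPv4Address("198.51.100.7").packed)
msg = bytes([3, 1, 0, 0, 0, 0, 0, 0]) + hdr + struct.pack("!HHI", 49152, 80, 0xDEADBEEF)
print(parse_icmpv4_quote(msg))
```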

   For IPv6 [RFC2460], the payload of an ICMPv6 error message has to
   include as many bytes as possible from the IPv6 datagram that
   elicited the ICMPv6 error message, without making the error message
   exceed the minimum IPv6 MTU (1280 bytes) [RFC4443].  Thus, enough
   information is available to identify both the affected connection and
   the corresponding segment that triggered the ICMPv6 error message.
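   A corresponding Python sketch for ICMPv6 follows, under the stated
   simplification that the quoted IPv6 header is immediately followed by
   the TCP header (extension headers are not walked).  The names are
   again hypothetical and the example packet is synthetic.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional

IPPROTO_TCP = 6
IPV6_HEADER_LEN = 40


class QuotedSegment6(NamedTuple):
    src: ipaddress.IPv6Address
    dst: ipaddress.IPv6Address
    sport: int
    dport: int
    seq: int


def parse_icmpv6_quote(icmp6: bytes) -> Optional[QuotedSegment6]:
    """Extract TCP identifiers from an ICMPv6 Destination Unreachable body.

    Simplification: the quoted IPv6 header must be followed directly by TCP
    (Next Header = 6); extension headers are not handled.
    """
    quoted = icmp6[8:]                    # skip type, code, checksum, unused
    if len(quoted) < IPV6_HEADER_LEN + 8 or quoted[0] >> 4 != 6:
        return None
    if quoted[6] != IPPROTO_TCP:          # Next Header field of the quoted packet
        return None
    tcp = quoted[IPV6_HEADER_LEN:IPV6_HEADER_LEN + 8]
    sport, dport, seq = struct.unpack("!HHI", tcp)
    return QuotedSegment6(ipaddress.IPv6Address(quoted[8:24]),
                          ipaddress.IPv6Address(quoted[24:40]),
                          sport, dport, seq)


# Synthetic ICMPv6 "no route to destination" (type 1, code 0).
hdr6 = struct.pack("!IHBB16s16s", 6 << 28, 8, IPPROTO_TCP, 64,
                   ipaddress.IPv6Address("2001:db8::1").packed,
                   ipaddress.IPv6Address("2001:db8::2").packed)
msg6 = bytes([1, 0, 0, 0, 0, 0, 0, 0]) + hdr6 + struct.pack("!HHI", 49152, 443, 1000)
print(parse_icmpv6_quote(msg6))
```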

   A connectivity disruption indication in the form of an ICMP
   unreachable message associated with a presumably lost TCP segment
   provides strong evidence that the segment was not dropped due to
   congestion, but was successfully delivered as far as the reporting
   router.  The segment therefore did not encounter congestion, at least
   on the part of the path traversed by both the TCP segment eliciting
   the ICMP unreachable message and the ICMP unreachable message itself.

4.  Connectivity Disruption Reaction

   Section 4.1 introduces the basic idea of TCP-LCD.  The complete
   algorithm is specified in Section 4.2.

4.1.  Basic Idea

   The goal of the algorithm is to promptly detect when connectivity to
   a previously disconnected peer node has been restored after a long
   connectivity disruption, while retaining appropriate behavior in case
   of congestion.  TCP-LCD exploits standard ICMP unreachable messages
   during timeout-based loss recovery to increase TCP's retransmission
   frequency: it undoes one retransmission timer backoff whenever it
   receives an ICMP unreachable message that quotes a segment whose
   sequence number is that of a presumably lost retransmission.

   This approach has the advantage of appropriately reducing the probing
   rate in case of congestion.  If either the retransmission itself or
   the corresponding ICMP message is dropped, the previously performed
   retransmission timer backoff is not undone, which effectively halves
   the probing rate.
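   The effect of undoing backoffs can be illustrated with a simplified
   model.  The Python sketch below is not the algorithm of Section 4.2:
   it assumes a fixed base RTO, only counts backoffs, ignores how the
   running timer is restarted, and uses a hypothetical function name.
   Each entry of icmp_returned states whether the ICMP unreachable
   message for that retransmission arrived.

```python
def probe_intervals(base_rto, max_rto, icmp_returned):
    """RTO in effect after each retransmission during a long disruption.

    icmp_returned[i] is True if an ICMP unreachable message quoting the
    i-th retransmission arrives before the timer expires again.
    """
    backoffs = 0
    intervals = []
    for returned in icmp_returned:
        backoffs += 1                 # timeout: back off the timer (RFC 2988)
        if returned:
            backoffs -= 1             # ICMP evidence: undo that one backoff
        intervals.append(min(base_rto * 2 ** backoffs, max_rto))
    return intervals


print(probe_intervals(1, 60, [False] * 6))                # [2, 4, 8, 16, 32, 60]
print(probe_intervals(1, 60, [True] * 4))                 # [1, 1, 1, 1]
print(probe_intervals(1, 60, [True, False, True, True]))  # [1, 2, 2, 2]
```

   In the last case, a single missing ICMP message leaves one backoff in
   place, so every later interval stays doubled: the probing rate is
   halved rather than restored.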

4.2.  Algorithm Details

   A TCP sender that uses RFC 2988 [RFC2988] to compute TCP's
   retransmission timer MAY employ the following scheme to avoid over-
   conservative retransmission timer backoffs in case of long
   connectivity disruptions.  If a TCP sender does implement the
   following steps, the algorithm MUST be initiated upon the first
   timeout of the oldest outstanding segment (SND.UNA) and MUST be
   stopped upon the arrival of the first acceptable ACK.  The algorithm
   MUST NOT be re-initiated upon subsequent timeouts for the same
   segment.  The scheme SHOULD NOT be used in SYN-SENT or SYN-RECEIVED
   states [