Internet Draft                                                    J. Chu
draft-ietf-tcpm-initcwnd-07.txt                             N. Dukkipati
Intended status: Experimental                                   Y. Cheng
                                                               M. Mathis
Expiration date: July 2013                                  Google, Inc.
                                                        January 28, 2013


                    Increasing TCP's Initial Window

Status of this Memo

   Distribution of this memo is unlimited.

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire in July 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Chu, et. al.               Expires July 2013                    [Page 1]


Internet Draft      Increasing TCP's Initial Window         January 2013


Abstract

   This document proposes an experiment to increase the permitted TCP
   initial window (IW) from between 2 and 4 segments, as specified in
   RFC 3390, to 10 segments, with a fallback to the existing
   recommendation when performance issues are detected. It discusses the
   motivation behind the increase, the advantages and disadvantages of
   the higher initial window, and presents results from several large
   scale experiments showing that the higher initial window improves the
   overall performance of many web services without resulting in a
   congestion collapse. The document closes with a discussion of usage
   and deployment for further experimental purposes recommended by the
   IETF TCP Maintenance and Minor Extensions (TCPM) working group.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  TCP Modification . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Implementation Issues  . . . . . . . . . . . . . . . . . . . .  5
   4.  Background . . . . . . . . . . . . . . . . . . . . . . . . . .  6
   5.  Advantages of Larger Initial Windows . . . . . . . . . . . . .  7
     5.1 Reducing Latency . . . . . . . . . . . . . . . . . . . . . .  7
     5.2 Keeping up with the growth of web object size  . . . . . . .  8
     5.3 Recovering faster from loss on under-utilized or wireless
         links  . . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   6.  Disadvantages of Larger Initial Windows for the Individual
       Connection . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   7.  Disadvantages of Larger Initial Windows for the Network  . . .  9
   8.  Mitigation of Negative Impact  . . . . . . . . . . . . . . . . 10
   9.  Interactions with the Retransmission Timer . . . . . . . . . . 10
   10. Experimental Results From Large Scale Cluster Tests  . . . . . 10
     10.1 The benefits  . . . . . . . . . . . . . . . . . . . . . . . 11
     10.2 The cost  . . . . . . . . . . . . . . . . . . . . . . . . . 11
   11. Other Studies  . . . . . . . . . . . . . . . . . . . . . . . . 12
   12. Usage and Deployment Recommendations . . . . . . . . . . . . . 13
   13. Related Proposals  . . . . . . . . . . . . . . . . . . . . . . 14
   14. Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   15. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 15
   16. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 15
   17. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 15
   Normative References . . . . . . . . . . . . . . . . . . . . . . . 16
   Informative References . . . . . . . . . . . . . . . . . . . . . . 16
   Appendix A - List of Concerns and Corresponding Test Results . . . 20
   Author's Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23
   Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . 23


1.  Introduction

   This document proposes to raise the upper bound on TCP's initial
   window (IW) to 10 segments (maximum 14600B). It is patterned after
   and borrows heavily from RFC 3390 [RFC3390] and earlier work in this
   area. Due to lingering concerns about possible side effects to other
   flows sharing the same network bottleneck, some of the
   recommendations are conditional on additional monitoring and
   evaluation.

   The primary argument in favor of raising IW follows from the evolving
   scale of the Internet. Ten segments are likely to fit into queue
   space available at any broadband access link, even when there are a
   reasonable number of concurrent connections.

   Lower-speed links can be treated with environment-specific
   configurations, such that they can be protected from being
   overwhelmed by large initial window bursts without imposing a
   suboptimal initial window on the rest of the Internet.

   This document reviews the advantages and disadvantages of using a
   larger initial window, and includes summaries of several large-scale
   experiments showing that an initial window of 10 segments provides
   benefits across the board for a variety of BW, RTT, and BDP classes.
   These results show significant benefits for increasing IW for users
   at much smaller data rates than had been previously anticipated.
   However, at initial windows larger than 10, the results are mixed. We
   believe that these mixed results are not intrinsic, but are the
   consequence of various implementation artifacts, including overly
   aggressive applications employing many simultaneous connections.

   We recommend that all TCP implementations have a settable TCP IW
   parameter as long as there is a reasonable effort to monitor for
   possible interactions with other Internet applications and services
   as described in Section 12.  Furthermore, Section 10 details why 10
   segments may be an appropriate value, and while that value may
   continue to rise in the future, this document does not include any
   supporting evidence for values of IW larger than 10.

   In addition, we introduce a minor revision to RFC 3390 and RFC 5681
   [RFC5681] to eliminate resetting the initial window when the SYN or
   SYN/ACK is lost.

   The document closes with a discussion of the consensus from the TCPM
   working group on the near-term usage and deployment of IW10 in the
   Internet.

   A complementary set of slides for this proposal can be found at
   [CD10].

2.  TCP Modification

   This document proposes an increase in the permitted upper bound for
   TCP's initial window (IW) to 10 segments depending on the MSS. This
   increase is optional: a TCP MAY start with an initial window that is
   smaller than 10 segments.

   More precisely, the upper bound for the initial window will be

         min (10*MSS, max (2*MSS, 14600))                            (1)

   This upper bound for the initial window size represents a change from
   RFC 3390 [RFC3390], which specified that the congestion window be
   initialized between 2 and 4 segments depending on the MSS.
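
   As an illustration only, formula (1) can be evaluated for a few MSS
   values and compared with the RFC 3390 bound. The function names and
   the sample MSS values below are assumptions made for this sketch and
   are not part of the proposal.

```python
def initial_window_upper_bound(mss: int) -> int:
    """Upper bound on the initial window in bytes, per formula (1)."""
    return min(10 * mss, max(2 * mss, 14600))


def rfc3390_upper_bound(mss: int) -> int:
    """Upper bound from RFC 3390, shown only for comparison."""
    return min(4 * mss, max(2 * mss, 4380))


if __name__ == "__main__":
    # Sample MSS values are illustrative, not taken from the document.
    for mss in (536, 1460, 4312, 8960):
        new = initial_window_upper_bound(mss)
        old = rfc3390_upper_bound(mss)
        print(f"MSS={mss}: RFC 3390 {old}B ({old // mss} segments), "
              f"formula (1) {new}B ({new // mss} segments)")
```

   With an MSS of 1460 bytes, the bound is 14600 bytes (10 segments).
   For a very large MSS, the max() term dominates and the bound is 2
   segments, the same as under RFC 3390.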

   This change applies to the initial window of the connection in the
   first round trip time (RTT) of data transmission during or following
   the TCP three-way handshake. Neither the SYN/ACK nor its
   acknowledgment (ACK) in the three-way handshake should increase the
   initial window size.

   Note that all the test results described in this document were based
   on the regular Ethernet MTU of 1500 bytes. Future study of the effect
   of a different MTU may be needed to fully validate (1) above.

   Furthermore, RFC 3390 and RFC 5681 [RFC5681] state that

         "If the SYN or SYN/ACK is lost, the initial window used by a
         sender after a correctly transmitted SYN MUST be one segment
         consisting of MSS bytes."

   The proposed change to reduce the default RTO to 1 second [RFC6298]
   increases the chance for spurious SYN or SYN/ACK retransmission, thus
   unnecessarily penalizing connections with RTT > 1 second if their
   initial window is reduced to 1 segment. For this reason, it is
   RECOMMENDED that implementations refrain from resetting the initial
   window to 1 segment, unless there has been more than one SYN or
   SYN/ACK retransmission, or true loss has been detected.
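
   The recommendation above might be sketched as follows. The function
   and parameter names are hypothetical, and how an implementation
   decides that "true loss" has occurred is not specified here, so it is
   modeled as a plain flag supplied by the caller.

```python
def window_after_handshake(iw_bytes: int, mss: int,
                           syn_retransmissions: int,
                           true_loss_detected: bool) -> int:
    """Return the window (in bytes) to use for the first data flight.

    A single SYN or SYN/ACK retransmission is treated as possibly
    spurious, so the initial window is kept. More than one
    retransmission, or confirmed loss, resets it to one segment.
    """
    if syn_retransmissions > 1 or true_loss_detected:
        return mss
    return iw_bytes


if __name__ == "__main__":
    print(window_after_handshake(14600, 1460, 1, False))
    print(window_after_handshake(14600, 1460, 2, False))
```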

   TCP implementations use slow start in as many as three different
   ways: (1) to start a new connection (the initial window); (2) to
   restart transmission after a long idle period (the restart window);
   and (3) to restart transmission after a retransmit timeout (the loss
   window).  The change specified in this document affects the value of
   the initial window.  Optionally, a TCP MAY set the restart window to
   the minimum of the value used for the initial window and the current
   value of cwnd (in other words, using a larger value for the restart
   window should never increase the size of cwnd).  These changes do NOT
   change the loss window, which must remain 1 segment of MSS bytes (to
   permit the lowest possible window size in the case of severe
   congestion).
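
   A minimal sketch of how the restart and loss windows described above
   relate to the initial window follows. The function names are
   assumptions made for illustration.

```python
def restart_window(initial_window: int, cwnd: int) -> int:
    """Window used after a long idle period, in bytes.

    Taking the minimum ensures that restarting never increases cwnd.
    """
    return min(initial_window, cwnd)


def loss_window(mss: int) -> int:
    """Window used after a retransmit timeout: always one segment."""
    return mss


if __name__ == "__main__":
    iw = 14600
    print(restart_window(iw, cwnd=5840))   # cwnd below IW: cwnd is kept
    print(restart_window(iw, cwnd=58400))  # cwnd above IW: capped at IW
    print(loss_window(1460))
```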

   Furthermore, to limit any negative effect that a larger initial
   window may have on links with limited bandwidth or buffer space,
   implementations SHOULD fall back to RFC 3390 for the restart window
   (RW) if any packet loss is detected during either the initial window,
   or a restart window, and more than 4KB of data is sent.
   Implementations must also follow RFC 6298 [RFC6298] in order to avoid
   spurious RTO as described in Section 9.
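
   One possible reading of the fallback rule is sketched below. It
   assumes that "4KB" means 4096 bytes and that the loss and byte-count
   conditions must both hold; the names are hypothetical, and a real
   stack would track this state per connection.

```python
# Assumption: the "4KB" in the text is read as 4096 bytes.
FALLBACK_THRESHOLD_BYTES = 4 * 1024


def iw10_bound(mss: int) -> int:
    """Upper bound from formula (1) of this document."""
    return min(10 * mss, max(2 * mss, 14600))


def rfc3390_bound(mss: int) -> int:
    """Upper bound specified by RFC 3390."""
    return min(4 * mss, max(2 * mss, 4380))


def restart_window_bound(mss: int, loss_in_iw_or_rw: bool,
                         bytes_sent: int) -> int:
    """Bound to apply to the restart window, in bytes.

    Falls back to RFC 3390 when loss was seen during an initial or
    restart window and more than the threshold amount was sent.
    """
    if loss_in_iw_or_rw and bytes_sent > FALLBACK_THRESHOLD_BYTES:
        return rfc3390_bound(mss)
    return iw10_bound(mss)


if __name__ == "__main__":
    print(restart_window_bound(1460, True, 14600))
    print(restart_window_bound(1460, False, 14600))
```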

3.  Implementation Issues

   The HTTP 1.1 specification allows only two simultaneous connections
   per domain, while web browsers open more simultaneous TCP connections
   [Ste08], partly to circumvent the small initial window in order to
   speed up the loading of web pages, as discussed in Section 4.

   When web browsers open simultaneous TCP connections to the same
   destination, they are working against TCP's congestion control
   mechanisms [FF99]. Combining this behavior with larger initial
   windows further increases the burstiness and unfairness to other
   traffic in the network. If a larger initial window causes harm to any
   other flows, then local application tuning will reveal that fewer
   concurrent connections yield better performance for some users. Any
   content provider deploying IW10 in conjunction with content
   distributed across multiple domains is explicitly encouraged to
   perform measurement experiments to detect such problems, and to
   consider reducing the number of concurrent connections used to
   retrieve their content.

   Some implementations advertise a small initial receive window (Table
   2 in [Duk10]), effectively limiting how much window a remote host may
   use. In order to realize the full benefit of the large initial
   window, implementations are encouraged to advertise an initial
   receive window of at least 10 segments, except for the circumstances
   where a larger initial window is deemed harmful. (See the Mitigation
   section below.)

   The TCP SACK option ([RFC2018]) was thought to be required in order
   for the larger initial window to perform well. But measurements from
   both a testbed and live tests showed that IW=10 without the SACK
   option outperforms IW=3 with the SACK option [CW10].

4.  Background

   The TCP congestion window was introduced as part of the congestion
   control algorithm by Van Jacobson in 1988 [Jac88]. The initial value
   of one segment was used as the starting point for newly established
   connections to probe the available bandwidth on the network.

   Today's Internet is dominated by web traffic running on top of short-
   lived TCP connections [IOR2009]. The relatively small initial window
   has become a limiting factor for the performance of many web
   applications.

   The global Internet has continued to grow, both in speed and
   penetration. According to the latest report from Akamai [AKAM10], the
   global broadband (> 2Mbps) adoption has surpassed 50%, propelling the
   average connection speed to reach 1.7Mbps, while the narrowband (<
   256Kbps) usage has dropped to 5%. In contrast, TCP's initial window
   has remained 4KB for a decade [RFC2414], corresponding to a bandwidth
   utilization of less than 200Kbps per connection, assuming an RTT of
   200ms.
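
   The 200Kbps figure follows from sending at most one window per round
   trip. The sketch below works through that arithmetic; it assumes that
   "4KB" means 4096 bytes, and it also shows the 4380-byte RFC 3390
   bound and the 14600-byte bound proposed here for comparison.

```python
def window_limited_throughput_bps(window_bytes: int,
                                  rtt_seconds: float) -> float:
    """Throughput in bits per second when one window is sent per RTT."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    rtt = 0.2  # 200ms, as assumed in the text
    for window in (4096, 4380, 14600):
        kbps = window_limited_throughput_bps(window, rtt) / 1000
        print(f"{window} bytes per {rtt}s RTT -> {kbps:.1f} Kbps")
```

   Both 4096 and 4380 bytes per 200ms come out below 200Kbps, which is
   consistent with the statement above.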

   A large proportion of flows on the Internet are short web
   transactions over TCP, and complete before exiting TCP slow start.
   Speeding up the TCP flow startup phase, including circumventing the
   initial window limit, has been an area of active research [RFC6077,
   Sch08]. Numerous proposals exist [LAJW07, RFC4782, PRAKS02, PK98].
   Some require router support [RFC4782, PK98], hence are not practical
   for the public Internet. Others suggested bold, but often radical
   ideas, likely requiring more years of research before standardization
   and deployment.

   In the meantime, applications have responded to TCP's "slow" start.
   Web sites use multiple sub-domains [Bel10] to circumvent the HTTP 1.1
   regulation on two connections per physical host [RFC2616]. As of
   today, major web browsers open multiple connections to the same site
   (up to six connections per domain [Ste08] and the number is growing).
   This trend is meant to remedy HTTP's serialized download and to
   achieve parallelism and higher performance. But it also implies that
   today most access links are severely under-utilized, hence having
   multiple TCP connections improves performance most of the time. While
   raising the initial congestion window may cause congestion for
   certain users using these browsers, we argue that the browsers and
   other applications need to respect the HTTP 1.1 regulation and stop
   increasing the number of simultaneous TCP connections. We believe a
   modest increase of the initial window will help to stop this trend,
   and provide the best interim solution to improve overall user
   performance, and reduce the server, client, and network load.

   Note that persistent connections and pipelining are designed to
   address some of the above issues with HTTP [RFC2616]. Their presence
   does not diminish the need for a larger initial window. E.g., data
   from the Chrome browser show that 35% of HTTP requests are made on
   new TCP connections. Our test data also shows significant latency
   reduction with the large initial window even in conjunction with
   these two HTTP features ([Duk10]).

   Also note that packet pacing has been suggested as a possible
   mechanism to avoid large bursts and their associated harm [VH97].
   Pacing is not required in this proposal due to a strong preference
   for a simple solution. We suspect that for packet bursts of a
   moderate size, packet pacing will not be necessary. This seems to be
   confirmed by our test results.

   More discussion of the increase in initial window, including the
   choice of 10 segments, can be found in [Duk10, CD10].

5.  Advantages of Larger Initial Windows

5.1 Reducing Latency

   An increase of the initial window from 3 segments to 10 segments
   reduces the total transfer time for data sets greater than 4KB by up
   to 4 round trips.

   The table below compares the number of round trips between IW=3 and
   IW=10 for different transfer sizes, assuming infinite bandwidth, no
   packet loss, and the standard delayed acks with large delayed-ACK
   timer.

         ---------------------------------------
        | total segments |   IW=3   |   IW=10   |
         ---------------------------------------
        |         3      |     1    |      1    |
        |         6      |     2    |      1    |
        |        10      |     3    |      1    |
        |        12      |     3    |      2    |
        |        21      |     4    |      2    |
        |        25      |     5    |      2    |
        |        33      |     5    |      3    |
        |        46      |     6    |      3    |
        |        51      |     6    |      4    |
        |        78      |     7    |      4    |
        |        79      |     8    |      4    |
        |       120      |     8    |      5    |
        |       127      |     9    |      5    |
         ---------------------------------------

   For example, with the larger initial window, a transfer of 32
   segments of data will require only two rather than five round trips
   to complete.

5.2 Keeping up with the growth of web object size

   RFC 3390 stated that the main motivation for increasing the initial
   window to 4KB was to speed up connections that only transmit a small
   amount of data, e.g., email and web. The majority of transfers back
   then were less than 4KB, and could be completed in a single RTT
   [All00].

   Since RFC 3390 was published, web objects have gotten significantly
   larger [Chu09, RJ10]. Today only a small percentage of web objects
   (e.g., 10% of Google's search responses) can fit in the 4KB initial
   window. The average HTTP response size of gmail.com, a highly
   scripted web site, is 8KB (Figure 1 in [Duk10]). The average web
   page, including all static and dynamic scripted web objects on the
   page, has seen even greater growth in size [RJ10]. HTTP pipelining
   [RFC2616] and new web transport protocols such as SPDY [SPDY] allow
   multiple web objects to be sent in a single transaction, potentially
   benefiting from an even larger initial window in order to transfer an
   entire web page in a small number of round trips.

5.3 Recovering faster from loss on under-utilized or wireless links

   A greater-than-3-segment initial window increases the chance to
   recover from packet loss through Fast Retransmit rather than the
   lengthy initial RTO [RFC5681]. This is because the fast retransmit
   algorithm requires three duplicate ACKs as an indication that a
   segment has been lost rather than reordered. While newer loss
   recovery techniques such as Limited Transmit [RFC3042] and Early
   Retransmit [RFC5827] have been proposed to help speed up loss
   recovery from a smaller window, both algorithms can still benefit
   from the larger initial window because of a better chance to receive
   more ACKs to react upon.
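
   The reasoning above can be illustrated with a simplified model. It
   assumes a single lost segment, no reordering, a receiver that sends
   one duplicate ACK for every segment arriving after the loss, and no
   further data sent beyond the initial window (so Limited Transmit and
   Early Retransmit are not modeled). All names are hypothetical.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs needed to trigger Fast Retransmit


def dupacks_after_single_loss(window_segments: int, lost_index: int) -> int:
    """Duplicate ACKs generated when segment lost_index (1-based) is lost.

    Under the simplified model, every later segment in the same window
    produces exactly one duplicate ACK.
    """
    if not 1 <= lost_index <= window_segments:
        raise ValueError("lost_index must be within the window")
    return window_segments - lost_index


def recoverable_positions(window_segments: int) -> list[int]:
    """Loss positions that Fast Retransmit can recover in this model."""
    return [k for k in range(1, window_segments + 1)
            if dupacks_after_single_loss(window_segments, k)
            >= DUPACK_THRESHOLD]


if __name__ == "__main__":
    for iw in (3, 4, 10):
        print(f"IW={iw}: recoverable positions {recoverable_positions(iw)}")
```

   In this model a 3-segment window can never produce three duplicate
   ACKs, while a 10-segment window can for a loss in any of its first
   seven positions.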

6.  Disadvantages of Larger Initial Windows for the Individual Connection

   The larger bursts from an increase in the initial window may cause
   buffer overrun and packet drop in routers with small buffers, or
   routers experiencing congestion. This could result in unnecessary
   retransmit timeouts. For a large-window connection that is able to
   recover without a retransmit timeout, this could result in an
   unnecessarily-early transition from the slow-start to the
   congestion-avoidance phase of the window increase algorithm.

   Premature segment drops are unlikely to occur in uncongested networks
   with sufficient buffering, or in moderately-congested networks where
   the congested router uses active queue management (such as Random
   Early Detection [FJ93, RFC2309, RFC3150]).

   Insufficient buffering is more likely to exist in the access routers
   connecting slower links. A recent study of access router buffer size
   [DGHS07] reveals that the majority of access routers provision enough
   buffer for 130ms or longer, sufficient to cover a burst of more than
   10 packets at 1Mbps speed, but possibly not sufficient for browsers
   opening simultaneous connections.
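
   The buffer arithmetic can be checked with a short sketch. It assumes
   1500-byte packets (the Ethernet MTU used in the tests) and an empty
   queue when the burst arrives. The function names and the choice of
   five connections in the usage example are illustrative assumptions.

```python
def burst_drain_ms(packets: int, packet_bytes: int = 1500,
                   link_bps: int = 1_000_000) -> float:
    """Time in milliseconds for the link to drain a burst of packets."""
    return packets * packet_bytes * 8 * 1000 / link_bps


def burst_fits(packets: int, buffer_ms: float = 130.0,
               packet_bytes: int = 1500,
               link_bps: int = 1_000_000) -> bool:
    """True if the burst drains within the buffer's worth of time."""
    return burst_drain_ms(packets, packet_bytes, link_bps) <= buffer_ms


if __name__ == "__main__":
    # One connection with IW=10 versus five simultaneous connections.
    for connections in (1, 5):
        packets = 10 * connections
        print(f"{connections} connection(s): {burst_drain_ms(packets)} ms "
              f"of queue, fits in 130 ms: {burst_fits(packets)}")
```

   A single 10-packet burst needs 120ms of buffering at 1Mbps and fits
   within 130ms, while several such bursts arriving together do not.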

   A testbed study [CW10] on the effect of the larger initial window
   with five simultaneously opened connections revealed that, even with
   limited buffer size on slow links, IW=10 still reduced the total
   latency of web transactions, although at the cost of higher packet
   drop rates as compared to IW=3.

   Some TCP connections will receive better performance with the larger
   initial window even if the burstiness of the initial window results
   in premature segment drops.  This will be true if (1) the TCP
   connection recovers from the segment drop without a retransmit
   timeout, and (2) the TCP connection is ultimately limited to a small
   congestion window by either network congestion or by the receiver's
   advertised window.

7.  Disadvantages of Larger Initial Windows for the Network

   An increase in the initial window may increase congestion in a
   network. However, since the increase is one-time only (at the
   beginning of a connection), and the rest of TCP's congestion backoff
   mechanism remains in place, it's unlikely the increase by itself will
   leave a network in a persistent state of congestion, or even
   congestion collapse. This seems to have been confirmed by the
   large-scale web experiments described later.

   It should be noted that the above may not hold if applications open a
   large number of simultaneous connections.

   Until this proposal is widely deployed, a fairness issue may exist
   between flows adopting a larger initial window vs. flows that are
   RFC 3390-compliant. Although no severe unfairness has been detected
   in any of the known tests so far, further study on this topic may be
   warranted.

   Some of the discussions from RFC 3390 are still valid for IW=10.
   Moreover, it is worth noting that although TCP NewReno increases the
   chance of duplicate segments when trying to recover multiple packet
   losses from a large window, the wide support of TCP Selective
   Acknowledgment (SACK) option [RFC2018] in all major OSes today should
   keep the volume of duplicate segments in check.

   Recent measurements [Get11] provide evidence of extremely large
   queues (on the order of one second or more) at access networks of the
   Internet. While a significant part of the buffer bloat is contributed
   by large downloads/uploads such as video files, emails with large
   attachments, backups and download of movies to disk, some of the
   problem is also caused by Web browsing of image-heavy sites [Get11].
   This queuing delay is generally considered harmful for responsiveness
   of latency-sensitive traffic such as DNS queries, ARP, DHCP, VoIP and
   gaming. IW=10 can exacerbate this problem when doing short downloads
   such as Web browsing [Get11-1]. The mitigations proposed for the
   broader problem of buffer bloat are also applicable in this case,
   such as the use of ECN, AQM schemes [CoDel] and traffic
   classification (QoS).

8.  Mitigation of Negative Impact

   Much of the negative impact from an increase in the initial window is
   likely to be felt by users behind slow links with limited buffers.
   The negative impact can be mitigated by hosts directly connected to a
   low-speed link advertising a smaller initial receive window than 10
   segments. This can be achieved either through manual configuration by
   the users, or through the host stack auto-detecting the low bandwidth
   links.

   Additional suggestions to improve the end-to-end performance of slow
   links can be found in RFC 3150 [RFC3150].

9.  Interactions with the Retransmission Timer

   A large initial window increases the chance of spurious RTO on a low-
   bandwidth path because the packet transmission time will dominate the
   round-trip time. To minimize spurious retransmissions,
   implementations MUST follow