Internet Engineering Task Force                                  A. Ford
Internet-Draft                                                     Cisco
Intended status: Experimental                                  C. Raiciu
Expires: April 22, 2013                        University Politehnica of
                                                               Bucharest
                                                              M. Handley
                                               University College London
                                                          O. Bonaventure
                                                Universite catholique de
                                                                 Louvain
                                                        October 19, 2012


     TCP Extensions for Multipath Operation with Multiple Addresses
                   draft-ietf-mptcp-multiaddressed-11

Abstract

   TCP/IP communication is currently restricted to a single path per
   connection, yet multiple paths often exist between peers.  The
   simultaneous use of these multiple paths for a TCP/IP session would
   improve resource usage within the network, and thus improve user
   experience through higher throughput and improved resilience to
   network failure.

   Multipath TCP provides the ability to simultaneously use multiple
   paths between peers.  This document presents a set of extensions to
   traditional TCP to support multipath operation.  The protocol offers
   the same type of service to applications as TCP (i.e. reliable
   bytestream), and provides the components necessary to establish and
   use multiple TCP flows across potentially disjoint paths.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 22, 2013.



Ford, et al.             Expires April 22, 2013                 [Page 1]


Internet-Draft                Multipath TCP                 October 2012


Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.1.  Design Assumptions . . . . . . . . . . . . . . . . . . . .  4
     1.2.  Multipath TCP in the Networking Stack  . . . . . . . . . .  5
     1.3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  6
     1.4.  MPTCP Concept  . . . . . . . . . . . . . . . . . . . . . .  7
     1.5.  Requirements Language  . . . . . . . . . . . . . . . . . .  8
   2.  Operation Overview . . . . . . . . . . . . . . . . . . . . . .  8
     2.1.  Initiating an MPTCP connection . . . . . . . . . . . . . .  9
     2.2.  Associating a new subflow with an existing MPTCP
           connection . . . . . . . . . . . . . . . . . . . . . . . .  9
     2.3.  Informing the other Host about another potential
           address  . . . . . . . . . . . . . . . . . . . . . . . . . 10
     2.4.  Data transfer using MPTCP  . . . . . . . . . . . . . . . . 11
     2.5.  Requesting a change in a path's priority . . . . . . . . . 11
     2.6.  Closing an MPTCP connection  . . . . . . . . . . . . . . . 12
     2.7.  Notable features . . . . . . . . . . . . . . . . . . . . . 12
   3.  MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 12
     3.1.  Connection Initiation  . . . . . . . . . . . . . . . . . . 13
     3.2.  Starting a New Subflow . . . . . . . . . . . . . . . . . . 18
     3.3.  General MPTCP Operation  . . . . . . . . . . . . . . . . . 23
       3.3.1.  Data Sequence Mapping  . . . . . . . . . . . . . . . . 25
       3.3.2.  Data Acknowledgments . . . . . . . . . . . . . . . . . 28
       3.3.3.  Closing a Connection . . . . . . . . . . . . . . . . . 29
       3.3.4.  Receiver Considerations  . . . . . . . . . . . . . . . 30
       3.3.5.  Sender Considerations  . . . . . . . . . . . . . . . . 31
       3.3.6.  Reliability and Retransmissions  . . . . . . . . . . . 32
       3.3.7.  Congestion Control Considerations  . . . . . . . . . . 33
       3.3.8.  Subflow Policy . . . . . . . . . . . . . . . . . . . . 34
     3.4.  Address Knowledge Exchange (Path Management) . . . . . . . 35
       3.4.1.  Address Advertisement  . . . . . . . . . . . . . . . . 36



Ford, et al.             Expires April 22, 2013                 [Page 2]


Internet-Draft                Multipath TCP                 October 2012


       3.4.2.  Remove Address . . . . . . . . . . . . . . . . . . . . 39
     3.5.  Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 40
     3.6.  Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 41
     3.7.  Error Handling . . . . . . . . . . . . . . . . . . . . . . 44
     3.8.  Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 45
       3.8.1.  Port Usage . . . . . . . . . . . . . . . . . . . . . . 45
       3.8.2.  Delayed Subflow Start  . . . . . . . . . . . . . . . . 45
       3.8.3.  Failure Handling . . . . . . . . . . . . . . . . . . . 46
   4.  Semantic Issues  . . . . . . . . . . . . . . . . . . . . . . . 47
   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 48
   6.  Interactions with Middleboxes  . . . . . . . . . . . . . . . . 51
   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 54
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 54
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 56
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 56
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 56
   Appendix A.  Notes on use of TCP Options . . . . . . . . . . . . . 58
   Appendix B.  Control Blocks  . . . . . . . . . . . . . . . . . . . 60
     B.1.  MPTCP Control Block  . . . . . . . . . . . . . . . . . . . 60
       B.1.1.  Authentication and Metadata  . . . . . . . . . . . . . 60
       B.1.2.  Sending Side . . . . . . . . . . . . . . . . . . . . . 60
       B.1.3.  Receiving Side . . . . . . . . . . . . . . . . . . . . 61
     B.2.  TCP Control Blocks . . . . . . . . . . . . . . . . . . . . 61
       B.2.1.  Sending Side . . . . . . . . . . . . . . . . . . . . . 61
       B.2.2.  Receiving Side . . . . . . . . . . . . . . . . . . . . 61
   Appendix C.  Finite State Machine  . . . . . . . . . . . . . . . . 62
   Appendix D.  Changelog . . . . . . . . . . . . . . . . . . . . . . 62
     D.1.  Changes since draft-ietf-mptcp-multiaddressed-05 . . . . . 63
     D.2.  Changes since draft-ietf-mptcp-multiaddressed-04 . . . . . 63
     D.3.  Changes since draft-ietf-mptcp-multiaddressed-03 . . . . . 63
     D.4.  Changes since draft-ietf-mptcp-multiaddressed-02 . . . . . 63
     D.5.  Changes since draft-ietf-mptcp-multiaddressed-01 . . . . . 63
     D.6.  Changes since draft-ietf-mptcp-multiaddressed-00 . . . . . 64
     D.7.  Changes since draft-ford-mptcp-multiaddressed-03 . . . . . 64
     D.8.  Changes since draft-ford-mptcp-multiaddressed-02 . . . . . 64
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 64


1.  Introduction

   MPTCP is a set of extensions to regular TCP [1] to provide a
   Multipath TCP [2] service, which enables a transport connection to
   operate across multiple paths simultaneously.  This document presents
   the protocol changes required to add multipath capability to TCP;
   specifically, those for signaling and setting up multiple paths
   ("subflows"), managing these subflows, reassembly of data, and
   termination of sessions.  This is not the only information required
   to create a Multipath TCP implementation, however.  This document is
   complemented by three others:

   o  Architecture [2], which explains the motivations behind Multipath
      TCP, contains a discussion of high-level design decisions on which
      this design is based, and an explanation of a functional
      separation through which an extensible MPTCP implementation can be
      developed.

   o  Congestion Control [5], presenting a safe congestion control
      algorithm for coupling the behaviour of the multiple paths in
      order to "do no harm" to other network users.

   o  Application Considerations [6], discussing what impact MPTCP will
      have on applications, what applications will want to do with
      MPTCP, and as a consequence of these factors, what API extensions
      an MPTCP implementation should present.

1.1.  Design Assumptions

   In order to limit the potentially huge design space, the working
   group imposed two key constraints on the Multipath TCP design
   presented in this document:

   o  It must be backwards-compatible with current, regular TCP, to
      increase its chances of deployment

   o  It can be assumed that one or both hosts are multihomed and
      multiaddressed

   To simplify the design we assume that the presence of multiple
   addresses at a host is sufficient to indicate the existence of
   multiple paths.  These paths need not be entirely disjoint: they may
   share one or many routers between them.  Even in such a situation
   making use of multiple paths is beneficial, improving resource
   utilisation and resilience to a subset of node failures.  The
   congestion control algorithms defined in [5] ensure this does not act
   detrimentally.  Furthermore, there may be some scenarios where
   different TCP ports on a single host can provide disjoint paths (such
   as through certain ECMP implementations [7]), and so the MPTCP design
   also supports the use of ports in path identifiers.

   There are three aspects to the backwards-compatibility listed above
   (discussed in more detail in [2]):

   External Constraints:  The protocol must function through the vast
      majority of existing middleboxes such as NATs, firewalls and
      proxies, and as such must resemble existing TCP as far as possible
      on the wire.  Furthermore, the protocol must not assume the
      segments it sends on the wire arrive unmodified at the
      destination: they may be split or coalesced; TCP options may be
      removed or duplicated.

   Application Constraints:  The protocol must be usable with no change
      to existing applications that use the common TCP API (although it
      is reasonable that not all features would be available to such
      legacy applications).  Furthermore, the protocol must provide the
      same service model as regular TCP to the application.

   Fall-back:  The protocol should be able to fall back to standard TCP
      with no interference from the user, to be able to communicate with
      legacy hosts.

   The complementary application considerations document [6] discusses
   the necessary features of an API to provide backwards-compatibility,
   as well as API extensions to convey the behaviour of MPTCP at a level
   of control and information equivalent to that available with regular,
   single-path TCP.

   Further discussion of the design constraints and associated design
   decisions is given in the MPTCP Architecture document [2].

1.2.  Multipath TCP in the Networking Stack

   MPTCP operates at the transport layer and aims to be transparent to
   both higher and lower layers.  It is a set of additional features on
   top of standard TCP; Figure 1 illustrates this layering.  MPTCP is
   designed to be usable by legacy applications with no changes;
   detailed discussion of its interactions with applications is given in
   [6].

                                   +-------------------------------+
                                   |           Application         |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
      +---------------+            +-------------------------------+
      |      IP       |            |       IP      |      IP       |
      +---------------+            +-------------------------------+

      Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

1.3.  Terminology

   This document makes use of a number of terms which are either MPTCP-
   specific, or have defined meaning in the context of MPTCP, as
   follows:

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a 4-tuple of source and destination address/
      port pairs.

   Subflow:  A flow of TCP segments operating over an individual path,
      which forms part of a larger MPTCP connection.  A subflow is
      started and terminated similarly to a regular TCP connection.

   (MPTCP) Connection:  A set of one or more subflows, over which an
      application can communicate between two hosts.  There is a one-to-
      one mapping between a connection and an application socket.

   Data-level:  The payload data is nominally transferred over a
      connection, which in turn is transported over subflows.  Thus the
      term "data-level" is synonymous with "connection level", in
      contrast to "subflow-level" which refers to properties of an
      individual subflow.

   Token:  A locally unique identifier given to a multipath connection
      by a host.  May also be referred to as a "Connection ID".

   Host:  An end host operating an MPTCP implementation, and either
      initiating or accepting an MPTCP connection.

   In addition to these terms, note that MPTCP's interpretation of, and
   effect on, regular single-path TCP semantics are discussed in
   Section 4.

1.4.  MPTCP Concept

   This section provides a high-level summary of normal operation of
   MPTCP, and is illustrated by the scenario shown in Figure 2.  A
   detailed description of operation is given in Section 3.

   o  To a non-MPTCP-aware application, MPTCP will behave the same as
      normal TCP.  Extended APIs could provide additional control to
      MPTCP-aware applications [6].  An application begins by opening a
      TCP socket in the normal way.  MPTCP signaling and operation are
      handled by the MPTCP implementation.

   o  An MPTCP connection begins similarly to a regular TCP connection.
      This is illustrated in Figure 2 where an MPTCP connection is
      established between addresses A1 and B1 on Hosts A and B
      respectively.

   o  If extra paths are available, additional TCP sessions (termed
      MPTCP "subflows") are created on these paths, and are combined
      with the existing session, which continues to appear as a single
      connection to the applications at both ends.  The creation of the
      additional TCP session is illustrated between Address A2 on Host A
      and Address B1 on Host B.

   o  MPTCP identifies multiple paths by the presence of multiple
      addresses at hosts.  Combinations of these multiple addresses
      equate to the additional paths.  In the example, other potential
      paths that could be set up are A1<->B2 and A2<->B2.  Although this
      additional session is shown as being initiated from A2, it could
      equally have been initiated from B1.

   o  The discovery and setup of additional subflows will be achieved
      through a path management method; this document describes a
      mechanism by which a host can initiate new subflows by using its
      own additional addresses, or by signaling its available addresses
      to the other host.

   o  MPTCP adds connection-level sequence numbers to allow the
      reassembly of segments arriving on multiple subflows with
      differing network delays.

   o  Subflows are terminated as regular TCP connections, with a
      four-way FIN handshake.  The MPTCP connection is terminated by a
      connection-level FIN.

               Host A                               Host B
      ------------------------             ------------------------
      Address A1    Address A2             Address B1    Address B2
      ----------    ----------             ----------    ----------
          |             |                      |             |
          |     (initial connection setup)     |             |
          |----------------------------------->|             |
          |<-----------------------------------|             |
          |             |                      |             |
          |            (additional subflow setup)            |
          |             |--------------------->|             |
          |             |<---------------------|             |
          |             |                      |             |
          |             |                      |             |

                  Figure 2: Example MPTCP Usage Scenario
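
   As an illustration of how combinations of addresses equate to
   candidate paths, the following Python sketch enumerates the address
   pairs from Figure 2 that do not yet carry a subflow.  The function
   name and the use of short labels in place of IP addresses are
   assumptions made for this example; the sketch is not part of the
   protocol specification.

```python
from itertools import product


def candidate_paths(local_addrs, remote_addrs, in_use=()):
    """Return (local, remote) address pairs that carry no subflow yet."""
    used = set(in_use)
    return [pair for pair in product(local_addrs, remote_addrs)
            if pair not in used]


# Figure 2: subflows already exist for A1<->B1 and A2<->B1.
remaining = candidate_paths(
    ["A1", "A2"], ["B1", "B2"],
    in_use=[("A1", "B1"), ("A2", "B1")],
)
print(remaining)  # [('A1', 'B2'), ('A2', 'B2')]
```

   Whether a host actually opens subflows on the remaining pairs is a
   matter of local policy.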

1.5.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].


2.  Operation Overview

   This section presents a single description of common MPTCP operation,
   with reference to the protocol operation.  This is a high-level
   overview of the key functions; the full specification follows in
   Section 3.  Extensibility and negotiated features are not discussed
   here.  Considerable reference is made to symbolic names of MPTCP
   options throughout this section - these are subtypes of the IANA-
   assigned MPTCP option (see Section 8), and their formats are defined
   in the detailed protocol specification which follows in Section 3.

   A Multipath TCP connection provides a bidirectional bytestream
   between two hosts communicating like normal TCP and thus does not
   require any change to the applications.  However, Multipath TCP
   enables the hosts to use different paths with different IP addresses
   to exchange packets belonging to the MPTCP connection.  A Multipath
   TCP connection appears like a normal TCP connection to an
   application.  However, to the network layer each MPTCP subflow looks
   like a regular TCP flow whose segments carry a new TCP option type.
   Multipath TCP manages the creation, removal and utilization of these
   subflows to send data.  The number of subflows that are managed
   within a Multipath TCP connection is not fixed and it can fluctuate
   during the lifetime of the Multipath TCP connection.

   All MPTCP operations are signaled with a TCP option - a single
   numerical type for MPTCP, with "sub-types" for each MPTCP message.
   What follows is a summary of the purpose and rationale of these
   messages.

2.1.  Initiating an MPTCP connection

   This is the same signaling as for initiating a normal TCP connection,
   but the SYN, SYN/ACK and ACK packets also carry the MP_CAPABLE
   option.  This is variable-length and serves multiple purposes.
   Firstly, it verifies whether the remote host supports Multipath TCP;
   and secondly, this option allows the hosts to exchange some
   information to authenticate the establishment of additional subflows.
   Further details are given in Section 3.1.

      Host-A                                  Host-B
      ------                                  ------
      MP_CAPABLE            ->
      [A's key, flags]
                            <-                MP_CAPABLE
                                              [B's key, flags]
      ACK + MP_CAPABLE      ->
      [A's key, B's key, flags]
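
   The exchange above can be sketched in Python as follows.  This is a
   simplified model under stated assumptions: the Segment class and the
   function names are hypothetical, the flags field is omitted, and an
   absent MP_CAPABLE option is modelled as None to show how an
   initiator would notice that the peer does not support MPTCP.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    tcp_flags: str
    mp_capable: Optional[Tuple[bytes, ...]]  # None: option not present


def initiator_syn(key_a: bytes) -> Segment:
    return Segment("SYN", (key_a,))


def responder_syn_ack(syn: Segment, key_b: bytes,
                      supports_mptcp: bool = True) -> Segment:
    # A legacy responder (or a stripped option) yields a plain SYN/ACK.
    if syn.mp_capable is None or not supports_mptcp:
        return Segment("SYN/ACK", None)
    return Segment("SYN/ACK", (key_b,))


def initiator_ack(key_a: bytes, syn_ack: Segment) -> Segment:
    if syn_ack.mp_capable is None:
        return Segment("ACK", None)  # continue as regular TCP
    (key_b,) = syn_ack.mp_capable
    return Segment("ACK", (key_a, key_b))  # both keys are echoed


key_a, key_b = secrets.token_bytes(8), secrets.token_bytes(8)
syn = initiator_syn(key_a)
ack = initiator_ack(key_a, responder_syn_ack(syn, key_b))
legacy_ack = initiator_ack(
    key_a, responder_syn_ack(syn, key_b, supports_mptcp=False))
```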

2.2.  Associating a new subflow with an existing MPTCP connection

   The exchange of keys in the MP_CAPABLE handshake provides material
   that can be used to authenticate the endpoints when new subflows are
   set up.  Additional subflows begin in the same way as initiating a
   normal TCP connection, but the SYN, SYN/ACK and ACK packets also
   carry the MP_JOIN option.

   Host-A initiates a new subflow between one of its addresses and one
   of Host-B's addresses.  The token - generated from the key - is used
   to identify which MPTCP connection it is joining, and the HMAC is
   used for authentication.  The HMAC uses the keys exchanged in the
   MP_CAPABLE handshake, and the random numbers (nonces) exchanged in
   these MP_JOIN options.  MP_JOIN also contains flags and an Address ID
   that can be used to refer to the source address without the sender
   needing to know if it has been changed by a NAT.  Further details are
   given in Section 3.2.

      Host-A                                  Host-B
      ------                                  ------
      MP_JOIN               ->
      [B's token, A's nonce,
       A's Address ID, flags]
                            <-                MP_JOIN
                                              [B's HMAC, B's nonce,
                                               B's Address ID, flags]
      ACK + MP_JOIN         ->
      [A's HMAC]

                            <-                ACK
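
   The HMAC exchange above can be sketched with the Python standard
   library.  The sketch assumes HMAC-SHA1 keyed with the concatenation
   of the two connection keys and computed over the concatenation of
   the two nonces, sender's values first.  That ordering, the 32-bit
   nonce size, and the absence of truncation are assumptions made for
   illustration; Section 3.2 is authoritative for the exact inputs.

```python
import hashlib
import hmac
import secrets


def join_hmac(sender_key: bytes, receiver_key: bytes,
              sender_nonce: bytes, receiver_nonce: bytes) -> bytes:
    """HMAC-SHA1 over both nonces, keyed with both connection keys."""
    return hmac.new(sender_key + receiver_key,
                    sender_nonce + receiver_nonce,
                    hashlib.sha1).digest()


# Keys come from the earlier MP_CAPABLE handshake; nonces are fresh.
key_a, key_b = secrets.token_bytes(8), secrets.token_bytes(8)
nonce_a, nonce_b = secrets.token_bytes(4), secrets.token_bytes(4)

# Host-B sends its HMAC on the SYN/ACK; Host-A recomputes and compares.
hmac_b = join_hmac(key_b, key_a, nonce_b, nonce_a)
b_is_authentic = hmac.compare_digest(
    hmac_b, join_hmac(key_b, key_a, nonce_b, nonce_a))

# Host-A answers with its own HMAC on the third ACK.
hmac_a = join_hmac(key_a, key_b, nonce_a, nonce_b)
```

   Using a constant-time comparison such as hmac.compare_digest avoids
   leaking how many leading bytes of a forged HMAC were correct.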

2.3.  Informing the other Host about another potential address

   The set of IP addresses associated with a multihomed host may change
   during the lifetime of an MPTCP connection.  MPTCP supports the
   addition and removal of addresses on a host both implicitly and
   explicitly.  If Host-A has established a subflow starting at address
   IP#-A1 and wants to open a second subflow starting at address IP#-A2,
   it simply initiates the establishment of the subflow as explained
   above.  The remote host will then be implicitly informed about the
   new address.

   In some circumstances, a host may want to advertise to the remote
   host the availability of an address without establishing a new
   subflow, for example when a NAT prevents setup in one direction.  In
   the example below, Host-A informs Host-B about its alternative IP
   address (IP#-A2).  Host-B may later send an MP_JOIN to this new
   address.  Due to the presence of middleboxes that may translate IP
   addresses, this option uses an address identifier to unambiguously
   identify an address on a host.  Further details are given in
   Section 3.4.1.

      Host-A                                 Host-B
      ------                                 ------
      ADD_ADDR                  ->
      [IP#-A2,
       IP#-A2's Address ID]

   There is a corresponding signal for address removal, making use of
   the Address ID that is signalled in the add address handshake.
   Further details are given in Section 3.4.2.

      Host-A                                 Host-B
      ------                                 ------
      REMOVE_ADDR               ->
      [IP#-A2's Address ID]
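
   One way to picture the receiver's side of these two signals is a
   table keyed by Address ID, as in the Python sketch below.  The class
   and method names are hypothetical, and the addresses are taken from
   the documentation range 192.0.2.0/24.  Keying by Address ID rather
   than by IP address is what lets removal work even if a NAT rewrote
   the address seen on the wire.

```python
class PeerAddressTable:
    """Addresses the remote host has advertised, keyed by Address ID."""

    def __init__(self):
        self._by_id = {}

    def on_add_addr(self, address_id: int, ip: str) -> None:
        self._by_id[address_id] = ip

    def on_remove_addr(self, address_id: int):
        # Unknown IDs are ignored in this sketch; returns the old entry.
        return self._by_id.pop(address_id, None)

    def join_candidates(self):
        """Addresses to which an MP_JOIN could later be sent."""
        return sorted(self._by_id.values())


table = PeerAddressTable()
table.on_add_addr(2, "192.0.2.20")   # ADD_ADDR [IP#-A2, Address ID 2]
after_add = table.join_candidates()
table.on_remove_addr(2)              # REMOVE_ADDR [Address ID 2]
after_remove = table.join_candidates()
```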






2.4.  Data transfer using MPTCP

   To ensure reliable, in-order delivery of data over subflows that may
   appear and disappear at any time, MPTCP uses a 64-bit Data Sequence
   Number (DSN) to number all data sent over the MPTCP connection.  Each
   subflow has its own 32-bit sequence number space and an MPTCP option
   maps the subflow sequence space to the data sequence space.  In this
   way, data can be retransmitted on different subflows (mapped to the
   same DSN) in the event of failure.

   The "Data Sequence Signal" carries the "Data Sequence Mapping".  The
   Data Sequence Mapping consists of the subflow sequence number, data
   sequence number, and length for which this mapping is valid.  This
   option can also carry a connection-level acknowledgement (the "Data
   ACK") for the received DSN.

   With MPTCP, all subflows share the same receive buffer and advertise
   the same receive window.  There are two levels of acknowledgement in
   MPTCP.  Regular TCP acknowledgments are used on each subflow to
   acknowledge the reception of the segments sent over the subflow
   independently of their DSN.  In addition, there are connection-level
   acknowledgments for the data sequence space.  These acknowledgments
   track the advancement of the bytestream and slide the receiving
   window.

   Further details are in Section 3.3.

      Host-A                                 Host-B
      ------                                 ------
      DATA_SEQUENCE_SIGNAL      ->
      [Data Sequence Mapping]
      [Data ACK]
      [Checksum]
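
   The Data Sequence Mapping can be illustrated with a small Python
   sketch that translates a subflow sequence number into the
   corresponding DSN.  The class and method names are hypothetical, and
   the choice of absolute rather than relative subflow sequence numbers
   is an assumption; the wire encoding is defined in Section 3.3.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSequenceMapping:
    subflow_seq: int  # first subflow sequence number covered (32-bit)
    data_seq: int     # DSN of that first byte (64-bit)
    length: int       # number of bytes for which the mapping is valid

    def to_dsn(self, subflow_seq: int) -> int:
        # Modular arithmetic tolerates wrap of the 32-bit subflow space.
        offset = (subflow_seq - self.subflow_seq) % 2**32
        if offset >= self.length:
            raise ValueError("sequence number outside this mapping")
        return (self.data_seq + offset) % 2**64


# 100 bytes starting at DSN 5000, first sent on one subflow...
original = DataSequenceMapping(subflow_seq=1000, data_seq=5000, length=100)
# ...and retransmitted on another subflow, mapped to the same DSN.
retransmit = DataSequenceMapping(subflow_seq=77, data_seq=5000, length=100)

print(original.to_dsn(1010), retransmit.to_dsn(87))  # 5010 5010
```

   Because both mappings resolve to the same DSN, the receiver can
   discard whichever copy arrives second.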

2.5.  Requesting a change in a path's priority

   Hosts can indicate at initial subflow setup whether they wish the
   subflow to be used as a regular or backup path - a backup path being
   only used if there are no regular paths available.  During a
   connection, Host-A can request a change in the priority of a subflow
   through the MP_PRIO signal to Host-B.  Further details are given in
   Section 3.3.8.

      Host-A                                 Host-B
      ------                                 ------
      MP_PRIO                   ->
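
   The backup semantics described above can be sketched as a subflow
   selection step in a sender's scheduler.  The Subflow class and the
   function names below are hypothetical; the sketch only shows that
   backup subflows are used when no regular subflow is available, and
   that an MP_PRIO signal changes which group a subflow belongs to.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    backup: bool = False
    established: bool = True


def on_mp_prio(subflow: Subflow, backup: bool) -> None:
    """Apply a priority change requested by the peer."""
    subflow.backup = backup


def eligible_subflows(subflows):
    """Regular subflows if any are usable, otherwise the backups."""
    live = [s for s in subflows if s.established]
    regular = [s for s in live if not s.backup]
    return regular if regular else live


wifi, cell = Subflow("wifi"), Subflow("cell", backup=True)
normal = [s.name for s in eligible_subflows([wifi, cell])]

wifi.established = False  # the regular path fails
failover = [s.name for s in eligible_subflows([wifi, cell])]
```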







2.6.  Closing an MPTCP connection

   When Host-A wants to inform Host-B that it has no more data to send,
   it signals this "Data FIN" as part of the Data Sequence Signal (see
   above).  It has the same semantics and behaviour as a regular TCP
   FIN, but at the connection level.  Once all the data on the MPTCP
   connection has been successfully received, then this message is
   acknowledged at the connection level with a DATA_ACK.  Further
   details are given in Section 3.3.3.

      Host-A                                 Host-B
      ------                                 ------
      DATA_SEQUENCE_SIGNAL      ->
      [Data FIN]

                                <-           (MPTCP DATA_ACK)

2.7.  Notable features

   It is worth highlighting that MPTCP's signaling has been designed
   with several key requirements in mind:

   o  To cope with NATs on the path, addresses are referred to by
      Address IDs, in case the IP packet's source address gets changed
      by a NAT.  Setting up a new TCP flow is not possible if the
      passive opener is behind a NAT; to allow subflows to be created
      when either end is behind a NAT, MPTCP uses the ADD_ADDR message.

   o  MPTCP falls back to ordinary TCP if MPTCP operation is not
      possible, for example if one host is not MPTCP capable, or if a
      middlebox alters the payload.

   o  To meet the threats identified in [8], the following steps are
      taken: keys are sent in the clear in the MP_CAPABLE messages;
      MP_JOIN messages are secured with HMAC-SHA1 ([9], [4]) using those
      keys; and standard TCP validity checks are made on the other
      messages (ensuring sequence numbers are in-window).


3.  MPTCP Protocol

   This section describes the operation of the MPTCP protocol, and is
   subdivided into sections for each key part of the protocol operation.

   All MPTCP operations are signalled using optional TCP header fields.
   A single TCP option number ("Kind") will be assigned by IANA for
   MPTCP (see Section 8), and then individual messages will be
   determined by a "sub-type", the values of which will also be stored
   in an IANA registry (and are also listed in Section 8).

   Throughout this document, when reference is made to an MPTCP option
   by symbolic name, such as "MP_CAPABLE", this refers to a TCP option
   with the single MPTCP option type, and with the sub-type value of the
   symbolic name as defined in Section 8.  This sub-type is a four-bit
   field - the first four bits of the option payload, as shown in
   Figure 3.  The MPTCP messages are defined in the following sections.

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +---------------+---------------+-------+-----------------------+
      |     Kind      |    Length     |Subtype|                       |
      +---------------+---------------+-------+                       |
      |                     Subtype-specific data                     |
      |                       (variable length)                       |
      +---------------------------------------------------------------+

                       Figure 3: MPTCP option format
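
   The layout in Figure 3 can be decoded as in the following Python
   sketch.  The function name is hypothetical, and the Kind value is a
   placeholder chosen only for this example, since the real value is
   assigned by IANA (see Section 8).  The sub-type is taken from the
   upper four bits of the third octet; the lower four bits belong to
   the subtype-specific data.

```python
PLACEHOLDER_KIND = 253  # arbitrary stand-in, NOT the assigned value


def parse_mptcp_option(option: bytes, mptcp_kind: int = PLACEHOLDER_KIND):
    """Split one TCP option into (subtype, low_nibble, remaining_data)."""
    if len(option) < 3:
        raise ValueError("option too short to hold a sub-type")
    kind, length = option[0], option[1]
    if kind != mptcp_kind:
        raise ValueError("not an MPTCP option")
    if length != len(option):
        raise ValueError("Length field does not match option size")
    subtype = option[2] >> 4        # first four bits of the payload
    low_nibble = option[2] & 0x0F   # start of subtype-specific data
    return subtype, low_nibble, option[3:]


print(parse_mptcp_option(bytes([PLACEHOLDER_KIND, 4, 0x1A, 0xBC])))
# (1, 10, b'\xbc')
```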

   Those MPTCP options associated with subflow initiation are used on
   packets with the SYN flag set.  Additionally, there is one MPTCP
   option for signaling metadata to ensure segmented data can be
   recombined for delivery to the application.

   The remaining options, however, are signals that do not need to be on
   a specific packet, such as those for signaling additional addresses.
   Whilst an implementation may desire to send MPTCP options as soon as
   possible, it may not be possible to combine all desired options (both
   those for MPTCP and for regular TCP, such as SACK [10]) on a single
   packet.  Therefore, an implementation may choose to send duplicate
   ACKs containing the additional signaling information.  This changes
   the semantics of a duplicate ACK; these are usually only sent as a
   signal of a lost segment [11] in regular TCP.  Therefore, an MPTCP
   implementation receiving a duplicate ACK which contains an MPTCP
   option MUST NOT treat it as a signal of congestion.  Additionally, an
   MPTCP implementation SHOULD NOT send more than two duplicate ACKs in
   a row for the purposes of sending MPTCP options alone, in order to
   ensure no middleboxes misinterpret this as a sign of congestion.
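
   The receiver-side rule above can be sketched as a duplicate ACK
   counter that skips ACKs carrying an MPTCP option.  This is a
   deliberately simplified model: the class name is hypothetical, and
   the test for what counts as a duplicate ACK is reduced to "same
   acknowledgement number, no payload", which omits conditions that a
   real TCP implementation applies.

```python
class DupAckCounter:
    """Counts duplicate ACKs that may be treated as a loss signal."""

    def __init__(self):
        self.last_ack = None
        self.dupacks = 0

    def on_ack(self, ack_no: int, has_payload: bool,
               has_mptcp_option: bool) -> int:
        if ack_no != self.last_ack or has_payload:
            # New cumulative ACK or a data segment: restart counting.
            self.last_ack = ack_no
            self.dupacks = 0
        elif not has_mptcp_option:
            # Only option-free duplicates count towards loss detection.
            self.dupacks += 1
        return self.dupacks


counter = DupAckCounter()
counts = [
    counter.on_ack(100, False, False),  # first ACK for 100
    counter.on_ack(100, False, True),   # duplicate used for signaling
    counter.on_ack(100, False, False),  # genuine duplicate
    counter.on_ack(100, False, False),  # genuine duplicate
]
print(counts)  # [0, 0, 1, 2]
```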

   Furthermore, standard TCP validity checks (such as ensuring the
   Sequence Number and Acknowledgement Number are within window) MUST be
   undertaken before processing any MPTCP signals, as described in [12].

3.1.  Connection Initiation

   Connection Initiation begins with a SYN, SYN/ACK, ACK exchange on a
   single path.  Each packet contains the Multipath Capable (MP_CAPABLE)
   TCP option (Figure 4).  This option declares its sender is capable of
   performing multipath TCP and wishes to do so on this particular
   connection.

   This option is used to declare the sender's 64-bit key, which is
   uniquely linked to this MPTCP connection.  This key is used to
   authenticate the addition of future subflows to this connection.
   This is the only time the key will be sent in the clear on the wire
   (unless "fast close", Section 3.5, is used); all future subflows will
   identify the connection using a 32-bit "token".  This token is a
   cryptographic hash of this key.  The algorithm for this process is
   dependent on the authentication algorithm selected; the method of
   selection is defined later in this section.

   This key is generated by its sender, and its method of generation is
   implementation-specific.  The key MUST be hard to guess, and it MUST
   be unique for the sending host at any one time.  Recommendations for
   generating random numbers for use in keys are given in [13].
   Connections will be indexed at each host by the token (a one-way hash
   of the key).  Therefore, an implementation will require a mapping
   from each token to the corresponding connection, and in turn to the
   keys for the connection.

   There is a very small risk that two different keys will hash to the
   same token.  An implementation SHOULD check its list of connection
   tokens to ensure there is not a collision before sending its key in
   the SYN/ACK.  This check could, however, be costly for a server with
   thousands of connections.  In any case, the subflow handshake
   mechanism (Section 3.2) ensures that new subflows only join the
   correct connection, by checking tokens in both directions and
   ensuring that sequence numbers are in-window.  In the worst case, if
   there were a token collision, the new subflow would be closed, but
   the MPTCP connection would continue to provide a regular TCP service.
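
   As a minimal sketch of the key generation and collision check
   described above, the following Python fragment draws a random 64-bit
   key and retries if its token is already in use.  It assumes the
   SHA-1 token derivation defined later in this section for bit "H".
   The names token_from_key and new_local_key, the dictionary used as
   the token table, and the retry limit are hypothetical and are not
   part of this specification.

```python
import hashlib
import secrets


def token_from_key(key: bytes) -> int:
    """Most significant 32 bits of SHA-1(key), as used when bit "H" is set."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def new_local_key(connections: dict, max_tries: int = 16) -> bytes:
    """Pick a random 64-bit key whose token is not already in the table.

    `connections` maps token -> per-connection state (hypothetical layout).
    """
    for _ in range(max_tries):
        key = secrets.token_bytes(8)          # 64 random bits, hard to guess
        token = token_from_key(key)
        if token not in connections:          # collision check before use
            connections[token] = {"local_key": key}
            return key
    raise RuntimeError("could not find a collision-free token")
```

   A server that finds the lookup too costly could skip the check and
   rely on the subflow handshake, as the paragraph above notes.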

   The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
   that start the first subflow of an MPTCP connection.  The data
   carried by each packet is as follows, where A = initiator and B =
   listener.

   o  SYN (A->B): A's Key.

   o  SYN/ACK (B->A): B's Key.

   o  ACK (A->B): A's Key followed by B's Key.

   The contents of the option are determined by the SYN and ACK flags of
   the packet, verified by the option's length field.  For the diagram
   shown in Figure 4, "sender" and "receiver" refer to the sender or
   receiver of the TCP packet (which can be either host).  If the SYN
   flag is set, a single key is included; if only an ACK flag is set,
   both keys are present.

   B's Key is echoed in the ACK in order to allow the listener (host B)
   to act statelessly until the TCP connection reaches the ESTABLISHED
   state.  If the listener acts in this way, however, it MUST generate
   its key in a way that would allow it to verify that it generated the
   key when it is echoed in the ACK.
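
   This document does not mandate how a stateless listener generates a
   verifiable key.  One possible approach, shown here purely as an
   assumption, is to derive the key from a local secret and values that
   are also available when the ACK arrives.  The function names, the
   choice of HMAC-SHA256, and the inputs (address/port 4-tuple and the
   listener's initial sequence number) are all hypothetical.  The
   sketch does not address the requirement that keys be unique at the
   sending host.

```python
import hashlib
import hmac


def stateless_key(secret: bytes, four_tuple: tuple, isn: int) -> bytes:
    """Derive a 64-bit key from a local secret and per-connection values."""
    msg = repr((four_tuple, isn)).encode()
    return hmac.new(secret, msg, hashlib.sha256).digest()[:8]


def verify_echoed_key(secret: bytes, four_tuple: tuple, isn: int,
                      echoed: bytes) -> bool:
    """Recompute the key on receipt of the ACK and compare in constant time."""
    expected = stateless_key(secret, four_tuple, isn)
    return hmac.compare_digest(expected, echoed)
```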

   This exchange allows the safe passage of MPTCP options on SYN packets
   to be determined.  If any of these options are dropped, MPTCP will
   gracefully fall back to regular single-path TCP, as documented in
   Section 3.6.  Note that new subflows MUST NOT be established (using
   the process documented in Section 3.2) until a DSS option has been
   successfully received across the path (as documented in Section 3.3).

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +---------------+---------------+-------+-------+---------------+
      |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
      +---------------+---------------+-------+-------+---------------+
      |                   Option Sender's Key (64 bits)               |
      |                                                               |
      |                                                               |
      +---------------------------------------------------------------+
      |                  Option Receiver's Key (64 bits)              |
      |                     (if option Length == 20)                  |
      |                                                               |
      +---------------------------------------------------------------+


              Figure 4: Multipath Capable (MP_CAPABLE) option
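
   As an illustration only, the layout in Figure 4 could be decoded as
   in the following Python sketch.  The function name parse_mp_capable
   and the returned dictionary are hypothetical.  The length of 12
   octets for the single-key form is derived from the figure (four
   octets of header and flags plus one 64-bit key); the figure itself
   only states the two-key length of 20.

```python
import struct


def parse_mp_capable(option: bytes) -> dict:
    """Split a raw MP_CAPABLE option (Figure 4) into its fields."""
    if len(option) < 4:
        raise ValueError("option too short")
    kind, length, sub_ver, flags = struct.unpack_from("!BBBB", option, 0)
    if length != len(option) or length not in (12, 20):
        raise ValueError("unexpected MP_CAPABLE length")
    return {
        "kind": kind,
        "subtype": sub_ver >> 4,       # upper four bits
        "version": sub_ver & 0x0F,     # lower four bits
        "flags": flags,                # bits A..H, A is the leftmost
        "sender_key": option[4:12],
        # The second key is present only in the 20-octet form (the ACK).
        "receiver_key": option[12:20] if length == 20 else None,
    }
```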

   The first four bits of the first octet after the Kind and Length
   fields of the MP_CAPABLE option (Figure 4) define the MPTCP option
   subtype (see Section 8; for MP_CAPABLE, this is 0), and the
   remaining four bits of this octet specify the MPTCP version in use
   (for this specification, this is 0).

   The next octet is reserved for flags, allocated as follows:

   A: The leftmost bit, labelled "A", SHOULD be set to 1 to indicate
      "Checksum Required", unless the system administrator has decided
      that checksums are not required (for example, if the environment
      is controlled and no middleboxes exist that might adjust the
      payload).

   B: The second bit, labelled "B", is an extensibility flag, and MUST
      be set to 0 for current implementations.  This will be used for an
      extensibility mechanism in a future specification, and the impact
      of this flag will be defined at a later date.  If a host receives
      a SYN with the "B" flag set to 1 and does not understand it, the
      SYN MUST be silently ignored; the sender is expected to retry
      with a format compatible with this legacy specification.
      Note that the length of the MP_CAPABLE option, and the meanings of
      bits "C" through "H", may be altered by setting B=1.

   C through H:  The remaining bits, labelled "C" through "H", are used
      for crypto algorithm negotiation.  Currently only the rightmost
      bit, labelled "H", is assigned.  Bit "H" indicates the use of
      HMAC-SHA1 (as defined in Section 3.2).  An implementation that
      only supports this method MUST set bit "H" to 1, and bits "C"
      through "G" to 0.

   A crypto algorithm MUST be specified.  If flag bits C through H are
   all 0, the MP_CAPABLE option MUST be treated as invalid and ignored
   (that is, it must be treated as a regular TCP handshake).
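
   The flag rules above can be sketched as follows for a receiver that
   implements only this specification.  The constant names and the
   three string results are hypothetical labels chosen for the
   example; only the bit positions come from the text.

```python
FLAG_A_CHECKSUM = 0x80    # leftmost bit: "Checksum Required"
FLAG_B_EXTENSION = 0x40   # extensibility flag, MUST be 0 for now
CRYPTO_MASK = 0x3F        # bits "C" through "H"
FLAG_H_HMAC_SHA1 = 0x01   # rightmost bit: HMAC-SHA1


def classify_flags(flags: int) -> str:
    """Decide how a receiver that knows only this version treats the flags."""
    if flags & FLAG_B_EXTENSION:
        # B=1 is not understood here, so the SYN is silently ignored.
        return "ignore-syn"
    if (flags & CRYPTO_MASK) == 0:
        # No crypto algorithm specified: option invalid, plain TCP handshake.
        return "regular-tcp"
    return "mptcp"
```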

   The selection of the authentication algorithm also impacts the
   algorithm used to generate the token and the Initial Data Sequence
   Number.  In this specification, with only the SHA-1 algorithm (bit
   "H") specified and selected, the token MUST be a truncated (most
   significant 32 bits) SHA-1 hash ([4], [14]) of the key.  A different,
   64-bit truncation (the least significant 64 bits) of the SHA-1 hash
   of the key MUST be used as the Initial Data Sequence Number.  Note
   that the key MUST be hashed in network byte order.  Also note that
   the "least significant" bits MUST be the rightmost bits of the SHA-1
   digest, as per [4].  Future specifications of the use of the crypto
   bits may choose to specify different algorithms for token and IDSN
   generation.
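
   A minimal sketch of the token and Initial Data Sequence Number
   derivation for the SHA-1 case follows.  The function name
   token_and_idsn is hypothetical; the key is assumed to be supplied as
   8 octets in network byte order.

```python
import hashlib


def token_and_idsn(key: bytes) -> tuple:
    """Return (token, IDSN) derived from a 64-bit key using SHA-1."""
    if len(key) != 8:
        raise ValueError("key must be 64 bits")
    digest = hashlib.sha1(key).digest()          # 20 octets
    token = int.from_bytes(digest[:4], "big")    # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")    # least significant 64 bits
    return token, idsn
```

   The two values are taken from opposite ends of the same digest, so
   a single hash computation per key is sufficient.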

   Both the crypto and checksum bits negotiate capabilities in similar
   ways.  For the Checksum Required bit (labelled "A"), if either host
   requires the use of checksums, checksums MUST be used.  In other
   words, the only way for checksums not to be used is if both hosts in
   their SYNs set A=0.  This decision is confirmed by the setting of the
   "A" bit in the third packet (the ACK) of the handshake.  For example,
   if the initiator sets A=0 in the SYN, but the responder sets A=1 in
   the SYN/ACK, checksums MUST be used in both directions, and the
   initiator will set A=1 in the ACK.  The decision whether to use
   checksums will be stored by an implementation in a per-connection
   binary state variable.
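
   The checksum negotiation reduces to a logical OR of the two "A"
   bits, as the following sketch shows.  The function name is
   hypothetical; the result corresponds to the per-connection state
   variable and to the "A" bit the initiator sets in the final ACK.

```python
def checksums_required(initiator_a: bool, responder_a: bool) -> bool:
    """Checksums are used unless both hosts set A=0 in their SYNs."""
    return initiator_a or responder_a
```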

   For crypto negotiation, the responder has the choice.  The initiator
   creates a proposal setting a bit for each algorithm it supports to 1
   (in this version of the specification, there is only one proposal, so
   bit "H" will always be set to 1).  The responder responds with only
   one bit set - this is the chosen algorithm.  The rationale for this
   behaviour is that the responder will typically be a server with
   potentially many thousands of connections, so it may wish to choose
   an algorithm with minimal computational complexity, depending on the
   load.  If a responder does not support (or does not want to support)
   any of the initiator's proposals, it can respond without an
   MP_CAPABLE option, thus forcing a fall-back to regular TCP.
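
   The responder's side of the crypto negotiation might look like the
   following sketch.  The function name choose_algorithm and the
   preference-ordered tuple of supported bits are assumptions made for
   illustration; only bit "H" (HMAC-SHA1) is defined by this document.

```python
CRYPTO_MASK = 0x3F   # bits "C" through "H"
HMAC_SHA1 = 0x01     # bit "H"


def choose_algorithm(proposal_flags: int, supported=(HMAC_SHA1,)):
    """Return the single crypto bit to set in the SYN/ACK, or None.

    `supported` lists the responder's algorithms in order of preference.
    None means: reply without MP_CAPABLE and fall back to regular TCP.
    """
    offered = proposal_flags & CRYPTO_MASK
    for bit in supported:
        if offered & bit:
            return bit
    return None
```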

   The MP_CAPABLE option is only used in the first subflow of a
   connection, in order to identify the connection; all following
   subflows will use the "Join" option (see