Internet Engineering Task Force A. Ford
Internet-Draft Cisco
Intended status: Experimental C. Raiciu
Expires: April 22, 2013 University Politehnica of
Bucharest
M. Handley
University College London
O. Bonaventure
Universite catholique de
Louvain
October 19, 2012
TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-multiaddressed-11
Abstract
TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network, and thus improve user
experience through higher throughput and improved resilience to
network failure.
Multipath TCP provides the ability to simultaneously use multiple
paths between peers. This document presents a set of extensions to
traditional TCP to support multipath operation. The protocol offers
the same type of service to applications as TCP (i.e. reliable
bytestream), and provides the components necessary to establish and
use multiple TCP flows across potentially disjoint paths.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 22, 2013.
Ford, et al. Expires April 22, 2013 [Page 1]
Internet-Draft Multipath TCP October 2012
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
1.1. Design Assumptions
1.2. Multipath TCP in the Networking Stack
1.3. Terminology
1.4. MPTCP Concept
1.5. Requirements Language
2. Operation Overview
2.1. Initiating an MPTCP connection
2.2. Associating a new subflow with an existing MPTCP connection
2.3. Informing the other Host about another potential address
2.4. Data transfer using MPTCP
2.5. Requesting a change in a path's priority
2.6. Closing an MPTCP connection
2.7. Notable features
3. MPTCP Protocol
3.1. Connection Initiation
3.2. Starting a New Subflow
3.3. General MPTCP Operation
3.3.1. Data Sequence Mapping
3.3.2. Data Acknowledgments
3.3.3. Closing a Connection
3.3.4. Receiver Considerations
3.3.5. Sender Considerations
3.3.6. Reliability and Retransmissions
3.3.7. Congestion Control Considerations
3.3.8. Subflow Policy
3.4. Address Knowledge Exchange (Path Management)
3.4.1. Address Advertisement
3.4.2. Remove Address
3.5. Fast Close
3.6. Fallback
3.7. Error Handling
3.8. Heuristics
3.8.1. Port Usage
3.8.2. Delayed Subflow Start
3.8.3. Failure Handling
4. Semantic Issues
5. Security Considerations
6. Interactions with Middleboxes
7. Acknowledgments
8. IANA Considerations
9. References
9.1. Normative References
9.2. Informative References
Appendix A. Notes on use of TCP Options
Appendix B. Control Blocks
B.1. MPTCP Control Block
B.1.1. Authentication and Metadata
B.1.2. Sending Side
B.1.3. Receiving Side
B.2. TCP Control Blocks
B.2.1. Sending Side
B.2.2. Receiving Side
Appendix C. Finite State Machine
Appendix D. Changelog
D.1. Changes since draft-ietf-mptcp-multiaddressed-05
D.2. Changes since draft-ietf-mptcp-multiaddressed-04
D.3. Changes since draft-ietf-mptcp-multiaddressed-03
D.4. Changes since draft-ietf-mptcp-multiaddressed-02
D.5. Changes since draft-ietf-mptcp-multiaddressed-01
D.6. Changes since draft-ietf-mptcp-multiaddressed-00
D.7. Changes since draft-ford-mptcp-multiaddressed-03
D.8. Changes since draft-ford-mptcp-multiaddressed-02
Authors' Addresses
1. Introduction
MPTCP is a set of extensions to regular TCP [1] to provide a
Multipath TCP [2] service, which enables a transport connection to
operate across multiple paths simultaneously. This document presents
the protocol changes required to add multipath capability to TCP;
specifically, those for signaling and setting up multiple paths
("subflows"), managing these subflows, reassembly of data, and
termination of sessions. This is not the only information required
to create a Multipath TCP implementation, however. This document is
complemented by three others:
o Architecture [2], which explains the motivations behind Multipath
TCP, contains a discussion of high-level design decisions on which
this design is based, and an explanation of a functional
separation through which an extensible MPTCP implementation can be
developed.
o Congestion Control [5], presenting a safe congestion control
algorithm for coupling the behaviour of the multiple paths in
order to "do no harm" to other network users.
o Application Considerations [6], discussing what impact MPTCP will
have on applications, what applications will want to do with
MPTCP, and as a consequence of these factors, what API extensions
an MPTCP implementation should present.
1.1. Design Assumptions
In order to limit the potentially huge design space, the working
group imposed two key constraints on the multipath TCP design
presented in this document:
o It must be backwards-compatible with current, regular TCP, to
increase its chances of deployment.
o It can be assumed that one or both hosts are multihomed and
multiaddressed.
To simplify the design we assume that the presence of multiple
addresses at a host is sufficient to indicate the existence of
multiple paths. These paths need not be entirely disjoint: they may
share one or many routers between them. Even in such a situation,
making use of multiple paths is beneficial, improving resource
utilisation and resilience to a subset of node failures. The
congestion control algorithms defined in [5] ensure this does not act
detrimentally. Furthermore, there may be some scenarios where
different TCP ports on a single host can provide disjoint paths (such
as through certain ECMP implementations [7]), and so the MPTCP design
also supports the use of ports in path identifiers.
There are three aspects to the backwards-compatibility listed above
(discussed in more detail in [2]):
External Constraints: The protocol must function through the vast
majority of existing middleboxes such as NATs, firewalls and
proxies, and as such must resemble existing TCP as far as possible
on the wire. Furthermore, the protocol must not assume the
segments it sends on the wire arrive unmodified at the
destination: they may be split or coalesced; TCP options may be
removed or duplicated.
Application Constraints: The protocol must be usable with no change
to existing applications that use the common TCP API (although it
is reasonable that not all features would be available to such
legacy applications). Furthermore, the protocol must provide the
same service model as regular TCP to the application.
Fall-back: The protocol should be able to fall back to standard TCP
with no interference from the user, to be able to communicate with
legacy hosts.
The complementary application considerations document [6] discusses
the necessary features of an API to provide backwards-compatibility,
as well as API extensions to convey the behaviour of MPTCP at a level
of control and information equivalent to that available with regular,
single-path TCP.
Further discussion of the design constraints and associated design
decisions is given in the MPTCP Architecture document [2].
1.2. Multipath TCP in the Networking Stack
MPTCP operates at the transport layer and aims to be transparent to
both higher and lower layers. It is a set of additional features on
top of standard TCP; Figure 1 illustrates this layering. MPTCP is
designed to be usable by legacy applications with no changes;
detailed discussion of its interactions with applications is given in
[6].
                              +-------------------------------+
                              |           Application         |
     +---------------+        +-------------------------------+
     |  Application  |        |             MPTCP             |
     +---------------+        + - - - - - - - + - - - - - - - +
     |      TCP      |        | Subflow (TCP) | Subflow (TCP) |
     +---------------+        +-------------------------------+
     |      IP       |        |      IP       |      IP       |
     +---------------+        +-------------------------------+
Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
1.3. Terminology
This document makes use of a number of terms which are either MPTCP-
specific, or have defined meaning in the context of MPTCP, as
follows:
Path: A sequence of links between a sender and a receiver, defined
in this context by a 4-tuple of source and destination address/
port pairs.
Subflow: A flow of TCP segments operating over an individual path,
which forms part of a larger MPTCP connection. A subflow is
started and terminated similarly to a regular TCP connection.
(MPTCP) Connection: A set of one or more subflows, over which an
application can communicate between two hosts. There is a one-to-
one mapping between a connection and an application socket.
Data-level: The payload data is nominally transferred over a
connection, which in turn is transported over subflows. Thus the
term "data-level" is synonymous with "connection level", in
contrast to "subflow-level" which refers to properties of an
individual subflow.
Token: A locally unique identifier given to a multipath connection
by a host. May also be referred to as a "Connection ID".
Host: An end host operating an MPTCP implementation, and either
initiating or accepting an MPTCP connection.
In addition to these terms, note that MPTCP's interpretation of, and
effect on, regular single-path TCP semantics are discussed in
Section 4.
1.4. MPTCP Concept
This section provides a high-level summary of normal operation of
MPTCP, and is illustrated by the scenario shown in Figure 2. A
detailed description of operation is given in Section 3.
o To a non-MPTCP-aware application, MPTCP will behave the same as
normal TCP. Extended APIs could provide additional control to
MPTCP-aware applications [6]. An application begins by opening a
TCP socket in the normal way. MPTCP signaling and operation are
handled by the MPTCP implementation.
o An MPTCP connection begins similarly to a regular TCP connection.
This is illustrated in Figure 2 where an MPTCP connection is
established between addresses A1 and B1 on Hosts A and B
respectively.
o If extra paths are available, additional TCP sessions (termed
MPTCP "subflows") are created on these paths, and are combined
with the existing session, which continues to appear as a single
connection to the applications at both ends. The creation of the
additional TCP session is illustrated between Address A2 on Host A
and Address B1 on Host B.
o MPTCP identifies multiple paths by the presence of multiple
addresses at hosts. Combinations of these multiple addresses
equate to the additional paths. In the example, other potential
paths that could be set up are A1<->B2 and A2<->B2. Although this
additional session is shown as being initiated from A2, it could
equally have been initiated from B1.
o The discovery and setup of additional subflows will be achieved
through a path management method; this document describes a
mechanism by which a host can initiate new subflows by using its
own additional addresses, or by signaling its available addresses
to the other host.
o MPTCP adds connection-level sequence numbers to allow the
reassembly of segments arriving on multiple subflows with
differing network delays.
o Subflows are terminated as regular TCP connections, with a
four-way FIN handshake. The MPTCP connection is terminated by a
connection-level FIN.
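The relationship between multiple addresses and candidate paths, as described in the bullets above, can be sketched in Python. This is a minimal illustration. It assumes paths are identified only by (local address, remote address) pairs and ignores ports. The function name `candidate_paths` and the address labels are hypothetical.

```python
from itertools import product

def candidate_paths(local_addrs, remote_addrs, existing):
    """Return address pairs that could still carry an additional subflow.

    Assumption of this sketch: a path is identified by its
    (local, remote) address pair only; ports are ignored.
    """
    return [pair for pair in product(local_addrs, remote_addrs)
            if pair not in existing]

# Figure 2: A1<->B1 carries the initial subflow, A2<->B1 the additional one.
remaining = candidate_paths(["A1", "A2"], ["B1", "B2"],
                            {("A1", "B1"), ("A2", "B1")})
print(remaining)  # [('A1', 'B2'), ('A2', 'B2')]
```

Whether any of the remaining pairs is actually used is a path management decision. This enumeration only lists the possibilities.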
            Host A                               Host B
   ------------------------             ------------------------
   Address A1    Address A2             Address B1    Address B2
   ----------    ----------             ----------    ----------
     |             |                      |             |
     |     (initial connection setup)     |             |
     |----------------------------------->|             |
     |<-----------------------------------|             |
     |             |                      |             |
     |            (additional subflow setup)            |
     |             |--------------------->|             |
     |             |<---------------------|             |
     |             |                      |             |
     |             |                      |             |
Figure 2: Example MPTCP Usage Scenario
1.5. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [3].
2. Operation Overview
This section presents a single description of common MPTCP operation,
with reference to the protocol operation. This is a high-level
overview of the key functions; the full specification follows in
Section 3. Extensibility and negotiated features are not discussed
here. Considerable reference is made to symbolic names of MPTCP
options throughout this section - these are subtypes of the IANA-
assigned MPTCP option (see Section 8), and their formats are defined
in the detailed protocol specification which follows in Section 3.
A Multipath TCP connection provides a bidirectional bytestream
between two hosts, just as normal TCP does, and thus does not
require any change to the applications. However, Multipath TCP
enables the hosts to use different paths with different IP addresses
to exchange packets belonging to the MPTCP connection. A Multipath
TCP connection appears like a normal TCP connection to an
application. However, to the network layer, each MPTCP subflow looks
like a regular TCP flow whose segments carry a new TCP option type.
Multipath TCP manages the creation, removal and utilization of these
subflows to send data. The number of subflows that are managed
within a Multipath TCP connection is not fixed, and it can fluctuate
during the lifetime of the Multipath TCP connection.
All MPTCP operations are signaled with a TCP option - a single
numerical type for MPTCP, with "sub-types" for each MPTCP message.
What follows is a summary of the purpose and rationale of these
messages.
2.1. Initiating an MPTCP connection
This is the same signaling as for initiating a normal TCP connection,
but the SYN, SYN/ACK and ACK packets also carry the MP_CAPABLE
option. This is variable-length and serves multiple purposes.
Firstly, it verifies whether the remote host supports Multipath TCP;
and secondly, this option allows the hosts to exchange some
information to authenticate the establishment of additional subflows.
Further details are given in Section 3.1.
   Host-A                                  Host-B
   ------                                  ------
   MP_CAPABLE                       ->
   [A's key, flags]
                                    <-     MP_CAPABLE
                                           [B's key, flags]
   ACK + MP_CAPABLE                 ->
   [A's key, B's key, flags]
2.2. Associating a new subflow with an existing MPTCP connection
The exchange of keys in the MP_CAPABLE handshake provides material
that can be used to authenticate the endpoints when new subflows are
set up. Additional subflows begin in the same way as initiating a
normal TCP connection, but the SYN, SYN/ACK and ACK packets also
carry the MP_JOIN option.
Host-A initiates a new subflow between one of its addresses and one
of Host-B's addresses. The token - generated from the key - is used
to identify which MPTCP connection it is joining, and the HMAC is
used for authentication. The HMAC uses the keys exchanged in the
MP_CAPABLE handshake, and the random numbers (nonces) exchanged in
these MP_JOIN options. MP_JOIN also contains flags and an Address ID
that can be used to refer to the source address without the sender
needing to know if it has been changed by a NAT. Further details in
Section 3.2.
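The following Python sketch shows how the two HMAC values in an MP_JOIN exchange could be computed with the standard library. It assumes the construction detailed in Section 3.2: each host keys HMAC-SHA1 with its own key followed by the peer's key, over its own nonce followed by the peer's nonce. It also assumes 64-bit keys and 32-bit nonces. The function name and the example values are hypothetical.

```python
import hashlib
import hmac
import os

def join_hmacs(key_a: bytes, key_b: bytes, nonce_a: bytes, nonce_b: bytes):
    """Return (HMAC sent by A, HMAC sent by B) for one MP_JOIN exchange.

    Assumed construction: MAC-A = HMAC-SHA1(Key-A || Key-B, R-A || R-B)
    and MAC-B = HMAC-SHA1(Key-B || Key-A, R-B || R-A).
    """
    mac_a = hmac.new(key_a + key_b, nonce_a + nonce_b, hashlib.sha1).digest()
    mac_b = hmac.new(key_b + key_a, nonce_b + nonce_a, hashlib.sha1).digest()
    return mac_a, mac_b

# Keys come from the MP_CAPABLE handshake; nonces are fresh per MP_JOIN.
key_a, key_b = os.urandom(8), os.urandom(8)
nonce_a, nonce_b = os.urandom(4), os.urandom(4)
mac_a, mac_b = join_hmacs(key_a, key_b, nonce_a, nonce_b)

# Host A recomputes B's value independently and compares in constant time
# (here over the leftmost 64 bits, an assumption about the SYN/ACK format).
expected_b = join_hmacs(key_a, key_b, nonce_a, nonce_b)[1]
assert hmac.compare_digest(mac_b[:8], expected_b[:8])
```

Both values depend on both keys and both nonces. A host that did not see the MP_CAPABLE exchange therefore cannot produce them.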
   Host-A                                  Host-B
   ------                                  ------
   MP_JOIN                          ->
   [B's token, A's nonce,
    A's Address ID, flags]
                                    <-     MP_JOIN
                                           [B's HMAC, B's nonce,
                                            B's Address ID, flags]
   ACK + MP_JOIN                    ->
   [A's HMAC]
                                    <-     ACK
2.3. Informing the other Host about another potential address
The set of IP addresses associated with a multihomed host may change
during the lifetime of an MPTCP connection. MPTCP supports the
addition and removal of addresses on a host both implicitly and
explicitly. If Host-A has established a subflow starting at address
IP#-A1 and wants to open a second subflow starting at address IP#-A2,
it simply initiates the establishment of the subflow as explained
above. The remote host will then be implicitly informed about the
new address.
In some circumstances, a host may want to advertise to the remote
host the availability of an address without establishing a new
subflow, for example when a NAT prevents setup in one direction. In
the example below, Host-A informs Host-B about its alternative IP
address (IP#-A2). Host-B may later send an MP_JOIN to this new
address. Due to the presence of middleboxes that may translate IP
addresses, this option uses an address identifier to unambiguously
identify an address on a host. Further details in Section 3.4.1.
   Host-A                                  Host-B
   ------                                  ------
   ADD_ADDR                         ->
   [IP#-A2,
    IP#-A2's Address ID]
There is a corresponding signal for address removal, making use of
the Address ID that is signalled in the add address handshake.
Further details in Section 3.4.2.
   Host-A                                  Host-B
   ------                                  ------
   REMOVE_ADDR                      ->
   [IP#-A2's Address ID]
2.4. Data transfer using MPTCP
To ensure reliable, in-order delivery of data over subflows that may
appear and disappear at any time, MPTCP uses a 64-bit Data Sequence
Number (DSN) to number all data sent over the MPTCP connection. Each
subflow has its own 32-bit sequence number space and an MPTCP option
maps the subflow sequence space to the data sequence space. In this
way, data can be retransmitted on different subflows (mapped to the
same DSN) in the event of failure.
The "Data Sequence Signal" carries the "Data Sequence Mapping". The
Data Sequence Mapping consists of the subflow sequence number, data
sequence number, and length for which this mapping is valid. This
option can also carry a connection-level acknowledgement (the "Data
ACK") for the received DSN.
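To make the mapping concrete, the Python sketch below translates a subflow sequence number into a DSN using a single mapping. It assumes the mapping holds a plain 32-bit subflow sequence number that may wrap. The exact on-the-wire encoding is defined in Section 3.3 and is not reproduced here. The class name is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSequenceMapping:
    """Hypothetical in-memory form of one Data Sequence Mapping."""
    subflow_seq: int  # first subflow sequence number covered (32-bit space)
    data_seq: int     # DSN of that first byte (64-bit space)
    length: int       # number of bytes for which the mapping is valid

    def to_dsn(self, ssn: int) -> int:
        """Translate a subflow sequence number into a DSN."""
        offset = (ssn - self.subflow_seq) % 2**32  # tolerate 32-bit wrap
        if offset >= self.length:
            raise ValueError("sequence number not covered by this mapping")
        return (self.data_seq + offset) % 2**64

# A mapping that straddles the 32-bit subflow wrap point.
mapping = DataSequenceMapping(subflow_seq=2**32 - 10, data_seq=1_000_000,
                              length=100)
print(mapping.to_dsn(5))  # 1000015
```

A retransmission on another subflow would carry a different mapping pointing at the same DSN. The receiver can therefore place the bytes correctly whichever subflow delivered them.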
With MPTCP, all subflows share the same receive buffer and advertise
the same receive window. There are two levels of acknowledgement in
MPTCP. Regular TCP acknowledgments are used on each subflow to
acknowledge the reception of the segments sent over the subflow
independently of their DSN. In addition, there are connection-level
acknowledgments for the data sequence space. These acknowledgments
track the advancement of the bytestream and slide the receiving
window.
Further details are in Section 3.3.
   Host-A                                  Host-B
   ------                                  ------
   DATA_SEQUENCE_SIGNAL             ->
   [Data Sequence Mapping]
   [Data ACK]
   [Checksum]
2.5. Requesting a change in a path's priority
Hosts can indicate at initial subflow setup whether they wish the
subflow to be used as a regular or backup path - a backup path being
only used if there are no regular paths available. During a
connection, Host-A can request a change in the priority of a subflow
through the MP_PRIO signal to Host-B. Further details in
Section 3.3.8.
   Host-A                                  Host-B
   ------                                  ------
   MP_PRIO                          ->
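A sender's use of the backup priority can be sketched as a simple filter. Regular subflows are preferred, and backup subflows are used only when no regular subflow is available. This is an illustrative Python sketch. The `Subflow` fields, including the `usable` notion, are assumptions, and real schedulers are implementation-specific.

```python
from dataclasses import dataclass

@dataclass
class Subflow:
    name: str
    backup: bool         # priority set at setup or changed later by MP_PRIO
    usable: bool = True  # hypothetical: e.g. False after a path failure

def eligible_subflows(subflows):
    """Return the subflows a sender may schedule data on."""
    regular = [s for s in subflows if s.usable and not s.backup]
    if regular:
        return regular
    return [s for s in subflows if s.usable and s.backup]

flows = [Subflow("wifi", backup=False), Subflow("cellular", backup=True)]
print([s.name for s in eligible_subflows(flows)])  # ['wifi']
flows[0].usable = False
print([s.name for s in eligible_subflows(flows)])  # ['cellular']
```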
2.6. Closing an MPTCP connection
When Host-A wants to inform Host-B that it has no more data to send,
it signals this "Data FIN" as part of the Data Sequence Signal (see
above). It has the same semantics and behaviour as a regular TCP
FIN, but at the connection level. Once all the data on the MPTCP
connection has been successfully received, then this message is
acknowledged at the connection level with a DATA_ACK. Further
details in Section 3.3.3.
   Host-A                                  Host-B
   ------                                  ------
   DATA_SEQUENCE_SIGNAL             ->
   [Data FIN]
                                    <-     (MPTCP DATA_ACK)
2.7. Notable features
It is worth highlighting that MPTCP's signaling has been designed
with several key requirements in mind:
o To cope with NATs on the path, addresses are referred to by
Address IDs, in case the IP packet's source address gets changed
by a NAT. Setting up a new TCP flow is not possible if the
passive opener is behind a NAT; to allow subflows to be created
when either end is behind a NAT, MPTCP uses the ADD_ADDR message.
o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable, or if a
middlebox alters the payload.
o To meet the threats identified in [8], the following steps are
taken: keys are sent in the clear in the MP_CAPABLE messages;
MP_JOIN messages are secured with HMAC-SHA1 ([9], [4]) using those
keys; and standard TCP validity checks are made on the other
messages (ensuring sequence numbers are in-window).
3. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signalled using optional TCP header fields.
A single TCP option number ("Kind") will be assigned by IANA for
MPTCP (see Section 8), and then individual messages will be
determined by a "sub-type", the values of which will also be stored
in an IANA registry (and are also listed in Section 8).
Throughout this document, when reference is made to an MPTCP option
by symbolic name, such as "MP_CAPABLE", this refers to a TCP option
with the single MPTCP option type, and with the sub-type value of the
symbolic name as defined in Section 8. This sub-type is a four-bit
field - the first four bits of the option payload, as shown in
Figure 3. The MPTCP messages are defined in the following sections.
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-----------------------+
   |     Kind      |    Length     |Subtype|                       |
   +---------------+---------------+-------+                       |
   |                     Subtype-specific data                     |
   |                       (variable length)                       |
   +---------------------------------------------------------------+
Figure 3: MPTCP option format
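As an illustration of Figure 3, the Python sketch below splits a raw MPTCP option into its subtype and subtype-specific data. The Kind value is an assumption: this document leaves it to IANA, and 30 is the value later registered for MPTCP. The function name is hypothetical.

```python
MPTCP_KIND = 30  # assumption: value assigned by IANA (see Section 8)

def parse_mptcp_option(opt: bytes):
    """Return (subtype, low_nibble, data) for an option starting at Kind.

    The subtype is the high-order four bits of the third octet. The low
    four bits belong to the subtype-specific data (for MP_CAPABLE, the
    version), so they are returned separately.
    """
    if len(opt) < 3:
        raise ValueError("option too short")
    kind, length = opt[0], opt[1]
    if kind != MPTCP_KIND:
        raise ValueError("not an MPTCP option")
    if length < 3 or length > len(opt):
        raise ValueError("bad option length")
    return opt[2] >> 4, opt[2] & 0x0F, opt[3:length]

print(parse_mptcp_option(bytes([30, 4, 0x00, 0x81])))  # (0, 0, b'\x81')
```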
Those MPTCP options associated with subflow initiation are used on
packets with the SYN flag set. Additionally, there is one MPTCP
option for signaling metadata to ensure segmented data can be
recombined for delivery to the application.
The remaining options, however, are signals that do not need to be on
a specific packet, such as those for signaling additional addresses.
Whilst an implementation may desire to send MPTCP options as soon as
possible, it may not be possible to combine all desired options (both
those for MPTCP and for regular TCP, such as SACK [10]) on a single
packet. Therefore, an implementation may choose to send duplicate
ACKs containing the additional signaling information. This changes
the semantics of a duplicate ACK, which in regular TCP is usually
only sent as a signal of a lost segment [11]. Therefore, an MPTCP
implementation receiving a duplicate ACK which contains an MPTCP
option MUST NOT treat it as a signal of congestion. Additionally, an
MPTCP implementation SHOULD NOT send more than two duplicate ACKs in
a row for the purposes of sending MPTCP options alone, in order to
ensure no middleboxes misinterpret this as a sign of congestion.
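The two duplicate-ACK rules above can be sketched as follows. This Python fragment is illustrative only, and the class and function names are hypothetical. The sketch also assumes that sending any other ACK ends a run, which is one reading of "in a row".

```python
class OptionDupAckLimiter:
    """Sender side: cap consecutive option-only duplicate ACKs at two."""

    MAX_IN_A_ROW = 2

    def __init__(self):
        self.run = 0  # option-only duplicate ACKs sent since the last other ACK

    def may_send(self) -> bool:
        """Return True (and count it) if another option-only dup ACK is allowed."""
        if self.run >= self.MAX_IN_A_ROW:
            return False
        self.run += 1
        return True

    def on_other_ack_sent(self) -> None:
        """Assumption: sending any other ACK ends the current run."""
        self.run = 0

def is_congestion_signal(is_duplicate_ack: bool, carries_mptcp_option: bool) -> bool:
    """Receiver side: dup ACKs carrying an MPTCP option MUST NOT count."""
    return is_duplicate_ack and not carries_mptcp_option
```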
Furthermore, standard TCP validity checks (such as ensuring the
Sequence Number and Acknowledgement Number are within window) MUST be
undertaken before processing any MPTCP signals, as described in [12].
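A simplified version of these validity checks, applied before any MPTCP option is examined, might look like the Python sketch below. It uses RFC 793-style acceptability tests in 32-bit modular arithmetic. It omits the zero-window special cases and the stricter rules of [12]. All names are hypothetical. The ACK test accepts SEG.ACK equal to SND.UNA so that duplicate ACKs carrying MPTCP options are still processed.

```python
MOD = 2**32  # TCP sequence numbers wrap at 32 bits

def seq_acceptable(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND (modulo 2**32)."""
    return (seq - rcv_nxt) % MOD < rcv_wnd

def ack_acceptable(ack: int, snd_una: int, snd_nxt: int) -> bool:
    """SND.UNA <= SEG.ACK <= SND.NXT (modulo 2**32)."""
    return (ack - snd_una) % MOD <= (snd_nxt - snd_una) % MOD

def may_process_mptcp_options(seq, ack, rcv_nxt, rcv_wnd, snd_una, snd_nxt):
    """Only segments passing both checks have their MPTCP options parsed."""
    return (seq_acceptable(seq, rcv_nxt, rcv_wnd)
            and ack_acceptable(ack, snd_una, snd_nxt))
```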
3.1. Connection Initiation
Connection Initiation begins with a SYN, SYN/ACK, ACK exchange on a
single path. Each packet contains the Multipath Capable (MP_CAPABLE)
TCP option (Figure 4). This option declares its sender is capable of
performing multipath TCP and wishes to do so on this particular
connection.
This option is used to declare the sender's 64-bit key, which is
uniquely linked to this MPTCP connection. This key is used to
authenticate the addition of future subflows to this connection.
This is the only time the key will be sent in the clear on the wire
(unless "fast close", Section 3.5, is used); all future subflows will
identify the connection using a 32-bit "token". This token is a
cryptographic hash of this key. The algorithm for this process is
dependent on the authentication algorithm selected; the method of
selection is defined later in this section.
This key is generated by its sender, and its method of generation is
implementation-specific. The key MUST be hard to guess, and it MUST
be unique for the sending host at any one time. Recommendations for
generating random numbers for use in keys are given in [13].
Connections will be indexed at each host by the token (a one-way hash
of the key). Therefore, an implementation will require a mapping
from each token to the corresponding connection, and in turn to the
keys for the connection.
There is a very small risk that two different keys will hash to the
same token. An implementation SHOULD check its list of connection
tokens to ensure there is not a collision before sending its key in
the SYN/ACK. Such a check would, however, be costly for a server
with thousands of connections. In any case, the subflow handshake
mechanism (Section 3.2) will ensure that new subflows only join the
correct connection, by checking tokens in both directions and
ensuring sequence numbers are in-window. In the worst case, if there
was a token collision, the new subflow would be closed, but the MPTCP
connection would continue to provide a regular TCP service.
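The key generation, token indexing and collision check described above can be combined in a short Python sketch. It assumes HMAC-SHA1 has been negotiated, so the token is the most significant 32 bits of the SHA-1 hash of the key, as specified later in this section. It uses the standard `secrets` module as the random source and stores connection state as a plain dictionary. `ConnectionTable` and its methods are hypothetical.

```python
import hashlib
import secrets

def token_of(key: bytes) -> int:
    """Token for a 64-bit key: most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")

class ConnectionTable:
    """Hypothetical per-host index: token -> connection state."""

    def __init__(self):
        self.by_token = {}

    def new_local_key(self, max_tries: int = 16) -> bytes:
        """Draw a hard-to-guess 64-bit key whose token is not in use.

        Distinct tokens imply distinct keys, so this also keeps the key
        unique for this host.
        """
        for _ in range(max_tries):
            key = secrets.token_bytes(8)
            if token_of(key) not in self.by_token:
                return key
        raise RuntimeError("no collision-free key found")

    def register(self, local_key: bytes, remote_key: bytes) -> int:
        """Record a connection under the token of the local key."""
        token = token_of(local_key)
        self.by_token[token] = {"local_key": local_key, "remote_key": remote_key}
        return token

    def lookup(self, token: int):
        """Find the connection (and its keys) named by an incoming token."""
        return self.by_token.get(token)
```

Checking at key-generation time is what makes the lookup in `lookup` unambiguous. It is also the per-connection cost that the text above notes may be significant for a busy server.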
The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
that start the first subflow of an MPTCP connection. The data
carried by each packet is as follows, where A = initiator and B =
listener.
o SYN (A->B): A's Key.
o SYN/ACK (B->A): B's Key.
o ACK (A->B): A's Key followed by B's Key.
The contents of the option are determined by the SYN and ACK flags of
the packet, verified by the option's length field. For the diagram
shown in Figure 4, "sender" and "receiver" refer to the sender or
receiver of the TCP packet (which can be either host). If the SYN
flag is set, a single key is included; if only an ACK flag is set,
both keys are present.
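The rule that the TCP flags select the option contents, cross-checked by the length field, can be sketched as below. The lengths 12 and 20 follow from Figure 4: four octets of header and flags plus one or two 64-bit keys. The function name is hypothetical, the Kind value 30 in the example is an assumption, and error handling is simplified.

```python
def mp_capable_keys(opt: bytes, syn: bool, ack: bool):
    """Return (sender_key, receiver_key) from an MP_CAPABLE option.

    `opt` starts at the Kind octet. On SYN and SYN/ACK one key is
    present (Length 12); on the final ACK both keys are (Length 20).
    """
    if syn:
        expected = 12
    elif ack:
        expected = 20
    else:
        raise ValueError("MP_CAPABLE only appears on handshake packets")
    if len(opt) < expected or opt[1] != expected:
        raise ValueError("option length does not match the TCP flags")
    sender_key = opt[4:12]
    receiver_key = opt[12:20] if expected == 20 else None
    return sender_key, receiver_key

key_a, key_b = bytes(range(1, 9)), bytes(range(9, 17))
syn_opt = bytes([30, 12, 0x00, 0x81]) + key_a           # A -> B, SYN
ack_opt = bytes([30, 20, 0x00, 0x81]) + key_a + key_b   # A -> B, third ACK
assert mp_capable_keys(syn_opt, syn=True, ack=False) == (key_a, None)
assert mp_capable_keys(ack_opt, syn=False, ack=True) == (key_a, key_b)
```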
B's Key is echoed in the ACK in order to allow the listener (host B)
to act statelessly until the TCP connection reaches the ESTABLISHED
state. If the listener acts in this way, however, it MUST generate
its key in a way that would allow it to verify that it generated the
key when it is echoed in the ACK.
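One way a listener could meet this requirement, in the spirit of SYN cookies, is to derive its key from a local secret plus data that the final ACK will carry again. The Python sketch below is a hypothetical scheme, not part of this specification. It assumes a per-listener secret and uses HMAC-SHA256 from the standard library. The key is bound to the connection's address/port 4-tuple and to A's key, both of which are available again when the ACK arrives. The scheme does not by itself guarantee the key uniqueness required earlier in this section, so a token collision check would still be needed once state is created.

```python
import hashlib
import hmac
import secrets

LISTENER_SECRET = secrets.token_bytes(32)  # hypothetical per-listener secret

def stateless_key(four_tuple: bytes, initiator_key: bytes) -> bytes:
    """Derive B's 64-bit key from the 4-tuple, A's key and a local secret."""
    mac = hmac.new(LISTENER_SECRET, four_tuple + initiator_key, hashlib.sha256)
    return mac.digest()[:8]

def verify_echoed_key(four_tuple: bytes, initiator_key: bytes,
                      echoed_key: bytes) -> bool:
    """On the final ACK, recompute B's key and compare it with the echo."""
    return hmac.compare_digest(stateless_key(four_tuple, initiator_key),
                               echoed_key)

# Hypothetical encoding of the 4-tuple; any canonical encoding would do.
tup = b"192.0.2.1:40000->198.51.100.7:80"
key_a = secrets.token_bytes(8)
key_b = stateless_key(tup, key_a)            # sent in the SYN/ACK, no state kept
assert verify_echoed_key(tup, key_a, key_b)  # checked when the ACK arrives
```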
This exchange allows the safe passage of MPTCP options on SYN packets
to be determined. If any of these options are dropped, MPTCP will
gracefully fall back to regular single-path TCP, as documented in
Section 3.6. Note that new subflows MUST NOT be established (using
the process documented in Section 3.2) until a DSS option has been
successfully received across the path (as documented in Section 3.3).
                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------+---------------+-------+-------+---------------+
   |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
   +---------------+---------------+-------+-------+---------------+
   |                 Option Sender's Key (64 bits)                 |
   |                                                               |
   |                                                               |
   +---------------------------------------------------------------+
   |                Option Receiver's Key (64 bits)                |
   |                   (if option Length == 20)                    |
   |                                                               |
   +---------------------------------------------------------------+
Figure 4: Multipath Capable (MP_CAPABLE) option
The first four bits of the first octet in the MP_CAPABLE option
(Figure 4) define the MPTCP option subtype (see Section 8; for
MP_CAPABLE, this is 0), and the remaining four bits of this octet
specify the MPTCP version in use (for this specification, this is
0).
The second octet is reserved for flags, allocated as follows:
A: The leftmost bit, labelled "A", SHOULD be set to 1 to indicate
"Checksum Required", unless the system administrator has decided
that checksums are not required (for example, if the environment
is controlled and no middleboxes exist that might adjust the
payload).
B: The second bit, labelled "B", is an extensibility flag, and MUST
be set to 0 for current implementations. This will be used for an
extensibility mechanism in a future specification, and the impact
of this flag will be defined at a later date. If receiving a
message with the "B" flag set to 1, and this is not understood,
then this SYN MUST be silently ignored; the sender is expected to
retry with a format compatible with this legacy specification.
Note that the length of the MP_CAPABLE option, and the meanings of
bits "C" through "H", may be altered by setting B=1.
C through H: The remaining bits, labelled "C" through "H", are used
for crypto algorithm negotiation. Currently only the rightmost
bit, labelled "H", is assigned. Bit "H" indicates the use of
HMAC-SHA1 (as defined in Section 3.2). An implementation that
only supports this method MUST set bit "H" to 1, and bits "C"
through "G" to 0.
A crypto algorithm MUST be specified. If flag bits C through H are
all 0, the MP_CAPABLE option MUST be treated as invalid and ignored
(that is, it must be treated as a regular TCP handshake).
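The subtype/version octet and the flags octet described above can be decoded as in the following Python sketch. The bit masks follow Figure 4, with "A" as the leftmost bit and "H" as the rightmost. Rejecting an unknown subtype or version here is an assumption, since that case is not covered by this section. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FLAG_A = 0x80        # "A": Checksum Required (leftmost bit)
FLAG_B = 0x40        # "B": extensibility, MUST be 0 in this version
CRYPTO_MASK = 0x3F   # "C" through "H"; "H" (0x01) is HMAC-SHA1

@dataclass
class CapableFlags:
    checksum_required: bool
    crypto_bits: int

def decode_mp_capable(first_octet: int, flags: int) -> Optional[CapableFlags]:
    """Interpret subtype/version and flags; None means "ignore the option"."""
    subtype, version = first_octet >> 4, first_octet & 0x0F
    if subtype != 0 or version != 0:
        return None  # assumption: unknown subtype/version is not handled here
    if flags & FLAG_B:
        return None  # B=1 not understood: silently ignore this SYN
    if (flags & CRYPTO_MASK) == 0:
        return None  # no algorithm proposed: treat as a regular TCP handshake
    return CapableFlags(bool(flags & FLAG_A), flags & CRYPTO_MASK)
```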
The selection of the authentication algorithm also impacts the
algorithm used to generate the token and the Initial Data Sequence
Number. In this specification, with only the SHA-1 algorithm (bit
"H") specified and selected, the token MUST be a truncated (most
significant 32 bits) SHA-1 hash ([4], [14]) of the key. A different,
64-bit truncation (the least significant 64 bits) of the SHA-1 hash
of the key MUST be used as the Initial Data Sequence Number. Note
that the key MUST be hashed in network byte order. Also note that
the "least significant" bits MUST be the rightmost bits of the SHA-1
digest, as per [4]. Future specifications of the use of the crypto
bits may choose to specify different algorithms for token and IDSN
generation.
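For the HMAC-SHA1 case, the token and IDSN derivation can be written directly with the Python standard library. This is a sketch that assumes the key is held as an unsigned 64-bit integer. The function name is hypothetical.

```python
import hashlib

def token_and_idsn(key: int):
    """Return (token, idsn) for a 64-bit key when HMAC-SHA1 is selected."""
    digest = hashlib.sha1(key.to_bytes(8, "big")).digest()  # network byte order
    token = int.from_bytes(digest[:4], "big")   # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")   # least significant 64 bits
    return token, idsn

token, idsn = token_and_idsn(0x0123456789ABCDEF)
print(f"token={token:#010x} idsn={idsn:#018x}")
```

Because both values come from the same digest, a host that knows a key can derive both without extra signaling. This is why only the key needs to cross the wire.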
Both the crypto and checksum bits negotiate capabilities in similar
ways. For the Checksum Required bit (labelled "A"), if either host
requires the use of checksums, checksums MUST be used. In other
words, the only way for checksums not to be used is if both hosts in
their SYNs set A=0. This decision is confirmed by the setting of the
"A" bit in the third packet (the ACK) of the handshake. For example,
if the initiator sets A=0 in the SYN, but the responder sets A=1 in
the SYN/ACK, checksums MUST be used in both directions, and the
initiator will set A=1 in the ACK. The decision whether to use
checksums will be stored by an implementation in a per-connection
binary state variable.
For crypto negotiation, the responder has the choice. The initiator
creates a proposal setting a bit for each algorithm it supports to 1
(in this version of the specification, there is only one proposal, so
bit "H" will be always set to 1). The responder responds with only
one bit set - this is the chosen algorithm. The rationale for this
behaviour is that the responder will typically be a server with
potentially many thousands of connections, so it may wish to choose
an algorithm with minimal computational complexity, depending on the
load. If a responder does not support (or does not want to support)
any of the initiator's proposals, it can respond without an
MP_CAPABLE option, thus forcing a fall-back to regular TCP.
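The checksum and crypto negotiation rules can be summarised in a small Python sketch. It assumes a responder that implements only HMAC-SHA1 (bit "H"). Keeping the rightmost proposed bit it supports is an arbitrary selection policy of this sketch. Fall-back is signalled by returning None. All names are hypothetical.

```python
FLAG_A = 0x80          # "Checksum Required"
CRYPTO_MASK = 0x3F     # bits "C" through "H"
SUPPORTED = 0x01       # this responder only implements HMAC-SHA1 (bit "H")

def responder_flags(syn_flags: int, checksum_required: bool):
    """Return the flags octet for the SYN/ACK, or None to omit MP_CAPABLE."""
    acceptable = syn_flags & CRYPTO_MASK & SUPPORTED
    if acceptable == 0:
        return None                        # no usable proposal: fall back to TCP
    chosen = acceptable & -acceptable      # exactly one bit: the rightmost one
    return (FLAG_A if checksum_required else 0) | chosen

def checksums_in_use(syn_flags: int, synack_flags: int) -> bool:
    """Checksums are disabled only if both SYN and SYN/ACK carried A=0."""
    return bool((syn_flags | synack_flags) & FLAG_A)

# Initiator sets A=0, responder requires checksums: both directions use
# them, and the initiator sets A=1 in its ACK.
synack = responder_flags(0x01, checksum_required=True)
print(hex(synack), checksums_in_use(0x01, synack))  # 0x81 True
```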
The MP_CAPABLE option is only used in the first subflow of a
connection, in order to identify the connection; all following
subflows will use the "Join" option (see