INTERNET-DRAFT                                        Paul Gauthier
   Expires: December 1999                          Inktomi Corporation
   Category: Standards Track                                Josh Cohen
   draft-ietf-wrec-wpad-00.txt                   Microsoft Corporation
                                                       Martin Dunsmuir
                                                    RealNetworks, Inc.
                                                       Charles Perkins
                                                Sun Microsystems, Inc.



                     Web Proxy Auto-Discovery Protocol

Status of This Memo

   This document is a submission by the WREC Working Group of the
   Internet Engineering Task Force (IETF).  Comments should be
   submitted to the wrec@cs.utk.edu mailing list.

   Distribution of this memo is unlimited.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at:
          http://www.ietf.org/ietf/1id-abstracts.txt
     The list of Internet-Draft Shadow Directories can be accessed at:
          http://www.ietf.org/shadow.html.

Abstract

   A mechanism is needed to permit web clients to locate nearby web
   proxy caches. Current best practice is for end users to hand
   configure their web client (i.e., browser) with the URL of an "auto
   configuration file". In large environments this presents a
   formidable support problem.  It would be much more manageable for
   the web client software to automatically learn the configuration
   information for its web proxy settings. This is typically referred
   to as a resource discovery problem.

   Web client implementers are faced with a dizzying array of resource
   discovery protocols at varying levels of implementation and
   deployment. This complexity is hampering deployment of a "web proxy
   auto-discovery" facility.  This document proposes a pragmatic
   approach to web proxy auto-discovery.  It draws on a number of
   proposed standards in the light of practical deployment concerns. It
   proposes an escalating strategy of resource discovery attempts in
   order to find a nearby web proxy server. It attempts to provide rich

   Gauthier, Cohen, Dunsmuir, Perkins                         [Page 1]


   INTERNET-DRAFT Web Proxy Auto-Discovery Protocol           6/24/99

   mechanisms for supporting a complex environment, which may contain
   multiple web proxy servers.

Table of Contents

Status of This Memo...................................................1
Abstract..............................................................1
Table of Contents.....................................................2
1.   Conventions used in this document................................2
2.   Introduction.....................................................2
3.   Defining Web Proxy Auto-Discovery................................3
4.   The Discovery Process............................................4
 4.1.  WPAD Overview................................................4
 4.2.  When to Execute WPAD.........................................6
   4.2.1.  Upon Startup of the Web Client............................7
   4.2.2.  Network Stack Events......................................7
   4.2.3.  Expiration of the CFILE...................................7
 4.3.  WPAD Protocol Specification..................................7
 4.4.  Discovery Mechanisms.........................................9
   4.4.1.  DHCP......................................................9
   4.4.2.  SVRLOC/SLP...............................................10
   4.4.3.  DNS A/CNAME  "Well Known Aliases"........................10
   4.4.4.  DNS SRV Records..........................................10
   4.4.5.  DNS TXT service: Entries.................................11
   4.4.6.  Fallback.................................................11
   4.4.7.  Timeouts.................................................11
 4.5.  Composing a Candidate CURL..................................12
 4.6.  Retrieving the CFILE at the CURL............................12
 4.7.  Resuming Discovery..........................................12
5.   Client Implementation Considerations............................12
6.   Proxy Server Considerations.....................................13
7.   Administrator Considerations....................................13
8.   Conditional Compliance..........................................14
 8.1.  Class 0 - Minimally compliant...............................14
 8.2.  Class 1 - Compliant.........................................15
 8.3.  Class 2 - Maximally compliant...............................15
9.   Security Considerations.........................................15
10.  Acknowledgements................................................15
11.  Copyright.......................................................16
12.  References......................................................16
13.  Author Information..............................................17

1.   Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in "Key words for use in
   RFCs to Indicate Requirement Levels" [KEYWORDS].

2.   Introduction

   The problem of locating nearby web proxy cache servers cannot wait
   for the implementation and large scale deployment of various
   upcoming resource discovery protocols. The widespread success of the
   HTTP protocol and the recent popularity of streaming media have
   placed unanticipated strains on the networks of corporations, ISPs
   and backbone providers. There currently is no effective method for
   these organizations to realize the obvious benefits of web caching
   without tedious and error-prone configuration by each and every end
   user.

   The de-facto mechanism for specifying a web proxy server
   configuration in web clients is the download of a script or
   configuration file named by a URL. Users are currently expected to
   hand configure this URL into their browser or other web client.
   This mechanism suffers from a number of drawbacks:

   - Difficulty in supporting a large body of end-users. Many users
   misconfigure their proxy settings and are unable to diagnose the
   cause of their problems.

   - Lack of support for mobile clients who require a different proxy
   as their point of access changes.

   - Lack of support for complex proxy environments where there may
   exist a number of proxy servers with different affinities for
   different clients (based on network proximity, for example).
   Currently, clients would have to "know" which proxy server was
   optimal for their use.

   Currently available methods for resource discovery need to be
   exploited in the context of a well defined framework. Simple,
   functional and efficient mechanisms stand a good chance of solving
   this pressing and basic need. As new resource discovery mechanisms
   mature they can be folded into this framework with little
   difficulty.

   This document is a specification for implementers of web client
   software. It defines a protocol for automatically configuring those
   clients to use a local proxy. It also defines how an administrator
   should configure various resource discovery services in their
   network to support WPAD compatible web clients.

   While it does contain suggestions for web proxy server implementers,
   it does not make any specific demands of those parties.

3.   Defining Web Proxy Auto-Discovery

   As mentioned above, currently web client software needs to be
   configured with the URL of a proxy auto-configuration file or
   script. The contents of this script are vendor specific and not
   currently standardized. This document does not attempt to discuss
   the contents of these files (see [8] for an example file format).

   Thus, the Web Proxy Auto-Discovery (WPAD) problem reduces to
   providing the web client a mechanism for discovering the URL of the
   Configuration File. Once this Configuration URL (CURL) is known, the
   client software already contains mechanisms for retrieving and
   interpreting the Configuration File (CFILE) to enable access to the
   specified proxy cache servers.

   It is worth carefully noting that the goal of the WPAD process is to
   discover the correct CURL at which to retrieve the CFILE. The client
   is *not* trying to directly discover the name of the proxy server.
   That would circumvent the additional capabilities provided by proxy
   Configuration Files (such as load balancing, request routing to an
   array of servers, automated fail-over to backup proxy server [6,8]).

   It is worth noting that different clients requesting the CURL may
   receive completely different CFILEs in response. The web server may
   send back different CFILEs based on a number of criteria such as the
   "User-Agent" header, "Accept" headers, client IP address/subnet,
   etc.  The same client could conceivably receive a different CFILE on
   successive retrievals (as a method of round-robin load balancing,
   for example).

   This document will discuss a range of mechanisms for discovering the
   Configuration URL. The client will attempt them in a predefined
   order, until one succeeds. Existing widely deployed facilities may
   not provide enough expressiveness to specify a complete URL. As
   such, we will define default values for portions of the CURL which
   may not be expressible by some discovery mechanisms:

   http://<HOST>:<PORT><PATH>

   <HOST> - There is no default for this portion. Any discovery
        mechanism that succeeds will provide a value for the <HOST>
        portion of the CURL. The client MUST NOT provide a default.

   <PORT> - The client MUST assume port 80 if the successful discovery
        mechanism does not provide a port component.

   <PATH> - The client MUST assume a path of "/wpad.dat" if the
        successful discovery mechanism does not provide a path
        component.
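
   The defaulting rules above can be sketched in a few lines. This is
   only an illustration: the function name compose_curl and its
   parameters are hypothetical and do not appear in this document.

```python
DEFAULT_PORT = 80           # used when discovery yields no port
DEFAULT_PATH = "/wpad.dat"  # used when discovery yields no path


def compose_curl(host, port=None, path=None):
    """Build http://<HOST>:<PORT><PATH>, defaulting only PORT and PATH."""
    if not host:
        # There is no default for <HOST>; the client must not invent one.
        raise ValueError("a discovery mechanism must supply <HOST>")
    if port is None:
        port = DEFAULT_PORT
    if path is None:
        path = DEFAULT_PATH
    return f"http://{host}:{port}{path}"
```

   For instance, a mechanism that yields only the host wpad.foo.com
   would lead to the CURL http://wpad.foo.com:80/wpad.dat.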


4.   The Discovery Process

4.1. WPAD Overview

   This sub-section will present a descriptive overview of the WPAD
   protocol. It is intended to introduce the concepts and flow of the
   protocol. The remaining sub-sections (4.2-4.7) will provide the
   rigorous specification of the protocol details. WPAD uses a
   collection of pre-existing Internet resource discovery mechanisms to
   perform web proxy auto-discovery. Readers may wish to refer to [1]
   for a similar approach to resource discovery, since it was a basis
   for this strategy.  The WPAD protocol specifies the following:

   - how to use each mechanism for the specific purpose of web proxy
        auto-discovery
   - the order in which the mechanisms should be performed
   - the minimal set of mechanisms which must be attempted by a WPAD
        compliant web client

   The resource discovery mechanisms utilized by WPAD are as follows.
   - Dynamic Host Configuration Protocol (DHCP, [3,7]).
   - Service Location Protocol (SLP, [4]).
   - "Well Known Aliases" using DNS A records [5,9].
   - DNS SRV records [2,9].
   - "service: URLs" in DNS TXT records [10].

   Of all these mechanisms only DHCP and "Well Known Aliases" are
   required in WPAD clients. This decision is based on three reasons:
   these facilities are currently widely deployed in existing vendor
   hardware and software; they represent functionality that should
   cover most real world environments; they are relatively simple to
   implement.

   DNS servers supporting A records are clearly the most widely
   deployed of the services outlined above. It is reasonable to expect
   API support inside most web client development environments (POSIX
   C, Java, etc). The hierarchical nature of DNS makes it possible to
   support hierarchies of proxy servers.

   DNS is not suitable in every environment, unfortunately.
   Administrators often choose a DNS domain name hierarchy that does
   not correlate to network topologies, but rather with some
   organizational model (for example, foo.development.bar.com and
   foo.marketing.bar.com). DHCP servers, on the other hand, are
   frequently deployed with concern for network topologies. DHCP
   servers provide support for making configuration decisions based on
   subnets, which are directly related to network topology.

   Full client support for DHCP is not as ubiquitous as for DNS. That
   is, not all clients are equipped to take advantage of DHCP for their
   essential network configuration (assignment of IP address, network
   mask, etc). APIs for DHCP are not as widely available. Luckily,
   using DHCP for WPAD does not require either of these facilities. It
   is relatively easy for web client developers to speak just the
   minimal DHCP protocol to perform resource discovery. It entails
   building a simple UDP packet, sending it to the subnet broadcast
   address, and parsing the reply UDP packet(s) which are received to
   extract the WPAD option field. A reference implementation of this
   code in C is available [11].

   The WPAD client attempts a series of resource discovery requests,
   using the discovery mechanisms mentioned above, in a specific order.
   Clients only attempt mechanisms that they support. Each time a
   discovery attempt succeeds, the client uses the information
   obtained to construct a CURL. If a CFILE is successfully retrieved
   at that CURL, the process completes. If not, the client resumes
   where it left off in the predefined series of resource discovery
   requests. If no untried mechanisms remain and a CFILE has not been
   successfully retrieved, the WPAD protocol fails and the client is
   configured to use no proxy server.
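
   The control flow just described can be sketched as a simple loop.
   This is an illustration only, not the normative pseudo-code: the
   name run_wpad and both of its parameters are hypothetical. Each
   entry in `mechanisms` stands for one discovery attempt that returns
   a candidate CURL or None, and `fetch_cfile` stands for the HTTP
   retrieval of the CFILE.

```python
def run_wpad(mechanisms, fetch_cfile):
    """Try each discovery mechanism in order until a CFILE is retrieved.

    mechanisms:  ordered iterable of zero-argument callables, each
                 returning a candidate CURL string or None.
    fetch_cfile: callable taking a CURL and returning the CFILE
                 contents, or None if retrieval failed.
    Returns (curl, cfile) on success, or None if every mechanism
    failed, in which case the client uses no proxy server.
    """
    for discover in mechanisms:
        curl = discover()
        if curl is None:
            continue                 # discovery failed, try the next one
        cfile = fetch_cfile(curl)
        if cfile is not None:
            return curl, cfile       # the process completes
        # Retrieval failed: resume with the next untried mechanism.
    return None
```

   An unsupported mechanism can be modelled as a callable that always
   returns None, so that it counts as a failed attempt.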

   First the client tries DHCP, followed by SLP. If no CFILE has been
   retrieved the client moves on to the DNS based mechanisms. The
   client will cycle through the DNS SRV, "Well Known Aliases" and DNS
   TXT record methods multiple times. Each time through, the QNAME
   being used in the DNS query is made less and less specific. In this
   manner the client can locate the most specific configuration
   information possible, but can fall back on less specific
   information. Every DNS lookup has the QNAME prefixed with "wpad" to
   indicate the resource type being requested.

   As an example, consider a client with hostname
   johns-desktop.development.foo.com. Assume the web client software
   supports all of the mechanisms listed above. This is the sequence
   of discovery attempts the client would perform until one succeeded
   in locating a valid CFILE:

   - DHCP
   - SLP
   - DNS A lookup on QNAME=wpad.development.foo.com.
   - DNS SRV lookup on QNAME=wpad.development.foo.com.
   - DNS TXT lookup on QNAME=wpad.development.foo.com.
   - DNS A lookup on QNAME=wpad.foo.com.
   - DNS SRV lookup on QNAME=wpad.foo.com.
   - DNS TXT lookup on QNAME=wpad.foo.com.
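
   The sequence above can be generated mechanically. The sketch below
   is illustrative only: discovery_sequence is a hypothetical name,
   the record order (A, SRV, TXT) simply mirrors the list above, and
   STOP_SUFFIXES is an assumed, deliberately tiny stand-in for the
   check that decides when the remaining name is too unspecific to
   query.

```python
# Assumption: a small illustrative set of suffixes at which to stop.
STOP_SUFFIXES = {"com", "net", "org", "edu", "gov", "mil", "int"}


def strip_leading_component(dns_name):
    """Drop everything up to and including the first dot."""
    return dns_name.split(".", 1)[1] if "." in dns_name else ""


def discovery_sequence(hostname):
    """List the discovery attempts for `hostname`, in order."""
    steps = ["DHCP", "SLP"]
    domain = strip_leading_component(hostname.rstrip("."))
    while domain and domain not in STOP_SUFFIXES:
        qname = "wpad." + domain     # every QNAME is prefixed with "wpad"
        for rtype in ("A", "SRV", "TXT"):
            steps.append(f"DNS {rtype} lookup on QNAME={qname}")
        domain = strip_leading_component(domain)  # make it less specific
    return steps
```

   Calling discovery_sequence("johns-desktop.development.foo.com")
   yields the eight attempts listed above, ending with the TXT lookup
   on wpad.foo.com.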

4.2. When to Execute WPAD

   Web clients need to perform the WPAD protocol periodically to
   maintain correct proxy settings. This should occur on a regular
   basis corresponding to initialization of the client software or the
   networking stack below the client. As well, WPAD will need to occur
   in response to expiration of existing configuration data.  The
   requirements for periodic discovery are stated first; the sections
   that follow describe the details of these scenarios.

   The web proxy auto-discovery process MUST occur at least as
   frequently as one of the following two options. A web client MUST
   implement at least one of them, choosing whichever makes sense in
   its environment, and MAY implement both.
   - Upon startup of the web client.
   - Whenever there is an indication from the networking stack that the
   IP address of the client host either has, or could have, changed.

   In addition, the client MUST attempt a discovery cycle upon
   expiration of a previously downloaded CFILE in accordance with
   HTTP/1.1.


4.2.1.    Upon Startup of the Web Client

   For many types of web client (like web browsers) there can be many
   instances of the client operating for a given user at one time. This
   is often to allow display of multiple web pages in different
   windows, for example. There is no need to re-perform WPAD every time
   a new instance of the web client is opened. WPAD MUST be performed
   when the number of web client instances transitions from 0 to 1. It
   SHOULD NOT be performed as additional instances are created.
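
   One way to honor the 0-to-1 rule is to count live client instances
   and trigger discovery only on the first one. The class below is a
   minimal sketch under that assumption: StartupTrigger and its method
   names are hypothetical, and how instances are counted across
   processes is left to the implementation.

```python
class StartupTrigger:
    """Run WPAD only when the instance count goes from 0 to 1."""

    def __init__(self, run_wpad):
        self._run_wpad = run_wpad    # zero-argument callable
        self._instances = 0

    def instance_opened(self):
        self._instances += 1
        if self._instances == 1:     # the 0 -> 1 transition
            self._run_wpad()

    def instance_closed(self):
        if self._instances > 0:
            self._instances -= 1
```

   Opening a second window calls instance_opened() again but does not
   rerun discovery; closing every instance and opening a new one does.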

4.2.2.    Network Stack Events

   Another option for clients is to tie the execution of WPAD to
   changes in the networking environment. If the client can learn about
   the change of the local host's IP address, or the possible change of
   the IP address, it MUST re-perform the WPAD protocol.  Many
   operating systems provide indications of "network up" events, for
   example. Those types of events and system-boot events might be the
   triggers for WPAD in many environments.

4.2.3.    Expiration of the CFILE

   The HTTP retrieval of the CURL may return HTTP headers specifying a
   valid lifetime for the CFILE returned. The client MUST obey this
   lifetime and rerun the WPAD process when the CFILE expires. A client
   MAY rerun the WPAD process if it detects a failure of the currently
   configured proxy (which is not otherwise recoverable via the
   inherent mechanisms provided by the currently active Configuration
   File).

   Whenever the client decides to invalidate the current CURL or CFILE,
   it MUST rerun the entire WPAD protocol to ensure it discovers the
   currently correct CURL. Specifically, if the valid lifetime of the
   CFILE ends (as specified by the HTTP headers provided when it was
   retrieved), the complete WPAD protocol MUST be rerun. The client MUST
   NOT simply re-use the existing CURL to obtain a fresh copy of the
   CFILE.
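
   A client might derive the CFILE lifetime from the response headers
   roughly as follows. This is a simplified sketch, not a full
   HTTP/1.1 freshness calculation: the function names are
   hypothetical, the headers are assumed to arrive as a plain dict
   with canonical capitalization, and only "Cache-Control: max-age"
   and "Expires" are considered. Treating a response with no lifetime
   information as never expiring is also an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def cfile_expiry(headers, fetched_at):
    """Return when the CFILE expires, or None if no lifetime is given."""
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""))
    if match:
        return fetched_at + timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return None              # unparseable date: no usable lifetime
    return None


def must_rediscover(expiry, now):
    """True once the CFILE lifetime has ended.

    On True the client reruns the entire WPAD protocol; it does not
    merely re-fetch the old CURL.
    """
    return expiry is not None and now >= expiry
```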

   A number of network round trips, broadcast and/or multicast
   communications may be required during the WPAD protocol. The WPAD
   protocol SHOULD NOT be invoked at a more frequent rate than
   specified above (such as per-URL retrieval).

4.3. WPAD Protocol Specification

   The following pseudo-code defines the WPAD protocol.  If a
   particular discovery mechanism is not supported, treat it as a
   failed discovery attempt in the pseudo-code.

   Two subroutines need explanation. The subroutine
   strip_leading_component(dns_string) strips off the leading
   characters, up to and including the first dot (`.'), of the string
   passed as a parameter, which is expected to contain a DNS name.
   The Boolean subroutine is_not_canonical(dns_string) returns FALSE if
   dns_string is one of the canonical domain suffixes defined in RFC