\section{\module{cookielib} ---
         Cookie handling for HTTP clients}

\declaremodule{standard}{cookielib}
\moduleauthor{John J. Lee}{[email protected]}
\sectionauthor{John J. Lee}{[email protected]}

\versionadded{2.4}

\modulesynopsis{Cookie handling for HTTP clients}

The \module{cookielib} module defines classes for automatic handling
of HTTP cookies. It is useful for accessing web sites that require
small pieces of data -- \dfn{cookies} -- to be set on the client
machine by an HTTP response from a web server, and then returned to
the server in later HTTP requests.

Both the regular Netscape cookie protocol and the protocol defined by
\rfc{2965} are handled. RFC 2965 handling is switched off by default.
\rfc{2109} cookies are parsed as Netscape cookies and subsequently
treated either as Netscape or RFC 2965 cookies according to the
'policy' in effect. Note that the great majority of cookies on the
Internet are Netscape cookies. \module{cookielib} attempts to follow
the de-facto Netscape cookie protocol (which differs substantially
from that set out in the original Netscape specification), including
taking note of the \code{max-age} and \code{port} cookie-attributes
introduced with RFC 2965. \note{The various named parameters found in
\mailheader{Set-Cookie} and \mailheader{Set-Cookie2} headers
(eg. \code{domain} and \code{expires}) are conventionally referred to
as \dfn{attributes}. To distinguish them from Python attributes, the
documentation for this module uses the term \dfn{cookie-attribute}
instead}.


The module defines the following exception:

\begin{excdesc}{LoadError}
Instances of \class{FileCookieJar} raise this exception on failure to
load cookies from a file. \note{For backwards-compatibility
with Python 2.4 (which raised an \exception{IOError}),
\exception{LoadError} is a subclass of \exception{IOError}}.
\end{excdesc}


The following classes are provided:

\begin{classdesc}{CookieJar}{policy=\constant{None}}
\var{policy} is an object implementing the \class{CookiePolicy}
interface.

The \class{CookieJar} class stores HTTP cookies. It extracts cookies
from HTTP responses, and returns them in HTTP requests.
\class{CookieJar} instances automatically expire contained cookies
when necessary. Subclasses are also responsible for storing and
retrieving cookies from a file or database.
\end{classdesc}

\begin{classdesc}{FileCookieJar}{filename, delayload=\constant{None},
                                 policy=\constant{None}}
\var{policy} is an object implementing the \class{CookiePolicy}
interface. For the other arguments, see the documentation for the
corresponding attributes.

A \class{CookieJar} which can load cookies from, and perhaps save
cookies to, a file on disk. Cookies are \strong{NOT} loaded from the
named file until either the \method{load()} or \method{revert()}
method is called. Subclasses of this class are documented in section
\ref{file-cookie-jar-classes}.
\end{classdesc}

\begin{classdesc}{CookiePolicy}{}
This class is responsible for deciding whether each cookie should be
accepted from / returned to the server.
\end{classdesc}

\begin{classdesc}{DefaultCookiePolicy}{
    blocked_domains=\constant{None},
    allowed_domains=\constant{None},
    netscape=\constant{True}, rfc2965=\constant{False},
    rfc2109_as_netscape=\constant{None},
    hide_cookie2=\constant{False},
    strict_domain=\constant{False},
    strict_rfc2965_unverifiable=\constant{True},
    strict_ns_unverifiable=\constant{False},
    strict_ns_domain=\constant{DefaultCookiePolicy.DomainLiberal},
    strict_ns_set_initial_dollar=\constant{False},
    strict_ns_set_path=\constant{False}
    }

Constructor arguments should be passed as keyword arguments only.
\var{blocked_domains} is a sequence of domain names that we never
accept cookies from, nor return cookies to. \var{allowed_domains}, if
not \constant{None}, is a sequence of the only domains for which
we accept and return cookies. For all other arguments, see the
documentation for \class{CookiePolicy} and \class{DefaultCookiePolicy}
objects.

\class{DefaultCookiePolicy} implements the standard accept / reject
rules for Netscape and RFC 2965 cookies. By default, RFC 2109 cookies
(ie. cookies received in a \mailheader{Set-Cookie} header with a
version cookie-attribute of 1) are treated according to the RFC 2965
rules. However, if RFC 2965 handling is turned off or
\member{rfc2109_as_netscape} is True, RFC 2109 cookies are
'downgraded' by the \class{CookieJar} instance to Netscape cookies, by
setting the \member{version} attribute of the \class{Cookie} instance
to 0. \class{DefaultCookiePolicy} also provides some parameters to
allow some fine-tuning of policy.
\end{classdesc}

\begin{classdesc}{Cookie}{}
This class represents Netscape, RFC 2109 and RFC 2965 cookies. It is
not expected that users of \module{cookielib} construct their own
\class{Cookie} instances. Instead, if necessary, call
\method{make_cookies()} on a \class{CookieJar} instance.
\end{classdesc}
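
As a minimal sketch of how the classes above fit together, a
\class{CookieJar} can be handed to \module{urllib2} through
\class{urllib2.HTTPCookieProcessor}. The fallback import for Python 3
(where the modules were renamed \module{http.cookiejar} and
\module{urllib.request}) is an addition made only so that the sketch
runs on either version; no request is actually sent.

```python
try:
    import cookielib
    import urllib2
except ImportError:
    # Python 3 renamed these modules; the aliases keep the sketch runnable.
    import http.cookiejar as cookielib
    import urllib.request as urllib2

# RFC 2965 handling is off by default; switch it on explicitly if wanted.
policy = cookielib.DefaultCookiePolicy(rfc2965=True)
jar = cookielib.CookieJar(policy)

# The handler calls add_cookie_header() on outgoing requests and
# extract_cookies() on incoming responses, using the jar given here.
handler = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(handler)

# opener.open(url) would now send and receive cookies automatically.
```

The jar starts out empty and only fills up once responses carrying
cookies pass through the opener.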

\begin{seealso}

\seemodule{urllib2}{URL opening with automatic cookie handling.}

\seemodule{Cookie}{HTTP cookie classes, principally useful for
server-side code. The \module{cookielib} and \module{Cookie} modules
do not depend on each other.}

\seeurl{http://wwwsearch.sf.net/ClientCookie/}{Extensions to this
module, including a class for reading Microsoft Internet Explorer
cookies on Windows.}

\seeurl{http://www.netscape.com/newsref/std/cookie_spec.html}{The
specification of the original Netscape cookie protocol. Though this
is still the dominant protocol, the 'Netscape cookie protocol'
implemented by all the major browsers (and \module{cookielib}) only
bears a passing resemblance to the one sketched out in
\code{cookie_spec.html}.}

\seerfc{2109}{HTTP State Management Mechanism}{Obsoleted by RFC 2965.
Uses \mailheader{Set-Cookie} with version=1.}

\seerfc{2965}{HTTP State Management Mechanism}{The Netscape protocol
with the bugs fixed. Uses \mailheader{Set-Cookie2} in place of
\mailheader{Set-Cookie}. Not widely used.}

\seeurl{http://kristol.org/cookie/errata.html}{Unfinished errata to
RFC 2965.}

\seerfc{2964}{Use of HTTP State Management}{}

\end{seealso}


\subsection{CookieJar and FileCookieJar Objects \label{cookie-jar-objects}}

\class{CookieJar} objects support the iterator protocol for iterating
over contained \class{Cookie} objects.

\class{CookieJar} has the following methods:

\begin{methoddesc}[CookieJar]{add_cookie_header}{request}
Add correct \mailheader{Cookie} header to \var{request}.

If policy allows (ie. the \member{rfc2965} and \member{hide_cookie2}
attributes of the \class{CookieJar}'s \class{CookiePolicy} instance
are true and false respectively), the \mailheader{Cookie2} header is
also added when appropriate.

The \var{request} object (usually a \class{urllib2.Request} instance)
must support the methods \method{get_full_url()}, \method{get_host()},
\method{get_type()}, \method{unverifiable()},
\method{get_origin_req_host()}, \method{has_header()},
\method{get_header()}, \method{header_items()}, and
\method{add_unredirected_header()}, as documented by \module{urllib2}.
\end{methoddesc}
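
The following sketch illustrates \method{add_cookie_header()} without
touching the network. The cookie is built by hand purely for
illustration (normally cookies arrive via \method{extract_cookies()}),
the host names are placeholders, and the Python 3 fallback import is an
addition so that the code runs on either version.

```python
try:
    import cookielib
    import urllib2
except ImportError:
    # Python 3 names for the same modules.
    import http.cookiejar as cookielib
    import urllib.request as urllib2

# Hand-built Netscape (version 0) cookie, for illustration only.
cookie = cookielib.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="www.example.com", domain_specified=False,
    domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={})

jar = cookielib.CookieJar()
jar.set_cookie(cookie)

# Nothing is sent; the jar only adds a Cookie header to the request.
request = urllib2.Request("http://www.example.com/index.html")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")

# A request to a different host gets no Cookie header from this jar.
other_request = urllib2.Request("http://www.example.org/")
jar.add_cookie_header(other_request)
other_has_cookie = other_request.has_header("Cookie")
```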

\begin{methoddesc}[CookieJar]{extract_cookies}{response, request}
Extract cookies from HTTP \var{response} and store them in the
\class{CookieJar}, where allowed by policy.

The \class{CookieJar} will look for allowable \mailheader{Set-Cookie}
and \mailheader{Set-Cookie2} headers in the \var{response} argument,
and store cookies as appropriate (subject to the
\method{CookiePolicy.set_ok()} method's approval).

The \var{response} object (usually the result of a call to
\method{urllib2.urlopen()}, or similar) should support an
\method{info()} method, which returns an object with a
\method{getallmatchingheaders()} method (usually a
\class{mimetools.Message} instance).

The \var{request} object (usually a \class{urllib2.Request} instance)
must support the methods \method{get_full_url()}, \method{get_host()},
\method{unverifiable()}, and \method{get_origin_req_host()}, as
documented by \module{urllib2}. The request is used to set default
values for cookie-attributes as well as for checking that the cookie
is allowed to be set.
\end{methoddesc}

\begin{methoddesc}[CookieJar]{set_policy}{policy}
Set the \class{CookiePolicy} instance to be used.
\end{methoddesc}

\begin{methoddesc}[CookieJar]{make_cookies}{response, request}
Return sequence of \class{Cookie} objects extracted from
\var{response} object.

See the documentation for \method{extract_cookies} for the interfaces
required of the \var{response} and \var{request} arguments.
\end{methoddesc}

\begin{methoddesc}[CookieJar]{set_cookie_if_ok}{cookie, request}
Set a \class{Cookie} if policy says it's OK to do so.
\end{methoddesc}

\begin{methoddesc}[CookieJar]{set_cookie}{cookie}
Set a \class{Cookie}, without checking with policy to see whether or
not it should be set.
\end{methoddesc}

\begin{methoddesc}[CookieJar]{clear}{\optional{domain\optional{,
      path\optional{, name}}}}
Clear some cookies.

If invoked without arguments, clear all cookies. If given a single
argument, only cookies belonging to that \var{domain} will be removed.
If given two arguments, cookies belonging to the specified
\var{domain} and URL \var{path} are removed. If given three
arguments, then the cookie with the specified \var{domain}, \var{path}
and \var{name} is removed.

Raises \exception{KeyError} if no matching cookie exists.
\end{methoddesc}
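
The argument forms of \method{clear()} can be sketched as follows. The
helpers \code{make_cookie()} and \code{names()} are hypothetical, the
cookies are built by hand for illustration only, and the Python 3
fallback import is an addition so that the code runs on either version.

```python
try:
    import cookielib
except ImportError:
    # Python 3 name for the same module.
    import http.cookiejar as cookielib


def make_cookie(name, domain, path="/"):
    """Build a version 0 session cookie by hand (illustration only)."""
    return cookielib.Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


def names(jar):
    # CookieJar supports iteration over its Cookie objects.
    return sorted(cookie.name for cookie in jar)


jar = cookielib.CookieJar()
for name, domain in [("a", "www.example.com"), ("b", "www.example.com"),
                     ("c", "other.example.org")]:
    jar.set_cookie(make_cookie(name, domain))

jar.clear("www.example.com", "/", "a")   # three arguments: one cookie
after_three_args = names(jar)

jar.clear("other.example.org")           # one argument: a whole domain
after_one_arg = names(jar)

try:
    jar.clear("nowhere.invalid")         # nothing stored for this domain
    missing_raised = False
except KeyError:
    missing_raised = True

jar.clear()                              # no arguments: everything
after_no_args = names(jar)
```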

\begin{methoddesc}[CookieJar]{clear_session_cookies}{}
Discard all session cookies.

Discards all contained cookies that have a true \member{discard}
attribute (usually because they had either no \code{max-age} or
\code{expires} cookie-attribute, or an explicit \code{discard}
cookie-attribute). For interactive browsers, the end of a session
usually corresponds to closing the browser window.

Note that the \method{save()} method won't save session cookies
anyway, unless you ask otherwise by passing a true
\var{ignore_discard} argument.
\end{methoddesc}
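
A small sketch of the effect of \method{clear_session_cookies()},
assuming two hand-built cookies (one without an expiry time, one
expiring in an hour). The helper \code{make_cookie()} is hypothetical,
and the Python 3 fallback import is an addition so that the code runs on
either version.

```python
import time

try:
    import cookielib
except ImportError:
    # Python 3 name for the same module.
    import http.cookiejar as cookielib


def make_cookie(name, expires):
    """Hand-built cookie; no expiry time means a session cookie."""
    return cookielib.Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain="www.example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires,
        discard=(expires is None),   # true discard attribute for session cookies
        comment=None, comment_url=None, rest={})


jar = cookielib.CookieJar()
jar.set_cookie(make_cookie("session_id", None))
jar.set_cookie(make_cookie("remember_me", int(time.time()) + 3600))

# Only cookies with a true discard attribute are removed.
jar.clear_session_cookies()
remaining = [cookie.name for cookie in jar]
```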

\class{FileCookieJar} implements the following additional methods:

\begin{methoddesc}[FileCookieJar]{save}{filename=\constant{None},
    ignore_discard=\constant{False}, ignore_expires=\constant{False}}
Save cookies to a file.

This base class raises \exception{NotImplementedError}. Subclasses may
leave this method unimplemented.

\var{filename} is the name of the file in which to save cookies. If
\var{filename} is not specified, \member{self.filename} is used (whose
default is the value passed to the constructor, if any); if
\member{self.filename} is \constant{None}, \exception{ValueError} is
raised.

\var{ignore_discard}: save even cookies set to be discarded.
\var{ignore_expires}: save even cookies that have expired.

The file is overwritten if it already exists, thus wiping all the
cookies it contains. Saved cookies can be restored later using the
\method{load()} or \method{revert()} methods.
\end{methoddesc}

\begin{methoddesc}[FileCookieJar]{load}{filename=\constant{None},
    ignore_discard=\constant{False}, ignore_expires=\constant{False}}
Load cookies from a file.

Old cookies are kept unless overwritten by newly loaded ones.

Arguments are as for \method{save()}.

The named file must be in the format understood by the class, or
\exception{LoadError} will be raised. Also, \exception{IOError} may
be raised, for example if the file does not exist. \note{For
backwards-compatibility with Python 2.4 (which raised
an \exception{IOError}), \exception{LoadError} is a subclass
of \exception{IOError}.}
\end{methoddesc}
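
Because \exception{LoadError} is a subclass of \exception{IOError}, a
handler that wants to tell the two apart has to list
\exception{LoadError} first. The sketch below assumes a temporary file
with arbitrary contents and uses \class{LWPCookieJar} as the concrete
class; the Python 3 fallback import is an addition so that the code runs
on either version.

```python
import os
import tempfile

try:
    import cookielib
except ImportError:
    # Python 3 name for the same module.
    import http.cookiejar as cookielib

# A file that is not in the format LWPCookieJar understands.
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"this is not a cookie file\n")
os.close(fd)

jar = cookielib.LWPCookieJar(path)
try:
    try:
        jar.load()                    # uses jar.filename
        outcome = "loaded"
    except cookielib.LoadError:       # must come before IOError
        outcome = "wrong format"
    except IOError:
        outcome = "could not read file"
finally:
    os.remove(path)
```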

\begin{methoddesc}[FileCookieJar]{revert}{filename=\constant{None},
    ignore_discard=\constant{False}, ignore_expires=\constant{False}}
Clear all cookies and reload cookies from a saved file.

\method{revert()} can raise the same exceptions as \method{load()}.
If there is a failure, the object's state will not be altered.
\end{methoddesc}
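
A save and revert round trip might look like the sketch below. It
assumes \class{LWPCookieJar} as the concrete class, a temporary
directory, and hand-built persistent cookies (session cookies would be
skipped by \method{save()} unless \var{ignore_discard} is true). The
helper \code{make_cookie()} is hypothetical, and the Python 3 fallback
import is an addition so that the code runs on either version.

```python
import os
import shutil
import tempfile
import time

try:
    import cookielib
except ImportError:
    # Python 3 name for the same module.
    import http.cookiejar as cookielib


def make_cookie(name):
    """Hand-built persistent cookie that expires in one hour."""
    return cookielib.Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain="www.example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=int(time.time()) + 3600, discard=False,
        comment=None, comment_url=None, rest={})


workdir = tempfile.mkdtemp()
try:
    path = os.path.join(workdir, "cookies.lwp")
    jar = cookielib.LWPCookieJar(path)

    jar.set_cookie(make_cookie("saved"))
    jar.save()                        # writes to jar.filename

    jar.set_cookie(make_cookie("unsaved"))
    before_revert = sorted(cookie.name for cookie in jar)

    jar.revert()                      # clear, then reload from jar.filename
    after_revert = sorted(cookie.name for cookie in jar)
finally:
    shutil.rmtree(workdir)
```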

\class{FileCookieJar} instances have the following public attributes:

\begin{memberdesc}{filename}
Filename of default file in which to keep cookies. This attribute may
be assigned to.
\end{memberdesc}

\begin{memberdesc}{delayload}
If true, load cookies lazily from disk. This attribute should not be
assigned to. This is only a hint, since this only affects
performance, not behaviour (unless the cookies on disk are changing).
A \class{CookieJar} object may ignore it. None of the
\class{FileCookieJar} classes included in the standard library lazily
loads cookies.
\end{memberdesc}


\subsection{FileCookieJar subclasses and co-operation with web browsers
  \label{file-cookie-jar-classes}}

The following \class{CookieJar} subclasses are provided for reading
and writing. Further \class{CookieJar} subclasses, including one
that reads Microsoft Internet Explorer cookies, are available at
\url{http://wwwsearch.sf.net/ClientCookie/}.

\begin{classdesc}{MozillaCookieJar}{filename, delayload=\constant{None},
 policy=\constant{None}}
A \class{FileCookieJar} that can load from and save cookies to disk in
the Mozilla \code{cookies.txt} file format (which is also used by the
Lynx and Netscape browsers). \note{This loses information about RFC
2965 cookies, and also about newer or non-standard cookie-attributes
such as \code{port}.}

\warning{Back up your cookies before saving if you have cookies whose
loss / corruption would be inconvenient (there are some subtleties
which may lead to slight changes in the file over a load / save
round-trip).}

Also note that cookies saved while Mozilla is running will get
clobbered by Mozilla.
\end{classdesc}

\begin{classdesc}{LWPCookieJar}{filename, delayload=\constant{None},
 policy=\constant{None}}
A \class{FileCookieJar} that can load from and save cookies to disk in
a format compatible with the libwww-perl library's \code{Set-Cookie3}
file format. This is convenient if you want to store cookies in a
human-readable file.
\end{classdesc}


\subsection{CookiePolicy Objects \label{cookie-policy-objects}}

Objects implementing the \class{CookiePolicy} interface have the
following methods:

\begin{methoddesc}[CookiePolicy]{set_ok}{cookie, request}
Return boolean value indicating whether cookie should be accepted from server.

\var{cookie} is a \class{cookielib.Cookie} instance. \var{request} is
an object implementing the interface defined by the documentation for
\method{CookieJar.extract_cookies()}.
\end{methoddesc}

\begin{methoddesc}[CookiePolicy]{return_ok}{cookie, request}
Return boolean value indicating whether cookie should be returned to server.

\var{cookie} is a \class{cookielib.Cookie} instance. \var{request} is
an object implementing the interface defined by the documentation for
\method{CookieJar.add_cookie_header()}.
\end{methoddesc}

\begin{methoddesc}[CookiePolicy]{domain_return_ok}{domain, request}
Return false if cookies should not be returned, given cookie domain.

This method is an optimization. It removes the need for checking
every cookie with a particular domain (which might involve reading
many files). Returning true from \method{domain_return_ok()} and
\method{path_return_ok()} leaves all the work to \method{return_ok()}.

If \method{domain_return_ok()} returns true for the cookie domain,
\method{path_return_ok()} is called for the cookie path. Otherwise,
\method{path_return_ok()} and \method{return_ok()} are never called
for that cookie domain. If \method{path_return_ok()} returns true,
\method{return_ok()} is called with the \class{Cookie} object itself
for a full check. Otherwise, \method{return_ok()} is never called for
that cookie path.

Note that \method{domain_return_ok()} is called for every
\emph{cookie} domain, not just for the \emph{request} domain. For
example, the function might be called with both \code{".example.com"}
and \code{"www.example.com"} if the request domain is
\code{"www.example.com"}. The same goes for
\method{path_return_ok()}.

The \var{request} argument is as documented for \method{return_ok()}.
\end{methoddesc}

\begin{methoddesc}[CookiePolicy]{path_return_ok}{path, request}
Return false if cookies should not be returned, given cookie path.

See the documentation for \method{domain_return_ok()}.
\end{methoddesc}


In addition to implementing the methods above, implementations of the
\class{CookiePolicy} interface must also supply the following
attributes, indicating which protocols should be used, and how. All
of these attributes may be assigned to.

\begin{memberdesc}{netscape}
Implement Netscape protocol.
\end{memberdesc}
\begin{memberdesc}{rfc2965}
Implement RFC 2965 protocol.
\end{memberdesc}
\begin{memberdesc}{hide_cookie2}
Don't add \mailheader{Cookie2} header to requests (the presence of
this header indicates to the server that we understand RFC 2965
cookies).
\end{memberdesc}

The most useful way to define a \class{CookiePolicy} class is by
subclassing from \class{DefaultCookiePolicy} and overriding some or
all of the methods above. \class{CookiePolicy} itself may be used as
a 'null policy' to allow setting and receiving any and all cookies
(this is unlikely to be useful).


\subsection{DefaultCookiePolicy Objects \label{default-cookie-policy-objects}}

Implements the standard rules for accepting and returning cookies.

Both RFC 2965 and Netscape cookies are covered. RFC 2965 handling is
switched off by default.

The easiest way to provide your own policy is to subclass this class
and call its methods in your overriding implementations before adding
your own additional checks:

\begin{verbatim}
import cookielib
class MyCookiePolicy(cookielib.DefaultCookiePolicy):
    def set_ok(self, cookie, request):
        # Apply the standard rules first.
        if not cookielib.DefaultCookiePolicy.set_ok(self, cookie, request):
            return False
        # i_dont_want_to_store_this_cookie() stands for your own check.
        if i_dont_want_to_store_this_cookie(cookie):
            return False
        return True
\end{verbatim}
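
As a runnable variant of the same pattern, the sketch below attaches a
policy subclass to a \class{CookieJar} and exercises it through
\method{set_cookie_if_ok()}. The class \class{NoTrackingPolicy}, its
name-prefix rule, and the helper \code{make_cookie()} are hypothetical
choices made for this example, and the Python 3 fallback import is an
addition so that the code runs on either version.

```python
try:
    import cookielib
    import urllib2
except ImportError:
    # Python 3 names for the same modules.
    import http.cookiejar as cookielib
    import urllib.request as urllib2


class NoTrackingPolicy(cookielib.DefaultCookiePolicy):
    """Hypothetical policy: standard rules plus a name-based veto."""

    def set_ok(self, cookie, request):
        # Apply the standard rules first.
        if not cookielib.DefaultCookiePolicy.set_ok(self, cookie, request):
            return False
        # Extra check (an assumption of this sketch, not a library rule).
        return not cookie.name.startswith("track_")


def make_cookie(name):
    """Hand-built version 0 cookie, for illustration only."""
    return cookielib.Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain="www.example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=False,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


jar = cookielib.CookieJar(policy=NoTrackingPolicy())
request = urllib2.Request("http://www.example.com/")

jar.set_cookie_if_ok(make_cookie("session"), request)
jar.set_cookie_if_ok(make_cookie("track_user"), request)
stored = [cookie.name for cookie in jar]
```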

In addition to the features required to implement the
\class{CookiePolicy} interface, this class allows you to block and
allow domains from setting and receiving cookies. There are also some
strictness switches that allow you to tighten up the rather loose
Netscape protocol rules a little bit (at the cost of blocking some
benign cookies).

A domain blacklist and a whitelist are provided (both off by default).
Only domains not in the blacklist and present in the whitelist (if the
whitelist is active) participate in cookie setting and returning. Use
the \var{blocked_domains} constructor argument, and
\method{blocked_domains()} and \method{set_blocked_domains()} methods
(and the corresponding argument and methods for
\var{allowed_domains}). If you set a whitelist, you can turn it off
again by setting it to \constant{None}.

Domains in block or allow lists that do not start with a dot must
equal the cookie domain to be matched. For example,
\code{"example.com"} matches a blacklist entry of
\code{"example.com"}, but \code{"www.example.com"} does not. Domains
that do start with a dot are matched by more specific domains too.
For example, both \code{"www.example.com"} and
\code{"www.coyote.example.com"} match \code{".example.com"} (but
\code{"example.com"} itself does not). IP addresses are an exception,
and must match exactly. For example, if \var{blocked_domains} contains
\code{"192.168.1.2"} and \code{".168.1.2"}, 192.168.1.2 is blocked,
but 193.168.1.2 is not.
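
The matching rules above can be checked directly with
\method{is_blocked()}. In this sketch the blacklist entries are
arbitrary example values (\code{example.net} is used for the dotted
entry so that the two rules do not overlap), and the Python 3 fallback
import is an addition so that the code runs on either version.

```python
try:
    import cookielib
except ImportError:
    # Python 3 name for the same module.
    import http.cookiejar as cookielib

policy = cookielib.DefaultCookiePolicy(
    blocked_domains=["example.com", ".example.net",
                     "192.168.1.2", ".168.1.2"])

# Entry without a leading dot: only the identical domain matches.
exact = policy.is_blocked("example.com")
subdomain_of_exact = policy.is_blocked("www.example.com")

# Entry with a leading dot: more specific domains match, the bare one does not.
sub = policy.is_blocked("www.example.net")
deeper_sub = policy.is_blocked("www.coyote.example.net")
bare = policy.is_blocked("example.net")

# IP addresses must match exactly; the ".168.1.2" entry is never used
# as a suffix.
same_ip = policy.is_blocked("192.168.1.2")
other_ip = policy.is_blocked("193.168.1.2")
```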

\class{DefaultCookiePolicy} implements the following additional
methods:

\begin{methoddesc}[DefaultCookiePolicy]{blocked_domains}{}
Return the sequence of blocked domains (as a tuple).
\end{methoddesc}

\begin{methoddesc}[DefaultCookiePolicy]{set_blocked_domains}
  {blocked_domains}
Set the sequence of blocked domains.
\end{methoddesc}

\begin{methoddesc}[DefaultCookiePolicy]{is_blocked}{domain}
Return whether \var{domain} is on the blacklist for setting or
receiving cookies.
\end{methoddesc}

\begin{methoddesc}[DefaultCookiePolicy]{allowed_domains}{}
Return \constant{None}, or the sequence of allowed domains (as a tuple).
\end{methoddesc}

\begin{methoddesc}[DefaultCookiePolicy]{set_allowed_domains}
  {allowed_domains}
Set the sequence of allowed domains, or \constant{None}.
\end{methoddesc}

\begin{methoddesc}[DefaultCookiePolicy]{is_not_allowed}{domain}
Return whether \var{domain} is not on the whitelist for setting or
receiving cookies.
\end{methoddesc}
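
A short sketch of the accessor methods above, showing a whitelist
being switched on and then off again with \constant{None}. The domain
names are placeholders, and the Python 3 fallback import is an addition
so that the code runs on either version.

```python
try:
    import cookielib
except ImportError:
    # Python 3 name for the same module.
    import http.cookiejar as cookielib

policy = cookielib.DefaultCookiePolicy()
initial_whitelist = policy.allowed_domains()        # whitelist is off

# Switch the whitelist on: only *.example.com may set or receive cookies.
policy.set_allowed_domains([".example.com"])
active_whitelist = policy.allowed_domains()         # returned as a tuple
inside = policy.is_not_allowed("www.example.com")
outside = policy.is_not_allowed("www.example.org")

# Switch the whitelist off again.
policy.set_allowed_domains(None)
outside_after_reset = policy.is_not_allowed("www.example.org")

# The blacklist has the same pair of accessors.
policy.set_blocked_domains(["ads.example.com"])
blacklist = policy.blocked_domains()
```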
| 502 |
|
|---|
| 503 | \class{DefaultCookiePolicy} instances have the following attributes,
|
|---|
| 504 | which are all initialised from the constructor arguments of the same
|
|---|
| 505 | name, and which may all be assigned to.
|
|---|
| 506 |
|
|---|
| 507 | \begin{memberdesc}{rfc2109_as_netscape}
|
|---|
| 508 | If true, request that the \class{CookieJar} instance downgrade RFC
|
|---|
| 509 | 2109 cookies (ie. cookies received in a \mailheader{Set-Cookie} header
|
|---|
| 510 | with a version cookie-attribute of 1) to Netscape cookies by setting
|
|---|
| 511 | the version attribute of the \class{Cookie} instance to 0. The
|
|---|
| 512 | default value is \constant{None}, in which case RFC 2109 cookies are
|
|---|
| 513 | downgraded if and only if RFC 2965 handling is turned off. Therefore,
|
|---|
| 514 | RFC 2109 cookies are downgraded by default.
|
|---|
| 515 | \versionadded{2.5}
|
|---|
| 516 | \end{memberdesc}
|
|---|
| 517 |
|
|---|
| 518 | General strictness switches:
|
|---|
| 519 |
|
|---|
| 520 | \begin{memberdesc}{strict_domain}
|
|---|
| 521 | Don't allow sites to set two-component domains with country-code
|
|---|
| 522 | top-level domains like \code{.co.uk}, \code{.gov.uk},
|
|---|
| 523 | \code{.co.nz}.etc. This is far from perfect and isn't guaranteed to
|
|---|
| 524 | work!
|
|---|
| 525 | \end{memberdesc}
|
|---|
| 526 |
|
|---|
| 527 | RFC 2965 protocol strictness switches:
|
|---|
| 528 |
|
|---|
| 529 | \begin{memberdesc}{strict_rfc2965_unverifiable}
|
|---|
| 530 | Follow RFC 2965 rules on unverifiable transactions (usually, an
|
|---|
| 531 | unverifiable transaction is one resulting from a redirect or a request
|
|---|
| 532 | for an image hosted on another site). If this is false, cookies are
|
|---|
| 533 | \emph{never} blocked on the basis of verifiability
|
|---|
| 534 | \end{memberdesc}
|
|---|
| 535 |
|
|---|
| 536 | Netscape protocol strictness switches:
|
|---|
| 537 |
|
|---|
| 538 | \begin{memberdesc}{strict_ns_unverifiable}
|
|---|
| 539 | apply RFC 2965 rules on unverifiable transactions even to Netscape
|
|---|
| 540 | cookies
|
|---|
| 541 | \end{memberdesc}
|
|---|
| 542 | \begin{memberdesc}{strict_ns_domain}
|
|---|
| 543 | Flags indicating how strict to be with domain-matching rules for
|
|---|
| 544 | Netscape cookies. See below for acceptable values.
|
|---|
| 545 | \end{memberdesc}
|
|---|
| 546 | \begin{memberdesc}{strict_ns_set_initial_dollar}
|
|---|
| 547 | Ignore cookies in Set-Cookie: headers that have names starting with
|
|---|
| 548 | \code{'\$'}.
|
|---|
| 549 | \end{memberdesc}
|
|---|
| 550 | \begin{memberdesc}{strict_ns_set_path}
|
|---|
| 551 | Don't allow setting cookies whose path doesn't path-match request URI.
|
|---|
| 552 | \end{memberdesc}
|
|---|
| 553 |
|
|---|
| 554 | \member{strict_ns_domain} is a collection of flags. Its value is
|
|---|
| 555 | constructed by or-ing together (for example,
|
|---|
| 556 | \code{DomainStrictNoDots|DomainStrictNonDomain} means both flags are
|
|---|
| 557 | set).
|
|---|
| 558 |
|
|---|
| 559 | \begin{memberdesc}{DomainStrictNoDots}
|
|---|
| 560 | When setting cookies, the 'host prefix' must not contain a dot
|
|---|
| 561 | (eg. \code{www.foo.bar.com} can't set a cookie for \code{.bar.com},
|
|---|
| 562 | because \code{www.foo} contains a dot).
|
|---|
| 563 | \end{memberdesc}
|
|---|
| 564 | \begin{memberdesc}{DomainStrictNonDomain}
|
|---|
| 565 | Cookies that did not explicitly specify a \code{domain}
|
|---|
| 566 | cookie-attribute can only be returned to a domain equal to the domain
|
|---|
| 567 | that set the cookie (eg. \code{spam.example.com} won't be returned
|
|---|
| 568 | cookies from \code{example.com} that had no \code{domain}
|
|---|
| 569 | cookie-attribute).
|
|---|
| 570 | \end{memberdesc}
|
|---|
| 571 | \begin{memberdesc}{DomainRFC2965Match}
|
|---|
| 572 | When setting cookies, require a full RFC 2965 domain-match.
|
|---|
| 573 | \end{memberdesc}
|
|---|
| 574 |
|
|---|
| 575 | The following attributes are provided for convenience, and are the
|
|---|
| 576 | most useful combinations of the above flags:
|
|---|
| 577 |
|
|---|
| 578 | \begin{memberdesc}{DomainLiberal}
|
|---|
| 579 | Equivalent to 0 (ie. all of the above Netscape domain strictness flags
|
|---|
| 580 | switched off).
|
|---|
| 581 | \end{memberdesc}
|
|---|
| 582 | \begin{memberdesc}{DomainStrict}
|
|---|
| 583 | Equivalent to \code{DomainStrictNoDots|DomainStrictNonDomain}.
|
|---|
| 584 | \end{memberdesc}
|
|---|
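As a minimal sketch of how these flags combine, the following snippet
checks the relationships stated above and passes a combination of flags
to \class{DefaultCookiePolicy}. The \code{try}/\code{except} import is an
assumption added so that the snippet also runs on Python 3, where the
module is named \module{http.cookiejar}; the variable names are
illustrative only.

```python
try:
    import cookielib                      # Python 2
except ImportError:
    import http.cookiejar as cookielib    # assumption: Python 3 name of the module

DefaultCookiePolicy = cookielib.DefaultCookiePolicy

# The convenience attributes are or-ed combinations of the basic flags.
strict = (DefaultCookiePolicy.DomainStrictNoDots |
          DefaultCookiePolicy.DomainStrictNonDomain)
same_as_domain_strict = (strict == DefaultCookiePolicy.DomainStrict)
liberal_is_zero = (DefaultCookiePolicy.DomainLiberal == 0)

# Pass a combination of flags to the policy, then test individual
# flags with a bitwise "and".
policy = DefaultCookiePolicy(
    strict_ns_domain=(DefaultCookiePolicy.DomainStrictNoDots |
                      DefaultCookiePolicy.DomainRFC2965Match))
no_dots_set = bool(policy.strict_ns_domain &
                   DefaultCookiePolicy.DomainStrictNoDots)
non_domain_set = bool(policy.strict_ns_domain &
                      DefaultCookiePolicy.DomainStrictNonDomain)
```

Here \code{no_dots_set} is true and \code{non_domain_set} is false,
because only the first of those two flags was or-ed into the value.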
| 585 |
|
|---|
| 586 |
|
|---|
| 587 | \subsection{Cookie Objects \label{cookie-objects}}
|
|---|
| 588 |
|
|---|
| 589 | \class{Cookie} instances have Python attributes roughly corresponding
|
|---|
| 590 | to the standard cookie-attributes specified in the various cookie
|
|---|
| 591 | standards. The correspondence is not one-to-one, because there are
|
|---|
| 592 | complicated rules for assigning default values, because the
|
|---|
| 593 | \code{max-age} and \code{expires} cookie-attributes contain equivalent
|
|---|
| 594 | information, and because RFC 2109 cookies may be 'downgraded' by
|
|---|
| 595 | \module{cookielib} from version 1 to version 0 (Netscape) cookies.
|
|---|
| 596 |
|
|---|
| 597 | Assignment to these attributes should not be necessary other than in
|
|---|
| 598 | rare circumstances in a \class{CookiePolicy} method. The class does
|
|---|
| 599 | not enforce internal consistency, so you should know what you're
|
|---|
| 600 | doing if you do that.
|
|---|
| 601 |
|
|---|
| 602 | \begin{memberdesc}[Cookie]{version}
|
|---|
| 603 | Integer or \constant{None}. Netscape cookies have \member{version} 0.
|
|---|
| 604 | RFC 2965 and RFC 2109 cookies have a \code{version} cookie-attribute
|
|---|
| 605 | of 1. However, note that \module{cookielib} may 'downgrade' RFC 2109
|
|---|
| 606 | cookies to Netscape cookies, in which case \member{version} is 0.
|
|---|
| 607 | \end{memberdesc}
|
|---|
| 608 | \begin{memberdesc}[Cookie]{name}
|
|---|
| 609 | Cookie name (a string).
|
|---|
| 610 | \end{memberdesc}
|
|---|
| 611 | \begin{memberdesc}[Cookie]{value}
|
|---|
| 612 | Cookie value (a string), or \constant{None}.
|
|---|
| 613 | \end{memberdesc}
|
|---|
| 614 | \begin{memberdesc}[Cookie]{port}
|
|---|
| 615 | String representing a port or a set of ports (eg. \code{'80'}, or \code{'80,8080'}),
|
|---|
| 616 | or \constant{None}.
|
|---|
| 617 | \end{memberdesc}
|
|---|
| 618 | \begin{memberdesc}[Cookie]{path}
|
|---|
| 619 | Cookie path (a string, eg. \code{'/acme/rocket_launchers'}).
|
|---|
| 620 | \end{memberdesc}
|
|---|
| 621 | \begin{memberdesc}[Cookie]{secure}
|
|---|
| 622 | True if cookie should only be returned over a secure connection.
|
|---|
| 623 | \end{memberdesc}
|
|---|
| 624 | \begin{memberdesc}[Cookie]{expires}
|
|---|
| 625 | Integer expiry date in seconds since epoch, or \constant{None}. See
|
|---|
| 626 | also the \method{is_expired()} method.
|
|---|
| 627 | \end{memberdesc}
|
|---|
| 628 | \begin{memberdesc}[Cookie]{discard}
|
|---|
| 629 | True if this is a session cookie.
|
|---|
| 630 | \end{memberdesc}
|
|---|
| 631 | \begin{memberdesc}[Cookie]{comment}
|
|---|
| 632 | String comment from the server explaining the function of this cookie,
|
|---|
| 633 | or \constant{None}.
|
|---|
| 634 | \end{memberdesc}
|
|---|
| 635 | \begin{memberdesc}[Cookie]{comment_url}
|
|---|
| 636 | URL linking to a comment from the server explaining the function of
|
|---|
| 637 | this cookie, or \constant{None}.
|
|---|
| 638 | \end{memberdesc}
|
|---|
| 639 | \begin{memberdesc}[Cookie]{rfc2109}
|
|---|
| 640 | True if this cookie was received as an RFC 2109 cookie (ie. the cookie
|
|---|
| 641 | arrived in a \mailheader{Set-Cookie} header, and the value of the
|
|---|
| 642 | \code{version} cookie-attribute in that header was 1). This attribute is
|
|---|
| 643 | provided because \module{cookielib} may 'downgrade' RFC 2109 cookies
|
|---|
| 644 | to Netscape cookies, in which case \member{version} is 0.
|
|---|
| 645 | \versionadded{2.5}
|
|---|
| 646 | \end{memberdesc}
|
|---|
| 647 |
|
|---|
| 648 | \begin{memberdesc}[Cookie]{port_specified}
|
|---|
| 649 | True if a port or set of ports was explicitly specified by the server
|
|---|
| 650 | (in the \mailheader{Set-Cookie} / \mailheader{Set-Cookie2} header).
|
|---|
| 651 | \end{memberdesc}
|
|---|
| 652 | \begin{memberdesc}[Cookie]{domain_specified}
|
|---|
| 653 | True if a domain was explicitly specified by the server.
|
|---|
| 654 | \end{memberdesc}
|
|---|
| 655 | \begin{memberdesc}[Cookie]{domain_initial_dot}
|
|---|
| 656 | True if the domain explicitly specified by the server began with a
|
|---|
| 657 | dot (\code{'.'}).
|
|---|
| 658 | \end{memberdesc}
|
|---|
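\class{Cookie} objects are normally created for you by a
\class{CookieJar} from server responses. Purely for illustration, the
sketch below builds one by hand and reads back some of the attributes
described above. The constructor keyword names (including
\code{path_specified} and \code{rest}) are not documented in this
section and are assumed here, as is the \code{try}/\code{except} import,
which lets the snippet run on Python 3 where the module is named
\module{http.cookiejar}.

```python
try:
    import cookielib                      # Python 2
except ImportError:
    import http.cookiejar as cookielib    # assumption: Python 3 name of the module

# Hand-built Netscape (version 0) session cookie; all values are made up.
cookie = cookielib.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/acme", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={})

summary = (cookie.name, cookie.value, cookie.domain, cookie.path)

# A session cookie has discard set and carries no expiry date.
is_session_cookie = cookie.discard and cookie.expires is None
```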
| 659 |
|
|---|
| 660 | Cookies may have additional non-standard cookie-attributes. These may
|
|---|
| 661 | be accessed using the following methods:
|
|---|
| 662 |
|
|---|
| 663 | \begin{methoddesc}[Cookie]{has_nonstandard_attr}{name}
|
|---|
| 664 | Return true if cookie has the named cookie-attribute.
|
|---|
| 665 | \end{methoddesc}
|
|---|
| 666 | \begin{methoddesc}[Cookie]{get_nonstandard_attr}{name, default=\constant{None}}
|
|---|
| 667 | If cookie has the named cookie-attribute, return its value.
|
|---|
| 668 | Otherwise, return \var{default}.
|
|---|
| 669 | \end{methoddesc}
|
|---|
| 670 | \begin{methoddesc}[Cookie]{set_nonstandard_attr}{name, value}
|
|---|
| 671 | Set the value of the named cookie-attribute.
|
|---|
| 672 | \end{methoddesc}
|
|---|
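The three methods above can be exercised as in the following sketch.
The helper \code{example_cookie()} is hypothetical, as are the
cookie-attribute names \code{HttpOnly} and \code{SameSite}; they only
serve as examples of non-standard cookie-attributes. The
\class{Cookie} constructor keywords and the fallback import for
Python 3 are likewise assumptions.

```python
try:
    import cookielib                      # Python 2
except ImportError:
    import http.cookiejar as cookielib    # assumption: Python 3 name of the module

def example_cookie(rest):
    """Hypothetical helper: build a cookie carrying the given
    non-standard cookie-attributes (a dict of name -> value)."""
    return cookielib.Cookie(
        version=0, name="session", value="abc123",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=False,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest=rest)

cookie = example_cookie({"HttpOnly": None})

has_httponly = cookie.has_nonstandard_attr("HttpOnly")
# A missing cookie-attribute yields the supplied default.
before = cookie.get_nonstandard_attr("SameSite", "unset")
cookie.set_nonstandard_attr("SameSite", "Lax")
after = cookie.get_nonstandard_attr("SameSite", "unset")
```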
| 673 |
|
|---|
| 674 | The \class{Cookie} class also defines the following method:
|
|---|
| 675 |
|
|---|
| 676 | \begin{methoddesc}[Cookie]{is_expired}{\optional{now=\constant{None}}}
|
|---|
| 677 | Return true if the cookie has passed the time at which the server
|
|---|
| 678 | requested that it expire. If \var{now} is given (in seconds since the epoch),
|
|---|
| 679 | return whether the cookie has expired at the specified time.
|
|---|
| 680 | \end{methoddesc}
|
|---|
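A minimal sketch of \method{is_expired()} follows, using fixed
timestamps so that the result does not depend on the current time. The
helper \code{cookie_expiring_at()} is hypothetical, and the
\class{Cookie} constructor keywords and the fallback import for
Python 3 are assumptions.

```python
try:
    import cookielib                      # Python 2
except ImportError:
    import http.cookiejar as cookielib    # assumption: Python 3 name of the module

def cookie_expiring_at(expires):
    """Hypothetical helper: build a cookie with the given expiry
    (seconds since the epoch, or None for a session cookie)."""
    return cookielib.Cookie(
        version=0, name="token", value="xyz",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=False,
        secure=False, expires=expires, discard=(expires is None),
        comment=None, comment_url=None, rest={})

persistent = cookie_expiring_at(1000)
before_expiry = persistent.is_expired(now=999)     # not yet expired
after_expiry = persistent.is_expired(now=1001)     # expired

# A cookie without an expiry date is never reported as expired.
session = cookie_expiring_at(None)
session_expired = session.is_expired()
```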
| 681 |
|
|---|
| 682 |
|
|---|
| 683 | \subsection{Examples \label{cookielib-examples}}
|
|---|
| 684 |
|
|---|
| 685 | The first example shows the most common usage of \module{cookielib}:
|
|---|
| 686 |
|
|---|
| 687 | \begin{verbatim}
|
|---|
| 688 | import cookielib, urllib2
|
|---|
| 689 | cj = cookielib.CookieJar()
|
|---|
| 690 | opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
|
|---|
| 691 | r = opener.open("http://example.com/")
|
|---|
| 692 | \end{verbatim}
|
|---|
| 693 |
|
|---|
| 694 | This example illustrates how to open a URL using your Netscape,
|
|---|
| 695 | Mozilla, or Lynx cookies (assumes \UNIX{}/Netscape convention for
|
|---|
| 696 | location of the cookies file):
|
|---|
| 697 |
|
|---|
| 698 | \begin{verbatim}
|
|---|
| 699 | import os, cookielib, urllib2
|
|---|
| 700 | cj = cookielib.MozillaCookieJar()
|
|---|
| 701 | cj.load(os.path.join(os.environ["HOME"], ".netscape/cookies.txt"))
|
|---|
| 702 | opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
|
|---|
| 703 | r = opener.open("http://example.com/")
|
|---|
| 704 | \end{verbatim}
|
|---|
| 705 |
|
|---|
| 706 | The next example illustrates the use of \class{DefaultCookiePolicy}.
|
|---|
| 707 | It turns on RFC 2965 cookies, is stricter about domains when setting
|
|---|
| 708 | and returning Netscape cookies, and blocks some domains from setting
|
|---|
| 709 | cookies or having them returned:
|
|---|
| 710 |
|
|---|
| 711 | \begin{verbatim}
|
|---|
| 712 | import urllib2
|
|---|
| 713 | from cookielib import CookieJar, DefaultCookiePolicy
|
|---|
| 714 | policy = DefaultCookiePolicy(
|
|---|
| 715 |     rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict,
|
|---|
| 716 | blocked_domains=["ads.net", ".ads.net"])
|
|---|
| 717 | cj = CookieJar(policy)
|
|---|
| 718 | opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
|
|---|
| 719 | r = opener.open("http://example.com/")
|
|---|
| 720 | \end{verbatim}
|
|---|
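A policy like the one in the last example can be checked without any
network access. The sketch below builds the same kind of policy and
asks it which domains are blocked, using the
\method{is_blocked()} method of \class{DefaultCookiePolicy}. The
fallback import for Python 3 (where the module is named
\module{http.cookiejar}) and the domain names tested are assumptions
chosen for illustration.

```python
try:
    import cookielib                      # Python 2
except ImportError:
    import http.cookiejar as cookielib    # assumption: Python 3 name of the module

policy = cookielib.DefaultCookiePolicy(
    rfc2965=True,
    strict_ns_domain=cookielib.DefaultCookiePolicy.DomainStrict,
    blocked_domains=["ads.net", ".ads.net"])
cj = cookielib.CookieJar(policy)

# "ads.net" matches the first entry exactly; "www.ads.net" matches the
# second entry because that entry starts with a dot.
blocked = dict((domain, policy.is_blocked(domain))
               for domain in ["ads.net", "www.ads.net", "example.com"])
```

The jar itself starts out empty; cookies are only added once responses
are processed through an opener as in the examples above.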