\declaremodule{standard}{email.header}
\modulesynopsis{Representing non-ASCII headers}

\rfc{2822} is the base standard that describes the format of email
messages. It derives from the older \rfc{822} standard, which came
into widespread use at a time when most email was composed of \ASCII{}
characters only. \rfc{2822} is a specification written assuming email
contains only 7-bit \ASCII{} characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now
be used in email messages. The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
slew of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.header} and \module{email.charset} modules.

If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
using a string for the header value. Import the \class{Header} class from the
\module{email.header} module. For example:

\begin{verbatim}
>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=


\end{verbatim}

Notice how we got the \mailheader{Subject} field to contain a
non-\ASCII{} character: we created a \class{Header}
instance and passed in the character set that the byte string was
encoded in. When the subsequent \class{Message} instance was
flattened, the \mailheader{Subject} field was properly \rfc{2047}
encoded. MIME-aware mail readers would show this header using the
embedded ISO-8859-1 character.

\versionadded{2.2.2}

Here is the \class{Header} class description:

\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
    maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
    errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.

Optional \var{s} is the initial header value. If \code{None} (the
default), the initial header value is not set. You can later append
to the header with \method{append()} method calls. \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.

Optional \var{charset} serves two purposes: it has the same meaning as
the \var{charset} argument to the \method{append()} method, and it
sets the default character set for all subsequent \method{append()}
calls that omit the \var{charset} argument. If \var{charset} is not
provided in the constructor (the default), the \code{us-ascii}
character set is used both as \var{s}'s initial charset and as the
default for subsequent \method{append()} calls.

The maximum line length can be specified explicitly via
\var{maxlinelen}. To split the first line at a shorter length (to
account for the field name, which isn't included in \var{s},
e.g. \mailheader{Subject}), pass in the name of the field in
\var{header_name}. The default \var{maxlinelen} is 76, and the
default value for \var{header_name} is \code{None}, meaning it is not
taken into account for the first line of a long, split header.

Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.

Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}

\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.

Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.charset}) or the name of a character set, which
will be converted to a \class{Charset} instance. A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.

\var{s} may be a byte string or a Unicode string. If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.

If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string. In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}. The
first character set that does not raise a \exception{UnicodeError} is used.

Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}
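
As a rough illustration of the \method{append()} semantics described
above, the following sketch builds a header from an \ASCII{} part and an
ISO-8859-1 part. The sample strings are invented for this example, and
the \code{u''} prefix is used only to mark the second part as a Unicode
string; treat it as a sketch rather than a definitive recipe.

```python
from email.header import Header

# Start with a plain ASCII chunk; us-ascii is the default charset.
h = Header('Hello')

# Append a Unicode string, hinting that it fits in ISO-8859-1.
h.append(u'p\xf6stal', 'iso-8859-1')

# Only the non-ASCII chunk should come out as an RFC 2047 encoded word.
encoded = h.encode()
print(encoded)
```

Because each chunk keeps its own character set, the \ASCII{} part is
expected to stay readable in the encoded value.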

\begin{methoddesc}[Header]{encode}{\optional{splitchars}}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings. Optional \var{splitchars} is a string
containing characters to split long \ASCII{} lines on, in rough support
of \rfc{2822}'s \emph{highest level syntactic breaks}. This doesn't
affect \rfc{2047} encoded lines.
\end{methoddesc}
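
To make the interaction between \var{maxlinelen}, \var{header_name} and
\method{encode()} more concrete, here is a small sketch that folds a long
\ASCII{} header. The addresses are invented, and the limit of 40 is
chosen only to force wrapping; the default \var{splitchars} is assumed.

```python
from email.header import Header

# Eight made-up addresses, far longer than one 40-character line.
value = ', '.join('recipient%d@example.com' % i for i in range(8))

# header_name lets the first line leave room for the "To: " prefix.
h = Header(value, maxlinelen=40, header_name='To')

# encode() folds the value at the split characters; the resulting
# lines are separated by newlines.
lines = h.encode().split('\n')
for line in lines:
    print(repr(line))
```

Printing the \function{repr()} of each line makes the leading
continuation whitespace on the folded lines visible.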

The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.

\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}. Useful for
\code{str(aHeader)}.
\end{methoddesc}

\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function. Returns the
header as a Unicode string.
\end{methoddesc}

\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}

\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}

The \module{email.header} module also provides the following
convenience functions.

\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.

This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header. \var{charset} is
\code{None} for non-encoded parts of the header, otherwise a lowercase
string containing the name of the character set specified in the
encoded string.

Here's an example:

\begin{verbatim}
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}

\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
    header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.

\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.

This function takes one of those sequences of pairs and returns a
\class{Header} instance. Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
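
The two functions are designed to be used together. The sketch below,
which reuses the encoded value from the \function{decode_header()}
example, decodes a header and rebuilds an equivalent \class{Header}
instance; the variable names are only illustrative.

```python
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='

# Split the raw value into (decoded_string, charset) pairs.
pairs = decode_header(raw)

# Rebuild a Header from the pairs; maxlinelen, header_name and
# continuation_ws are left at their defaults here.
h = make_header(pairs)

# Encoding the rebuilt header should reproduce the original value.
print(h.encode())
```

For a short, single-charset value like this one, the round trip is
expected to give back the original encoded string.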