\section{\module{multifile} ---
         Support for files containing distinct parts}

\declaremodule{standard}{multifile}
\modulesynopsis{Support for reading files which contain distinct
parts, such as some MIME data.}
\sectionauthor{Eric S. Raymond}{[email protected]}

\deprecated{2.5}{The \refmodule{email} package should be used in
                 preference to the \module{multifile} module.
                 This module is present only to maintain backward
                 compatibility.}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false (it
defaults to true); this prevents \class{MultiFile} from calling the
input object's \method{seek()} and \method{tell()} methods.
\end{classdesc}

In \class{MultiFile}'s view of the world, text is composed of three
kinds of lines: data, section-dividers, and end-markers.
\class{MultiFile} is designed to support parsing of messages that may
have multiple nested message parts, each with its own pattern for
section-divider and end-marker lines.

\begin{seealso}
  \seemodule{email}{Comprehensive email handling package; supersedes
                    the \module{multifile} module.}
\end{seealso}


\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider, end-marker,
or real EOF), return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 or 0 according
to whether the match is or is not an end-marker.  If the line matches
any other stacked boundary, raise \exception{Error}.  On encountering
end-of-file on the underlying stream object, the method raises
\exception{Error} unless all boundaries have been popped.
\end{methoddesc}

\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}
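
Section-relative positions, as used by \method{seek()} and
\method{tell()}, amount to adding or subtracting the offset at which
the current section starts.  The following Python 3 sketch illustrates
only that arithmetic.  \class{SectionView} and its \var{start} argument
are hypothetical names, not part of \module{multifile} (which does not
exist in Python 3); unlike \class{MultiFile}, this helper knows nothing
about boundaries and simply rejects end-relative seeks.

\begin{verbatim}
import io


class SectionView:
    """Hypothetical helper: section-relative seek()/tell() over a stream.

    Illustrates offset arithmetic only; it is not multifile's code.
    """

    def __init__(self, fp, start):
        self.fp = fp
        self.start = start                 # absolute offset of the section

    def tell(self):
        # Position measured from the start of the section.
        return self.fp.tell() - self.start

    def seek(self, pos, whence=0):
        if whence == 1:
            pos += self.tell()             # relative to the current position
        elif whence != 0:
            raise ValueError('whence=2 is not supported in this sketch')
        self.fp.seek(self.start + pos)     # translate to an absolute offset


fp = io.BytesIO(b'header\nbody line 1\nbody line 2\n')
view = SectionView(fp, start=len(b'header\n'))  # section begins after header
view.seek(0)                               # start of the section, not the file
first = fp.readline()
view.seek(-len(first), 1)                  # step back over the line just read
again = fp.readline()
\end{verbatim}

Seeking to 0 lands on the first line of the section rather than on the
header, and a current-relative seek is converted to a section-relative
one before being applied.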

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'-}\code{-'}
at the start of the line (which all MIME boundaries have), but it is
declared so it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

\begin{methoddesc}{push}{str}
Push a boundary string.  When a decorated version of this boundary
is found as an input line, it will be interpreted as a section-divider
or end-marker (depending on the decoration, see \rfc{2046}).  All
subsequent reads will return the empty string to indicate end-of-file,
until a call to \method{pop()} removes the boundary or a call to
\method{next()} re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'-}\code{-'} (which MIME section boundaries have),
but it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'-}\code{-'} and appends \code{'-}\code{-'} (like a
MIME-multipart end-of-message marker), but it is declared so it can be
overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
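
The three hooks \method{is_data()}, \method{section_divider()} and
\method{end_marker()} are what a subclass overrides to handle a
non-MIME format.  Because \module{multifile} is not available in
Python 3, the sketch below does not subclass \class{MultiFile}.
Instead, the hypothetical class \class{MimeRules} reproduces the
documented default hooks together with a hypothetical \method{kind()}
method that classifies one line against one boundary roughly the way
\method{readline()} does.  \class{ArchiveRules} targets an invented
format whose dividers look like \code{'=== name'} and whose end-markers
look like \code{'=== name ==='}; \class{CarefulRules} shows that an
\method{is_data()} which always returns false changes only the speed,
not the result.

\begin{verbatim}
class MimeRules:
    """Hypothetical stand-in for MultiFile's three overridable hooks,
    using the documented MIME defaults."""

    def is_data(self, line):
        return line[:2] != '--'            # every MIME marker starts with '--'

    def section_divider(self, boundary):
        return '--' + boundary

    def end_marker(self, boundary):
        return '--' + boundary + '--'

    def kind(self, line, boundary):
        """Classify line against a single boundary, as readline() would."""
        if self.is_data(line):
            return 'data'                  # fast guard: cannot be a marker
        marker = line.rstrip()             # trailing LF / CR-LF is ignored
        if marker == self.section_divider(boundary):
            return 'section-divider'
        if marker == self.end_marker(boundary):
            return 'end-marker'
        return 'data'                      # looked like a marker, is not one


class ArchiveRules(MimeRules):
    """Invented format: '=== name' divides, '=== name ===' ends."""

    def is_data(self, line):
        return not line.startswith('===')

    def section_divider(self, boundary):
        return '=== ' + boundary

    def end_marker(self, boundary):
        return '=== ' + boundary + ' ==='


class CarefulRules(MimeRules):
    """An is_data() that never short-circuits: slower, still correct."""

    def is_data(self, line):
        return False


archive = ArchiveRules()
for line in ['=== chapter\n', 'Some text\n', '=== chapter ===\r\n']:
    print(repr(line), archive.kind(line, 'chapter'))
\end{verbatim}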

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}
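
To see how \method{push()}, \method{next()}, \method{readline()},
\method{pop()} and the two instance variables cooperate, the following
Python 3 sketch imitates them on a small message.  It is a simplified
re-creation written for illustration, not the module's source: the
names \class{MiniMultiFile} and \exception{Error} here are
hypothetical, the input is treated as non-seekable (there is no
\method{seek()} or \method{tell()}), and the default MIME decorations
are hard-coded rather than provided by overridable hooks.

\begin{verbatim}
import io


class Error(Exception):
    """Raised on malformed input or misuse (stand-in for multifile.Error)."""


class MiniMultiFile:
    """Simplified, non-seekable imitation of MultiFile (hypothetical).

    Hard-codes the default MIME decorations and omits seek()/tell().
    """

    def __init__(self, fp):
        self.fp = fp
        self.stack = []  # pushed boundary strings, innermost last
        self.level = 0   # nonzero while stopped at a boundary line
        self.last = 0    # 1 if that boundary line was an end-marker

    def readline(self):
        if self.level > 0:
            return ''                      # stopped at a boundary: act as EOF
        line = self.fp.readline()
        if not line:                       # real end of the underlying stream
            if self.stack:
                raise Error('EOF before all boundaries were popped')
            return ''
        if line[:2] != '--':
            return line                    # fast guard: plain data
        marker = line.rstrip()             # ignore trailing whitespace/newline
        # Check the innermost (most recently pushed) boundary first.
        for depth, sep in enumerate(reversed(self.stack), 1):
            if marker == '--' + sep:
                self.last = 0
                break
            if marker == '--' + sep + '--':
                self.last = 1
                break
        else:
            return line                    # looked like a marker but is data
        self.level = depth
        if depth > 1:
            raise Error('outer boundary before inner end-marker')
        return ''

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                return lines
            lines.append(line)

    def read(self):
        return ''.join(self.readlines())

    def next(self):
        while self.readline():             # discard the rest of this section
            pass
        if self.level > 1 or self.last:
            return False                   # end-marker seen: no more sections
        self.level = 0                     # re-enable the current boundary
        self.last = 0
        return True

    def push(self, sep):
        if self.level > 0:
            raise Error('push() while stopped at a boundary')
        self.stack.append(sep)

    def pop(self):
        if not self.stack:
            raise Error('pop() with no boundary pushed')
        if self.level <= 1:
            self.last = 0
        self.level = max(0, self.level - 1)
        self.stack.pop()


text = ('preamble\n'
        '--XYZ\n'
        'first part\n'
        '--XYZ\n'
        'second part\n'
        '--XYZ--\n'
        'epilogue\n')

mf = MiniMultiFile(io.StringIO(text))
mf.push('XYZ')
parts = []
while mf.next():                           # one iteration per section
    parts.append(mf.read())                # read() stops at the next boundary
mf.pop()                                   # boundary no longer means EOF
remaining = mf.read()                      # so the epilogue is readable again
\end{verbatim}

After the loop, \code{parts} holds the two section bodies.  The first
call to \method{next()} discards the preamble, and the last one returns
false because it stops at the end-marker; once \method{pop()} has
removed the boundary, the epilogue is read as ordinary data.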


\subsection{\class{MultiFile} Example \label{multifile-example}}
\sectionauthor{Skip Montanaro}{[email protected]}

\begin{verbatim}
import mimetools
import multifile
import StringIO

def extract_mime_part_matching(stream, mimetype):
    """Return the decoded body of the first part of a multipart MIME
    message on stream whose type matches mimetype, or '' if none does."""

    msg = mimetools.Message(stream)
    msgtype = msg.gettype()

    result = ''
    if msgtype[:10] == "multipart/":

        mf = multifile.MultiFile(stream)
        mf.push(msg.getparam("boundary"))
        while mf.next():
            submsg = mimetools.Message(mf)
            data = StringIO.StringIO()
            try:
                mimetools.decode(mf, data, submsg.getencoding())
            except ValueError:
                # Unknown Content-Transfer-Encoding: skip this part.
                continue
            if submsg.gettype() == mimetype:
                result = data.getvalue()
                break
        mf.pop()
    return result
\end{verbatim}
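
Since the \refmodule{email} package supersedes this module, it may help
to see the same task written against it.  The following is a sketch
for Python 3, where neither \module{multifile} nor \module{mimetools}
exists any longer; it relies only on \function{email.message_from_file()}
and the \class{Message} methods \method{walk()}, \method{is_multipart()},
\method{get_content_type()} and \method{get_payload()}.  Unlike the
example above, it returns bytes, and because \method{walk()} visits the
whole message tree it also finds matching parts nested inside inner
multipart containers.

\begin{verbatim}
import email
import io


def extract_mime_part_matching(stream, mimetype):
    """Return the decoded body (bytes) of the first non-multipart part of
    the MIME message on stream whose content type is mimetype, or b''."""
    msg = email.message_from_file(stream)
    if not msg.is_multipart():
        return b''
    for part in msg.walk():                # depth-first, includes msg itself
        if part.is_multipart():
            continue                       # containers have no body of their own
        if part.get_content_type() == mimetype:
            # decode=True undoes base64 / quoted-printable transfer encodings
            return part.get_payload(decode=True)
    return b''


raw = ('MIME-Version: 1.0\n'
       'Content-Type: multipart/mixed; boundary="XYZ"\n'
       '\n'
       '--XYZ\n'
       'Content-Type: text/plain\n'
       '\n'
       'plain body\n'
       '--XYZ\n'
       'Content-Type: text/html\n'
       'Content-Transfer-Encoding: base64\n'
       '\n'
       'PGI+aGk8L2I+\n'
       '--XYZ--\n')

html = extract_mime_part_matching(io.StringIO(raw), 'text/html')
\end{verbatim}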