\section{\module{xml.parsers.expat} ---
         Fast XML parsing using Expat}

% Markup notes:
%
% Many of the attributes of the XMLParser objects are callbacks.
% Since signature information must be presented, these are described
% using the methoddesc environment. Since they are attributes which
% are set by client code, in-text references to these attributes
% should be marked using the \member macro and should not include the
% parentheses used when marking functions and methods.

\declaremodule{standard}{xml.parsers.expat}
\modulesynopsis{An interface to the Expat non-validating XML parser.}
\moduleauthor{Paul Prescod}{[email protected]}

\versionadded{2.0}

The \module{xml.parsers.expat} module is a Python interface to the
Expat\index{Expat} non-validating XML parser.
The module provides a single extension type, \class{xmlparser}, that
represents the current state of an XML parser. After an
\class{xmlparser} object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to
the parser, the handler functions are called for the character data
and markup in the XML document.

This module uses the \module{pyexpat}\refbimodindex{pyexpat} module to
provide access to the Expat parser. Direct use of the
\module{pyexpat} module is deprecated.

This module provides one exception and one type object:

\begin{excdesc}{ExpatError}
The exception raised when Expat reports an error. See section
\ref{expaterror-objects}, ``ExpatError Exceptions,'' for more
information on interpreting Expat errors.
\end{excdesc}

\begin{excdesc}{error}
Alias for \exception{ExpatError}.
\end{excdesc}

\begin{datadesc}{XMLParserType}
The type of the return values from the \function{ParserCreate()}
function.
\end{datadesc}


The \module{xml.parsers.expat} module contains two functions:

\begin{funcdesc}{ErrorString}{errno}
Returns an explanatory string for a given error number \var{errno}.
\end{funcdesc}

\begin{funcdesc}{ParserCreate}{\optional{encoding\optional{,
                               namespace_separator}}}
Creates and returns a new \class{xmlparser} object.
\var{encoding}, if specified, must be a string naming the encoding
used by the XML data. Expat doesn't support as many encodings as
Python does, and its repertoire of encodings can't be extended; it
supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
\var{encoding} is given it will override the implicit or explicit
encoding of the document.

Expat can optionally do XML namespace processing for you, enabled by
providing a value for \var{namespace_separator}. The value must be a
one-character string; a \exception{ValueError} will be raised if the
string has an illegal length (\code{None} is considered the same as
omission). When namespace processing is enabled, element type names
and attribute names that belong to a namespace will be expanded. The
element name passed to the element handlers
\member{StartElementHandler} and \member{EndElementHandler}
will be the concatenation of the namespace URI, the namespace
separator character, and the local part of the name. If the namespace
separator is a zero byte (\code{chr(0)}) then the namespace URI and
the local part will be concatenated without any separator.

For example, if \var{namespace_separator} is set to a space character
(\character{ }) and the following document is parsed:

\begin{verbatim}
<?xml version="1.0"?>
<root xmlns = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>
\end{verbatim}

\member{StartElementHandler} will receive the following strings
for each element:

\begin{verbatim}
http://default-namespace.org/ root
http://www.python.org/ns/ elem1
elem2
\end{verbatim}
\end{funcdesc}


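The namespace expansion described for \function{ParserCreate()} can be
checked with a short script. This is only a sketch written for a current
Python 3 interpreter, where the module keeps the same function and
attribute names; the helper \code{expanded_names()} and the constant
\code{DOCUMENT} are hypothetical names introduced for illustration.

```python
import xml.parsers.expat

# The sample document from the ParserCreate() description.
DOCUMENT = """<?xml version="1.0"?>
<root xmlns = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""


def expanded_names(document, separator=" "):
    """Return the element names passed to StartElementHandler (hypothetical helper)."""
    names = []
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


names = expanded_names(DOCUMENT)
```

Each entry of \code{names} is the namespace URI, the separator, and the
local name; \code{elem2} has no namespace and is reported unchanged.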
\begin{seealso}
  \seetitle[http://www.libexpat.org/]{The Expat XML Parser}
           {Home page of the Expat project.}
\end{seealso}


\subsection{XMLParser Objects \label{xmlparser-objects}}

\class{xmlparser} objects have the following methods:

\begin{methoddesc}[xmlparser]{Parse}{data\optional{, isfinal}}
Parses the contents of the string \var{data}, calling the appropriate
handler functions to process the parsed data. \var{isfinal} must be
true on the final call to this method. \var{data} can be the empty
string at any time.
\end{methoddesc}

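Because \var{data} may be empty and \var{isfinal} only has to be true on
the last call, a document can be fed to \method{Parse()} piece by piece.
The following sketch assumes Python 3; \code{parse_in_chunks()} is a
hypothetical helper, and the chunk boundaries are chosen to fall in the
middle of tags on purpose.

```python
import xml.parsers.expat


def parse_in_chunks(chunks):
    """Feed string chunks to one parser and return the element names seen."""
    started = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)
    for chunk in chunks:
        parser.Parse(chunk, False)  # more data may still follow
    parser.Parse("", True)          # empty final call marks the end of input
    return started


started = parse_in_chunks(["<doc><it", "em/><item", "/></doc>"])
```

The parser keeps the incomplete tag between calls, so the handlers see
the same events as if the document had been passed in one string.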
\begin{methoddesc}[xmlparser]{ParseFile}{file}
Parse XML data reading from the object \var{file}. \var{file} only
needs to provide the \method{read(\var{nbytes})} method, returning the
empty string when there's no more data.
\end{methoddesc}

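Any object with a suitable \method{read()} method will do for
\method{ParseFile()}. The sketch below assumes Python 3, where
\method{read()} is expected to return bytes, so an in-memory
\class{io.BytesIO} object stands in for a file opened in binary mode.
\code{count_elements()} is a hypothetical helper.

```python
import io
import xml.parsers.expat


def count_elements(fileobj):
    """Count start tags in the XML read from a binary file-like object."""
    counter = [0]  # a one-item list so the handler can update the count

    def start_element(name, attrs):
        counter[0] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.ParseFile(fileobj)
    return counter[0]


total = count_elements(io.BytesIO(b"<log><entry/><entry/></log>"))
```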
\begin{methoddesc}[xmlparser]{SetBase}{base}
Sets the base to be used for resolving relative URIs in system
identifiers in declarations. Resolving relative identifiers is left
to the application: this value will be passed through as the
\var{base} argument to the \member{ExternalEntityRefHandler},
\member{NotationDeclHandler}, and
\member{UnparsedEntityDeclHandler} functions.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{GetBase}{}
Returns a string containing the base set by a previous call to
\method{SetBase()}, or \code{None} if
\method{SetBase()} hasn't been called.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{GetInputContext}{}
Returns the input data that generated the current event as a string.
The data is in the encoding of the entity which contains the text.
When called while an event handler is not active, the return value is
\code{None}.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ExternalEntityParserCreate}{context\optional{,
                                                          encoding}}
Create a ``child'' parser which can be used to parse an external
parsed entity referred to by content parsed by the parent parser. The
\var{context} parameter should be the string passed to the
\member{ExternalEntityRefHandler} handler function, described below.
The child parser is created with the \member{ordered_attributes},
\member{returns_unicode} and \member{specified_attributes} set to the
values of this parser.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{UseForeignDTD}{\optional{flag}}
Calling this with a true value for \var{flag} (the default) will cause
Expat to call the \member{ExternalEntityRefHandler} with
\constant{None} for all arguments to allow an alternate DTD to be
loaded. If the document does not contain a document type declaration,
the \member{ExternalEntityRefHandler} will still be called, but the
\member{StartDoctypeDeclHandler} and \member{EndDoctypeDeclHandler}
will not be called.

Passing a false value for \var{flag} will cancel a previous call that
passed a true value, but otherwise has no effect.

This method can only be called before the \method{Parse()} or
\method{ParseFile()} methods are called; calling it after either of
those has been called causes \exception{ExpatError} to be raised with
the \member{code} attribute set to
\constant{errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING}.

\versionadded{2.3}
\end{methoddesc}


\class{xmlparser} objects have the following attributes:

\begin{memberdesc}[xmlparser]{buffer_size}
The size of the buffer used when \member{buffer_text} is true. This
value cannot be changed at this time.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{buffer_text}
Setting this to true causes the \class{xmlparser} object to buffer
textual content returned by Expat to avoid multiple calls to the
\member{CharacterDataHandler} callback whenever possible. This can
improve performance substantially since Expat normally breaks
character data into chunks at every line ending. This attribute is
false by default, and may be changed at any time.
\versionadded{2.3}
\end{memberdesc}

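The effect of \member{buffer_text} is easiest to see by collecting the
strings passed to \member{CharacterDataHandler} with and without it.
This is a sketch for Python 3; \code{collect_text()} is a hypothetical
helper. How many pieces the unbuffered parser delivers depends on the
Expat version, so the example only relies on their concatenation.

```python
import xml.parsers.expat


def collect_text(document, buffered):
    """Return the list of strings delivered to CharacterDataHandler."""
    pieces = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffered
    parser.CharacterDataHandler = pieces.append
    parser.Parse(document, True)
    return pieces


DOCUMENT = "<p>line one\nline two\nline three</p>"
unbuffered = collect_text(DOCUMENT, False)  # possibly several pieces
buffered = collect_text(DOCUMENT, True)     # text merged before delivery
```

With buffering enabled the text of this small element arrives in a single
call, which spares the handler from joining fragments itself.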
\begin{memberdesc}[xmlparser]{buffer_used}
If \member{buffer_text} is enabled, the number of bytes stored in the
buffer. These bytes represent UTF-8 encoded text. This attribute has
no meaningful interpretation when \member{buffer_text} is false.
\versionadded{2.3}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ordered_attributes}
Setting this attribute to a non-zero integer causes the attributes to
be reported as a list rather than a dictionary. The attributes are
presented in the order found in the document text. For each
attribute, two list entries are presented: the attribute name and the
attribute value. (Older versions of this module also used this
format.) By default, this attribute is false; it may be changed at
any time.
\versionadded{2.1}
\end{memberdesc}

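The flat name/value list produced by \member{ordered_attributes} can be
regrouped into pairs with slicing. A minimal sketch for Python 3 follows;
\code{attributes_in_order()} is a hypothetical helper.

```python
import xml.parsers.expat


def attributes_in_order(document):
    """Return (element name, flat attribute list) for every start tag."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True
    parser.StartElementHandler = lambda name, attrs: seen.append((name, attrs))
    parser.Parse(document, True)
    return seen


seen = attributes_in_order('<point y="2" x="1"/>')
name, flat = seen[0]
# Even positions hold names, odd positions hold the matching values.
pairs = list(zip(flat[0::2], flat[1::2]))
```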
\begin{memberdesc}[xmlparser]{returns_unicode}
If this attribute is set to a non-zero integer, the handler functions
will be passed Unicode strings. If \member{returns_unicode} is
\constant{False}, 8-bit strings containing UTF-8 encoded data will be
passed to the handlers. This is \constant{True} by default when
Python is built with Unicode support.
\versionchanged[Can be changed at any time to affect the result
                type]{1.6}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{specified_attributes}
If set to a non-zero integer, the parser will report only those
attributes which were specified in the document instance and not those
which were derived from attribute declarations. Applications which
set this need to be especially careful to use what additional
information is available from the declarations as needed to comply
with the standards for the behavior of XML processors. By default,
this attribute is false; it may be changed at any time.
\versionadded{2.1}
\end{memberdesc}

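To illustrate \member{specified_attributes}, the sketch below parses a
document whose internal DTD subset declares a default for one attribute.
It assumes Python 3; \code{reported_attributes()}, the element
\code{item}, and its attributes are made up for the example.

```python
import xml.parsers.expat

DOCUMENT = """<!DOCTYPE item [
  <!ATTLIST item unit CDATA "kg">
]>
<item weight="3"/>"""


def reported_attributes(document, specified_only):
    """Return the attribute dictionary reported for the root element."""
    result = {}
    parser = xml.parsers.expat.ParserCreate()
    parser.specified_attributes = specified_only
    parser.StartElementHandler = lambda name, attrs: result.update(attrs)
    parser.Parse(document, True)
    return result


with_defaults = reported_attributes(DOCUMENT, False)
specified = reported_attributes(DOCUMENT, True)
```

Only \code{weight} appears in the document instance, so \code{unit} is
reported in the first case and left out in the second.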
The following attributes contain values relating to the most recent
error encountered by an \class{xmlparser} object, and will only have
correct values once a call to \method{Parse()} or \method{ParseFile()}
has raised a \exception{xml.parsers.expat.ExpatError} exception.

\begin{memberdesc}[xmlparser]{ErrorByteIndex}
Byte index at which an error occurred.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorCode}
Numeric code specifying the problem. This value can be passed to the
\function{ErrorString()} function, or compared to one of the constants
defined in the \code{errors} object.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorColumnNumber}
Column number at which an error occurred.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorLineNumber}
Line number at which an error occurred.
\end{memberdesc}

The following attributes contain values relating to the current parse
location in an \class{xmlparser} object. During a callback reporting
a parse event they indicate the location of the first of the sequence
of characters that generated the event. When read outside of a
callback, the position indicated will be just past the last parse
event (regardless of whether there was an associated callback).
\versionadded{2.4}

\begin{memberdesc}[xmlparser]{CurrentByteIndex}
Current byte index in the parser input.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{CurrentColumnNumber}
Current column number in the parser input.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{CurrentLineNumber}
Current line number in the parser input.
\end{memberdesc}

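The current-position attributes are typically read from inside a handler
to record where each event came from. The following is a sketch for
Python 3; \code{element_positions()} is a hypothetical helper. Lines are
counted from 1 and columns from 0, matching the \member{lineno} and
\member{offset} attributes of \exception{ExpatError}.

```python
import xml.parsers.expat


def element_positions(document):
    """Return (name, line, column) for the start of every element."""
    positions = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # Inside the callback these point at the '<' that opened the tag.
        positions.append(
            (name, parser.CurrentLineNumber, parser.CurrentColumnNumber))

    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return positions


positions = element_positions("<a>\n  <b/>\n</a>")
```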
Here is the list of handlers that can be set. To set a handler on an
\class{xmlparser} object \var{o}, use
\code{\var{o}.\var{handlername} = \var{func}}. \var{handlername} must
be taken from the following list, and \var{func} must be a callable
object accepting the correct number of arguments. The arguments are
all strings, unless otherwise stated.

\begin{methoddesc}[xmlparser]{XmlDeclHandler}{version, encoding, standalone}
Called when the XML declaration is parsed. The XML declaration is the
(optional) declaration of the applicable version of the XML
recommendation, the encoding of the document text, and an optional
``standalone'' declaration. \var{version} and \var{encoding} will be
strings of the type dictated by the \member{returns_unicode}
attribute, and \var{standalone} will be \code{1} if the document is
declared standalone, \code{0} if it is declared not to be standalone,
or \code{-1} if the standalone clause was omitted.
This is only available with Expat version 1.95.0 or newer.
\versionadded{2.1}
\end{methoddesc}

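The three values passed to \member{XmlDeclHandler} can be inspected with
a small script. This sketch assumes Python 3 and passes the documents as
bytes so that the declared encoding is the one in effect;
\code{xml_declaration()} is a hypothetical helper.

```python
import xml.parsers.expat


def xml_declaration(document):
    """Return the (version, encoding, standalone) tuples that were reported."""
    seen = []

    def xml_decl(version, encoding, standalone):
        seen.append((version, encoding, standalone))

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = xml_decl
    parser.Parse(document, True)
    return seen


full = xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><r/>')
bare = xml_declaration(b'<?xml version="1.0"?><r/>')
no_decl = xml_declaration(b"<r/>")  # handler is never called
```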
\begin{methoddesc}[xmlparser]{StartDoctypeDeclHandler}{doctypeName,
                                                       systemId, publicId,
                                                       has_internal_subset}
Called when Expat begins parsing the document type declaration
(\code{<!DOCTYPE \ldots}). The \var{doctypeName} is provided exactly
as presented. The \var{systemId} and \var{publicId} parameters give
the system and public identifiers if specified, or \code{None} if
omitted. \var{has_internal_subset} will be true if the document
contains an internal document declaration subset.
This requires Expat version 1.2 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndDoctypeDeclHandler}{}
Called when Expat is done parsing the document type declaration.
This requires Expat version 1.2 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ElementDeclHandler}{name, model}
Called once for each element type declaration. \var{name} is the name
of the element type, and \var{model} is a representation of the
content model.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{AttlistDeclHandler}{elname, attname,
                                                  type, default, required}
Called for each declared attribute for an element type. If an
attribute list declaration declares three attributes, this handler is
called three times, once for each attribute. \var{elname} is the name
of the element to which the declaration applies and \var{attname} is
the name of the attribute declared. The attribute type is a string
passed as \var{type}; the possible values are \code{'CDATA'},
\code{'ID'}, \code{'IDREF'}, ...
\var{default} gives the default value for the attribute used when the
attribute is not specified by the document instance, or \code{None} if
there is no default value (\code{\#IMPLIED} values). If the attribute
is required to be given in the document instance, \var{required} will
be true.
This requires Expat version 1.95.0 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartElementHandler}{name, attributes}
Called for the start of every element. \var{name} is a string
containing the element name, and \var{attributes} is a dictionary
mapping attribute names to their values.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndElementHandler}{name}
Called for the end of every element.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ProcessingInstructionHandler}{target, data}
Called for every processing instruction.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{CharacterDataHandler}{data}
Called for character data. This will be called for normal character
data, CDATA marked content, and ignorable whitespace. Applications
which must distinguish these cases can use the
\member{StartCdataSectionHandler}, \member{EndCdataSectionHandler},
and \member{ElementDeclHandler} callbacks to collect the required
information.
\end{methoddesc}

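One way to tell CDATA content from ordinary character data, as suggested
for \member{CharacterDataHandler}, is to toggle a flag in the CDATA
section handlers. The sketch below assumes Python 3;
\code{split_cdata()} and its \code{state} dictionary are hypothetical.

```python
import xml.parsers.expat


def split_cdata(document):
    """Return (ordinary text, CDATA text) found in the document."""
    plain, cdata = [], []
    state = {"in_cdata": False}

    def start_cdata():
        state["in_cdata"] = True

    def end_cdata():
        state["in_cdata"] = False

    def char_data(data):
        # Route each piece of text according to the current flag.
        (cdata if state["in_cdata"] else plain).append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartCdataSectionHandler = start_cdata
    parser.EndCdataSectionHandler = end_cdata
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return "".join(plain), "".join(cdata)


plain, cdata = split_cdata("<r>before <![CDATA[<raw> & text]]> after</r>")
```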
\begin{methoddesc}[xmlparser]{UnparsedEntityDeclHandler}{entityName, base,
                                                         systemId, publicId,
                                                         notationName}
Called for unparsed (NDATA) entity declarations. This is only present
for version 1.2 of the Expat library; for more recent versions, use
\member{EntityDeclHandler} instead. (The underlying function in the
Expat library has been declared obsolete.)
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EntityDeclHandler}{entityName,
                                                 is_parameter_entity, value,
                                                 base, systemId,
                                                 publicId,
                                                 notationName}
Called for all entity declarations. For parameter and internal
entities, \var{value} will be a string giving the declared contents
of the entity; this will be \code{None} for external entities. The
\var{notationName} parameter will be \code{None} for parsed entities,
and the name of the notation for unparsed entities.
\var{is_parameter_entity} will be true if the entity is a parameter
entity or false for general entities (most applications only need to
be concerned with general entities).
This is only available starting with version 1.95.0 of the Expat
library.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[xmlparser]{NotationDeclHandler}{notationName, base,
                                                   systemId, publicId}
Called for notation declarations. \var{notationName}, \var{base},
\var{systemId}, and \var{publicId} are strings if given. If the
public identifier is omitted, \var{publicId} will be \code{None}.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartNamespaceDeclHandler}{prefix, uri}
Called when an element contains a namespace declaration. Namespace
declarations are processed before the \member{StartElementHandler} is
called for the element on which declarations are placed.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndNamespaceDeclHandler}{prefix}
Called when the closing tag is reached for an element
that contained a namespace declaration. This is called once for each
namespace declaration on the element, in the reverse of the order in
which the \member{StartNamespaceDeclHandler} was called to indicate
the start of each namespace declaration's scope. Calls to this
handler are made after the corresponding \member{EndElementHandler}
for the end of the element.
\end{methoddesc}

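The ordering of namespace and element callbacks can be observed by
logging all four handlers. This is a sketch for Python 3 with namespace
processing enabled through a space separator; \code{namespace_events()}
and the URI \code{urn:x} are made up for the example.

```python
import xml.parsers.expat


def namespace_events(document):
    """Return the namespace and element events in the order reported."""
    events = []
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartNamespaceDeclHandler = (
        lambda prefix, uri: events.append(("start-ns", prefix, uri)))
    parser.EndNamespaceDeclHandler = (
        lambda prefix: events.append(("end-ns", prefix)))
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name)))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(document, True)
    return events


events = namespace_events('<r xmlns:p="urn:x"><p:e/></r>')
```

The declaration is reported before the start of \code{r}, and its end is
reported only after the end of \code{r}.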
\begin{methoddesc}[xmlparser]{CommentHandler}{data}
Called for comments. \var{data} is the text of the comment, excluding
the leading `\code{<!-}\code{-}' and trailing `\code{-}\code{->}'.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartCdataSectionHandler}{}
Called at the start of a CDATA section. This and
\member{EndCdataSectionHandler} are needed to be able to identify
the syntactical start and end for CDATA sections.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndCdataSectionHandler}{}
Called at the end of a CDATA section.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{DefaultHandler}{data}
Called for any characters in the XML document for
which no applicable handler has been specified. This means
characters that are part of a construct which could be reported, but
for which no handler has been supplied.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{DefaultHandlerExpand}{data}
This is the same as the \member{DefaultHandler},
but doesn't inhibit expansion of internal entities.
The entity reference will not be passed to the default handler.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{NotStandaloneHandler}{}
Called if the
XML document hasn't been declared as being a standalone document.
This happens when there is an external subset or a reference to a
parameter entity, but the XML declaration does not set standalone to
\code{yes}. If this handler returns \code{0},
then the parser will raise an \constant{XML_ERROR_NOT_STANDALONE}
error. If this handler is not set, no exception is raised by the
parser for this condition.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ExternalEntityRefHandler}{context, base,
                                                        systemId, publicId}
Called for references to external entities. \var{base} is the current
base, as set by a previous call to \method{SetBase()}. The system and
public identifiers, \var{systemId} and \var{publicId}, are strings if
given; if the public identifier is not given, \var{publicId} will be
\code{None}. The \var{context} value is opaque and should only be
used as described below.

For external entities to be parsed, this handler must be implemented.
It is responsible for creating the sub-parser using
\code{ExternalEntityParserCreate(\var{context})}, initializing it with
the appropriate callbacks, and parsing the entity. This handler
should return an integer; if it returns \code{0}, the parser will
raise an \constant{XML_ERROR_EXTERNAL_ENTITY_HANDLING} error,
otherwise parsing will continue.

If this handler is not provided, external entities are reported by the
\member{DefaultHandler} callback, if provided.
\end{methoddesc}


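The steps required of an \member{ExternalEntityRefHandler} (create the
child parser from \var{context}, attach callbacks, parse the entity,
return an integer) might look as follows. This is only a sketch for
Python 3: the dictionary \code{ENTITY\_STORE} is a hypothetical stand-in
for whatever mechanism an application uses to fetch entity text, and the
document and file name are invented. Resolving external entities from
untrusted input has security implications that the sketch does not
address.

```python
import xml.parsers.expat

DOCUMENT = b"""<!DOCTYPE book [
  <!ENTITY chapter SYSTEM "chapter1.xml">
]>
<book>&chapter;</book>"""

# Hypothetical in-memory replacement for reading entities from disk or network.
ENTITY_STORE = {"chapter1.xml": b"<chapter>Hello</chapter>"}


def parse_with_entities(document):
    """Parse the document, following external entities found in ENTITY_STORE."""
    started = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: started.append(name)

    def external_entity(context, base, system_id, public_id):
        data = ENTITY_STORE.get(system_id)
        if data is None:
            return 0  # signals an external entity handling error
        child = parser.ExternalEntityParserCreate(context)
        child.StartElementHandler = lambda name, attrs: started.append(name)
        child.Parse(data, True)
        return 1      # non-zero: the parent parser continues

    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(document, True)
    return started


started = parse_with_entities(DOCUMENT)
```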
\subsection{ExpatError Exceptions \label{expaterror-objects}}
\sectionauthor{Fred L. Drake, Jr.}{[email protected]}

\exception{ExpatError} exceptions have a number of interesting
attributes:

\begin{memberdesc}[ExpatError]{code}
Expat's internal error number for the specific error. This will
match one of the constants defined in the \code{errors} object from
this module.
\versionadded{2.1}
\end{memberdesc}

\begin{memberdesc}[ExpatError]{lineno}
Line number on which the error was detected. The first line is
numbered \code{1}.
\versionadded{2.1}
\end{memberdesc}

\begin{memberdesc}[ExpatError]{offset}
Character offset into the line where the error occurred. The first
column is numbered \code{0}.
\versionadded{2.1}
\end{memberdesc}


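A typical use of these attributes is to turn a parse failure into a
readable report. The sketch below assumes Python 3;
\code{describe\_error()} is a hypothetical helper, and the message is
obtained with \function{ErrorString()} so that it can be compared with
the constants in the \code{errors} object.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString, errors


def describe_error(document):
    """Return a dict describing the parse error, or None if parsing succeeds."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except ExpatError as err:
        return {
            "message": ErrorString(err.code),
            "line": err.lineno,      # first line is 1
            "column": err.offset,    # first column is 0
        }
    return None


# The end tag on the second line does not match the open <b> element.
report = describe_error("<a>\n<b></a>")
```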
| 500 | \subsection{Example \label{expat-example}}
|
|---|
| 501 |
|
|---|
| 502 | The following program defines three handlers that just print out their
|
|---|
| 503 | arguments.
|
|---|
| 504 |
|
|---|
| 505 | \begin{verbatim}
|
|---|
| 506 | import xml.parsers.expat
|
|---|
| 507 |
|
|---|
| 508 | # 3 handler functions
|
|---|
| 509 | def start_element(name, attrs):
|
|---|
| 510 | print 'Start element:', name, attrs
|
|---|
| 511 | def end_element(name):
|
|---|
| 512 | print 'End element:', name
|
|---|
| 513 | def char_data(data):
|
|---|
| 514 | print 'Character data:', repr(data)
|
|---|
| 515 |
|
|---|
| 516 | p = xml.parsers.expat.ParserCreate()
|
|---|
| 517 |
|
|---|
| 518 | p.StartElementHandler = start_element
|
|---|
| 519 | p.EndElementHandler = end_element
|
|---|
| 520 | p.CharacterDataHandler = char_data
|
|---|
| 521 |
|
|---|
| 522 | p.Parse("""<?xml version="1.0"?>
|
|---|
| 523 | <parent id="top"><child1 name="paul">Text goes here</child1>
|
|---|
| 524 | <child2 name="fred">More text</child2>
|
|---|
| 525 | </parent>""", 1)
|
|---|
| 526 | \end{verbatim}
|
|---|
| 527 |
|
|---|
| 528 | The output from this program is:
|
|---|
| 529 |
|
|---|
| 530 | \begin{verbatim}
|
|---|
| 531 | Start element: parent {'id': 'top'}
|
|---|
| 532 | Start element: child1 {'name': 'paul'}
|
|---|
| 533 | Character data: 'Text goes here'
|
|---|
| 534 | End element: child1
|
|---|
| 535 | Character data: '\n'
|
|---|
| 536 | Start element: child2 {'name': 'fred'}
|
|---|
| 537 | Character data: 'More text'
|
|---|
| 538 | End element: child2
|
|---|
| 539 | Character data: '\n'
|
|---|
| 540 | End element: parent
|
|---|
| 541 | \end{verbatim}
|
|---|
| 542 |
|
|---|
| 543 |
|
|---|
| 544 | \subsection{Content Model Descriptions \label{expat-content-models}}
|
|---|
| 545 | \sectionauthor{Fred L. Drake, Jr.}{[email protected]}
|
|---|
| 546 |
|
|---|
| 547 | Content modules are described using nested tuples. Each tuple
|
|---|
| 548 | contains four values: the type, the quantifier, the name, and a tuple
|
|---|
| 549 | of children. Children are simply additional content module
|
|---|
| 550 | descriptions.
|
|---|
| 551 |
|
|---|
| 552 | The values of the first two fields are constants defined in the
|
|---|
| 553 | \code{model} object of the \module{xml.parsers.expat} module. These
|
|---|
| 554 | constants can be collected in two groups: the model type group and the
|
|---|
| 555 | quantifier group.
|
|---|
| 556 |
|
|---|
| 557 | The constants in the model type group are:
|
|---|
| 558 |
|
|---|
| 559 | \begin{datadescni}{XML_CTYPE_ANY}
|
|---|
| 560 | The element named by the model name was declared to have a content
|
|---|
| 561 | model of \code{ANY}.
|
|---|
| 562 | \end{datadescni}
|
|---|
| 563 |
|
|---|
| 564 | \begin{datadescni}{XML_CTYPE_CHOICE}
|
|---|
| 565 | The named element allows a choice from a number of options; this is
|
|---|
| 566 | used for content models such as \code{(A | B | C)}.
|
|---|
| 567 | \end{datadescni}
|
|---|
| 568 |
|
|---|
| 569 | \begin{datadescni}{XML_CTYPE_EMPTY}
|
|---|
| 570 | Elements which are declared to be \code{EMPTY} have this model type.
|
|---|
| 571 | \end{datadescni}
|
|---|
| 572 |
|
|---|
| 573 | \begin{datadescni}{XML_CTYPE_MIXED}
|
|---|
| 574 | \end{datadescni}
|
|---|
| 575 |
|
|---|
| 576 | \begin{datadescni}{XML_CTYPE_NAME}
|
|---|
| 577 | \end{datadescni}
|
|---|
| 578 |
|
|---|
| 579 | \begin{datadescni}{XML_CTYPE_SEQ}
|
|---|
| 580 | Models which represent a series of models which follow one after the
|
|---|
| 581 | other are indicated with this model type. This is used for models
|
|---|
| 582 | such as \code{(A, B, C)}.
|
|---|
| 583 | \end{datadescni}
|
|---|
| 584 |
|
|---|
| 585 |
|
|---|
| 586 | The constants in the quantifier group are:
|
|---|
| 587 |
|
|---|
| 588 | \begin{datadescni}{XML_CQUANT_NONE}
|
|---|
| 589 | No modifier is given, so it can appear exactly once, as for \code{A}.
|
|---|
| 590 | \end{datadescni}
|
|---|
| 591 |
|
|---|
| 592 | \begin{datadescni}{XML_CQUANT_OPT}
|
|---|
| 593 | The model is optional: it can appear once or not at all, as for
|
|---|
| 594 | \code{A?}.
|
|---|
| 595 | \end{datadescni}
|
|---|
| 596 |
|
|---|
| 597 | \begin{datadescni}{XML_CQUANT_PLUS}
|
|---|
| 598 | The model must occur one or more times (like \code{A+}).
|
|---|
| 599 | \end{datadescni}
|
|---|
| 600 |
|
|---|
| 601 | \begin{datadescni}{XML_CQUANT_REP}
|
|---|
| 602 | The model may occur zero or more times, as for \code{A*}.
|
|---|
| 603 | \end{datadescni}
|
|---|
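As an illustration of how the two constant groups fit together, the
following sketch converts a content model tuple back into DTD syntax.
The helper \code{render()}, the \code{QUANT_SUFFIX} table and the sample
document are hypothetical names introduced for this example, and the
code is written for a current Python 3 interpreter; it is a sketch, not
part of the module.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import model

# DTD suffix for each quantifier constant (hypothetical helper table).
QUANT_SUFFIX = {
    model.XML_CQUANT_NONE: "",
    model.XML_CQUANT_OPT: "?",
    model.XML_CQUANT_PLUS: "+",
    model.XML_CQUANT_REP: "*",
}

def render(content_model):
    """Turn a (type, quantifier, name, children) tuple into DTD syntax."""
    ctype, quant, name, children = content_model
    suffix = QUANT_SUFFIX[quant]
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + suffix
    if ctype == model.XML_CTYPE_MIXED:
        parts = ["#PCDATA"] + [render(child) for child in children]
        return "(" + " | ".join(parts) + ")" + suffix
    # Remaining types are XML_CTYPE_SEQ and XML_CTYPE_CHOICE.
    separator = ", " if ctype == model.XML_CTYPE_SEQ else " | "
    return "(" + separator.join(render(child) for child in children) + ")" + suffix

DOCUMENT = b"""<!DOCTYPE doc [
<!ELEMENT doc (a, (b | c)*, d?)>
<!ELEMENT a EMPTY>
]>
<doc><a/></doc>"""

declared = {}

def element_decl(name, content_model):
    # Called once for each element declaration in the DTD.
    declared[name] = render(content_model)

parser = expat.ParserCreate()
parser.ElementDeclHandler = element_decl
parser.Parse(DOCUMENT, True)

for element_name, text in declared.items():
    print(element_name, text)
```

The recursion mirrors the nesting of the tuples: each child is itself a
four-value tuple, so the same function handles every level.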
| 604 |
|
|---|
| 605 |
|
|---|
| 606 | \subsection{Expat error constants \label{expat-errors}}
|
|---|
| 607 |
|
|---|
| 608 | The following constants are provided in the \code{errors} object of
|
|---|
| 609 | the \refmodule{xml.parsers.expat} module. These constants are useful
|
|---|
| 610 | in interpreting some of the attributes of the \exception{ExpatError}
|
|---|
| 611 | exception objects raised when an error has occurred.
|
|---|
| 612 |
|
|---|
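One way to use these constants is sketched below. It assumes that each
constant compares equal to the message string that
\function{ErrorString()} returns for the \member{code} attribute of the
exception; the helper \code{classify()} is a hypothetical name used only
for this illustration, and the code targets a current Python 3
interpreter.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors

def classify(data):
    """Return Expat's message for the first error in data, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        # err.code is Expat's numeric error code; ErrorString() maps it
        # to the message text.
        return expat.ErrorString(err.code)
    return None

message = classify(b"<a><b></a>")
if message == errors.XML_ERROR_TAG_MISMATCH:
    print("the end tag does not match the open start tag")
```

Comparing against the named constant avoids hard-coding either the
numeric code or the message text in application code.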
| 613 | The \code{errors} object has the following attributes:
|
|---|
| 614 |
|
|---|
| 615 | \begin{datadescni}{XML_ERROR_ASYNC_ENTITY}
|
|---|
| 616 | \end{datadescni}
|
|---|
| 617 |
|
|---|
| 618 | \begin{datadescni}{XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF}
|
|---|
| 619 | An entity reference in an attribute value referred to an external
|
|---|
| 620 | entity instead of an internal entity.
|
|---|
| 621 | \end{datadescni}
|
|---|
| 622 |
|
|---|
| 623 | \begin{datadescni}{XML_ERROR_BAD_CHAR_REF}
|
|---|
| 624 | A character reference referred to a character which is illegal in XML
|
|---|
| 625 | (for example, character \code{0}, or `\code{\&\#0;}').
|
|---|
| 626 | \end{datadescni}
|
|---|
| 627 |
|
|---|
| 628 | \begin{datadescni}{XML_ERROR_BINARY_ENTITY_REF}
|
|---|
| 629 | An entity reference referred to an entity which was declared with a
|
|---|
| 630 | notation, so cannot be parsed.
|
|---|
| 631 | \end{datadescni}
|
|---|
| 632 |
|
|---|
| 633 | \begin{datadescni}{XML_ERROR_DUPLICATE_ATTRIBUTE}
|
|---|
| 634 | An attribute was used more than once in a start tag.
|
|---|
| 635 | \end{datadescni}
|
|---|
| 636 |
|
|---|
| 637 | \begin{datadescni}{XML_ERROR_INCORRECT_ENCODING}
|
|---|
| 638 | \end{datadescni}
|
|---|
| 639 |
|
|---|
| 640 | \begin{datadescni}{XML_ERROR_INVALID_TOKEN}
|
|---|
| 641 | Raised when an input byte could not properly be assigned to a
|
|---|
| 642 | character; for example, a NUL byte (value \code{0}) in a UTF-8 input
|
|---|
| 643 | stream.
|
|---|
| 644 | \end{datadescni}
|
|---|
| 645 |
|
|---|
| 646 | \begin{datadescni}{XML_ERROR_JUNK_AFTER_DOC_ELEMENT}
|
|---|
| 647 | Something other than whitespace occurred after the document element.
|
|---|
| 648 | \end{datadescni}
|
|---|
| 649 |
|
|---|
| 650 | \begin{datadescni}{XML_ERROR_MISPLACED_XML_PI}
|
|---|
| 651 | An XML declaration was found somewhere other than the start of the
|
|---|
| 652 | input data.
|
|---|
| 653 | \end{datadescni}
|
|---|
| 654 |
|
|---|
| 655 | \begin{datadescni}{XML_ERROR_NO_ELEMENTS}
|
|---|
| 656 | The document contains no elements (XML requires all documents to
|
|---|
| 657 | contain exactly one top-level element).
|
|---|
| 658 | \end{datadescni}
|
|---|
| 659 |
|
|---|
| 660 | \begin{datadescni}{XML_ERROR_NO_MEMORY}
|
|---|
| 661 | Expat was not able to allocate memory internally.
|
|---|
| 662 | \end{datadescni}
|
|---|
| 663 |
|
|---|
| 664 | \begin{datadescni}{XML_ERROR_PARAM_ENTITY_REF}
|
|---|
| 665 | A parameter entity reference was found where it was not allowed.
|
|---|
| 666 | \end{datadescni}
|
|---|
| 667 |
|
|---|
| 668 | \begin{datadescni}{XML_ERROR_PARTIAL_CHAR}
|
|---|
| 669 | An incomplete character was found in the input.
|
|---|
| 670 | \end{datadescni}
|
|---|
| 671 |
|
|---|
| 672 | \begin{datadescni}{XML_ERROR_RECURSIVE_ENTITY_REF}
|
|---|
| 673 | An entity reference contained another reference to the same entity,
|
|---|
| 674 | possibly via a different name, and possibly indirectly.
|
|---|
| 675 | \end{datadescni}
|
|---|
| 676 |
|
|---|
| 677 | \begin{datadescni}{XML_ERROR_SYNTAX}
|
|---|
| 678 | Some unspecified syntax error was encountered.
|
|---|
| 679 | \end{datadescni}
|
|---|
| 680 |
|
|---|
| 681 | \begin{datadescni}{XML_ERROR_TAG_MISMATCH}
|
|---|
| 682 | An end tag did not match the innermost open start tag.
|
|---|
| 683 | \end{datadescni}
|
|---|
| 684 |
|
|---|
| 685 | \begin{datadescni}{XML_ERROR_UNCLOSED_TOKEN}
|
|---|
| 686 | Some token (such as a start tag) was not closed before the end of the
|
|---|
| 687 | stream or the next token was encountered.
|
|---|
| 688 | \end{datadescni}
|
|---|
| 689 |
|
|---|
| 690 | \begin{datadescni}{XML_ERROR_UNDEFINED_ENTITY}
|
|---|
| 691 | A reference was made to an entity which was not defined.
|
|---|
| 692 | \end{datadescni}
|
|---|
| 693 |
|
|---|
| 694 | \begin{datadescni}{XML_ERROR_UNKNOWN_ENCODING}
|
|---|
| 695 | The document encoding is not supported by Expat.
|
|---|
| 696 | \end{datadescni}
|
|---|
| 697 |
|
|---|
| 698 | \begin{datadescni}{XML_ERROR_UNCLOSED_CDATA_SECTION}
|
|---|
| 699 | A CDATA marked section was not closed.
|
|---|
| 700 | \end{datadescni}
|
|---|
| 701 |
|
|---|
| 702 | \begin{datadescni}{XML_ERROR_EXTERNAL_ENTITY_HANDLING}
|
|---|
| 703 | \end{datadescni}
|
|---|
| 704 |
|
|---|
| 705 | \begin{datadescni}{XML_ERROR_NOT_STANDALONE}
|
|---|
| 706 | The parser determined that the document was not ``standalone'' though
|
|---|
| 707 | it declared itself to be in the XML declaration, and the
|
|---|
| 708 | \member{NotStandaloneHandler} was set and returned \code{0}.
|
|---|
| 709 | \end{datadescni}
|
|---|
| 710 |
|
|---|
| 711 | \begin{datadescni}{XML_ERROR_UNEXPECTED_STATE}
|
|---|
| 712 | \end{datadescni}
|
|---|
| 713 |
|
|---|
| 714 | \begin{datadescni}{XML_ERROR_ENTITY_DECLARED_IN_PE}
|
|---|
| 715 | \end{datadescni}
|
|---|
| 716 |
|
|---|
| 717 | \begin{datadescni}{XML_ERROR_FEATURE_REQUIRES_XML_DTD}
|
|---|
| 718 | An operation was requested that requires DTD support to be compiled
|
|---|
| 719 | in, but Expat was configured without DTD support. This should never
|
|---|
| 720 | be reported by a standard build of the \module{xml.parsers.expat}
|
|---|
| 721 | module.
|
|---|
| 722 | \end{datadescni}
|
|---|
| 723 |
|
|---|
| 724 | \begin{datadescni}{XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING}
|
|---|
| 725 | A behavioral change was requested after parsing started that can only
|
|---|
| 726 | be changed before parsing has started. This is (currently) only
|
|---|
| 727 | raised by \method{UseForeignDTD()}.
|
|---|
| 728 | \end{datadescni}
|
|---|
| 729 |
|
|---|
| 730 | \begin{datadescni}{XML_ERROR_UNBOUND_PREFIX}
|
|---|
| 731 | An undeclared prefix was found when namespace processing was enabled.
|
|---|
| 732 | \end{datadescni}
|
|---|
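The dependence on namespace processing can be seen in the following
sketch, which parses the same input twice. The sample data and variable
names are chosen for this illustration only, and the comparison through
\function{ErrorString()} is an assumption about how the constant is best
matched on a current Python 3 interpreter.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors

DATA = b"<p:item/>"

# Without namespace processing, "p:item" is an ordinary element name.
plain_parser = expat.ParserCreate()
plain_parser.Parse(DATA, True)

# With namespace processing enabled, the prefix "p" must be declared.
ns_parser = expat.ParserCreate(namespace_separator=" ")
try:
    ns_parser.Parse(DATA, True)
    outcome = None
except expat.ExpatError as err:
    outcome = expat.ErrorString(err.code)

if outcome == errors.XML_ERROR_UNBOUND_PREFIX:
    print("prefix used without a namespace declaration")
```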
| 733 |
|
|---|
| 734 | \begin{datadescni}{XML_ERROR_UNDECLARING_PREFIX}
|
|---|
| 735 | The document attempted to remove the namespace declaration associated
|
|---|
| 736 | with a prefix.
|
|---|
| 737 | \end{datadescni}
|
|---|
| 738 |
|
|---|
| 739 | \begin{datadescni}{XML_ERROR_INCOMPLETE_PE}
|
|---|
| 740 | A parameter entity contained incomplete markup.
|
|---|
| 741 | \end{datadescni}
|
|---|
| 742 |
|
|---|
| 743 | \begin{datadescni}{XML_ERROR_XML_DECL}
|
|---|
| 744 | The XML declaration was not well-formed.
|
|---|
| 745 | \end{datadescni}
|
|---|
| 746 |
|
|---|
| 747 | \begin{datadescni}{XML_ERROR_TEXT_DECL}
|
|---|
| 748 | There was an error parsing a text declaration in an external entity.
|
|---|
| 749 | \end{datadescni}
|
|---|
| 750 |
|
|---|
| 751 | \begin{datadescni}{XML_ERROR_PUBLICID}
|
|---|
| 752 | Characters were found in the public id that are not allowed.
|
|---|
| 753 | \end{datadescni}
|
|---|
| 754 |
|
|---|
| 755 | \begin{datadescni}{XML_ERROR_SUSPENDED}
|
|---|
| 756 | The requested operation is not allowed on a suspended parser.
|
|---|
| 757 | This includes attempts to provide additional input or to
|
|---|
| 758 | stop the parser.
|
|---|
| 759 | \end{datadescni}
|
|---|
| 760 |
|
|---|
| 761 | \begin{datadescni}{XML_ERROR_NOT_SUSPENDED}
|
|---|
| 762 | An attempt to resume the parser was made when the parser had not been
|
|---|
| 763 | suspended.
|
|---|
| 764 | \end{datadescni}
|
|---|
| 765 |
|
|---|
| 766 | \begin{datadescni}{XML_ERROR_ABORTED}
|
|---|
| 767 | This should not be reported to Python applications.
|
|---|
| 768 | \end{datadescni}
|
|---|
| 769 |
|
|---|
| 770 | \begin{datadescni}{XML_ERROR_FINISHED}
|
|---|
| 771 | The requested operation is not allowed on a parser which has finished
|
|---|
| 772 | parsing its input.  This includes attempts to provide
|
|---|
| 773 | additional input or to stop the parser.
|
|---|
| 774 | \end{datadescni}
|
|---|
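A minimal sketch of how this condition can arise follows: more input is
passed to a parser after a call that marked the input as final. The
variable names are illustrative, and matching the error through
\function{ErrorString()} is an assumption made for this example on a
current Python 3 interpreter.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors

parser = expat.ParserCreate()
parser.Parse(b"<doc/>", True)       # the true flag marks the end of input

try:
    parser.Parse(b"<more/>", True)  # the parser cannot be reused
    finished_message = None
except expat.ExpatError as err:
    finished_message = expat.ErrorString(err.code)

if finished_message == errors.XML_ERROR_FINISHED:
    # A fresh parser object is needed for each document.
    parser = expat.ParserCreate()
    parser.Parse(b"<more/>", True)
```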
| 775 |
|
|---|
| 776 | \begin{datadescni}{XML_ERROR_SUSPEND_PE}
|
|---|
| 777 | \end{datadescni}
|
|---|