| 1 | \chapter{Python compiler package \label{compiler}}
|
|---|
| 2 |
|
|---|
| 3 | \sectionauthor{Jeremy Hylton}{[email protected]}
|
|---|
| 4 |
|
|---|
| 5 |
|
|---|
| 6 | The Python compiler package is a tool for analyzing Python source code
|
|---|
| 7 | and generating Python bytecode. The compiler contains libraries to
|
|---|
| 8 | generate an abstract syntax tree from Python source code and to
|
|---|
| 9 | generate Python bytecode from the tree.
|
|---|
| 10 |
|
|---|
| 11 | The \refmodule{compiler} package is a Python source to bytecode
|
|---|
| 12 | translator written in Python. It uses the built-in parser and
|
|---|
| 13 | standard \refmodule{parser} module to generated a concrete syntax
|
|---|
| 14 | tree. This tree is used to generate an abstract syntax tree (AST) and
|
|---|
| 15 | then Python bytecode.
|
|---|
| 16 |
|
|---|
| 17 | The full functionality of the package duplicates the builtin compiler
|
|---|
| 18 | provided with the Python interpreter. It is intended to match its
|
|---|
| 19 | behavior almost exactly. Why implement another compiler that does the
|
|---|
| 20 | same thing? The package is useful for a variety of purposes. It can
|
|---|
| 21 | be modified more easily than the builtin compiler. The AST it
|
|---|
| 22 | generates is useful for analyzing Python source code.
|
|---|
| 23 |
|
|---|
| 24 | This chapter explains how the various components of the
|
|---|
| 25 | \refmodule{compiler} package work. It blends reference material with
|
|---|
| 26 | a tutorial.
|
|---|
| 27 |
|
|---|
| 28 | The following modules are part of the \refmodule{compiler} package:
|
|---|
| 29 |
|
|---|
| 30 | \localmoduletable
|
|---|
| 31 |
|
|---|
| 32 |
|
|---|
| 33 | \section{The basic interface}
|
|---|
| 34 |
|
|---|
| 35 | \declaremodule{}{compiler}
|
|---|
| 36 |
|
|---|
| 37 | The top-level of the package defines four functions. If you import
|
|---|
| 38 | \module{compiler}, you will get these functions and a collection of
|
|---|
| 39 | modules contained in the package.
|
|---|
| 40 |
|
|---|
| 41 | \begin{funcdesc}{parse}{buf}
|
|---|
| 42 | Returns an abstract syntax tree for the Python source code in \var{buf}.
|
|---|
| 43 | The function raises \exception{SyntaxError} if there is an error in the
|
|---|
| 44 | source code. The return value is a \class{compiler.ast.Module} instance
|
|---|
| 45 | that contains the tree.
|
|---|
| 46 | \end{funcdesc}
|
|---|
| 47 |
|
|---|
| 48 | \begin{funcdesc}{parseFile}{path}
|
|---|
| 49 | Return an abstract syntax tree for the Python source code in the file
|
|---|
| 50 | specified by \var{path}. It is equivalent to
|
|---|
| 51 | \code{parse(open(\var{path}).read())}.
|
|---|
| 52 | \end{funcdesc}
|
|---|
| 53 |
|
|---|
| 54 | \begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
|
|---|
| 55 | Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
|
|---|
| 56 | appropriate method on the \var{visitor} instance for each node
|
|---|
| 57 | encountered.
|
|---|
| 58 | \end{funcdesc}
|
|---|
| 59 |
|
|---|
| 60 | \begin{funcdesc}{compile}{source, filename, mode, flags=None,
|
|---|
| 61 | dont_inherit=None}
|
|---|
| 62 | Compile the string \var{source}, a Python module, statement or
|
|---|
| 63 | expression, into a code object that can be executed by the exec
|
|---|
| 64 | statement or \function{eval()}. This function is a replacement for the
|
|---|
| 65 | built-in \function{compile()} function.
|
|---|
| 66 |
|
|---|
| 67 | The \var{filename} will be used for run-time error messages.
|
|---|
| 68 |
|
|---|
| 69 | The \var{mode} must be 'exec' to compile a module, 'single' to compile a
|
|---|
| 70 | single (interactive) statement, or 'eval' to compile an expression.
|
|---|
| 71 |
|
|---|
| 72 | The \var{flags} and \var{dont_inherit} arguments affect future-related
|
|---|
| 73 | statements, but are not supported yet.
|
|---|
| 74 | \end{funcdesc}
|
|---|
| 75 |
|
|---|
| 76 | \begin{funcdesc}{compileFile}{source}
|
|---|
| 77 | Compiles the file \var{source} and generates a .pyc file.
|
|---|
| 78 | \end{funcdesc}
|
|---|
| 79 |
|
|---|
| 80 | The \module{compiler} package contains the following modules:
|
|---|
| 81 | \refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
|
|---|
| 82 | \module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
|
|---|
| 83 | \module{transformer}, and \refmodule[compiler.visitor]{visitor}.
|
|---|
| 84 |
|
|---|
| 85 | \section{Limitations}
|
|---|
| 86 |
|
|---|
| 87 | There are some problems with the error checking of the compiler
|
|---|
| 88 | package. The interpreter detects syntax errors in two distinct
|
|---|
| 89 | phases. One set of errors is detected by the interpreter's parser,
|
|---|
| 90 | the other set by the compiler. The compiler package relies on the
|
|---|
| 91 | interpreter's parser, so it get the first phases of error checking for
|
|---|
| 92 | free. It implements the second phase itself, and that implementation is
|
|---|
| 93 | incomplete. For example, the compiler package does not raise an error
|
|---|
| 94 | if a name appears more than once in an argument list:
|
|---|
| 95 | \code{def f(x, x): ...}
|
|---|
| 96 |
|
|---|
| 97 | A future version of the compiler should fix these problems.
|
|---|
| 98 |
|
|---|
| 99 | \section{Python Abstract Syntax}
|
|---|
| 100 |
|
|---|
| 101 | The \module{compiler.ast} module defines an abstract syntax for
|
|---|
| 102 | Python. In the abstract syntax tree, each node represents a syntactic
|
|---|
| 103 | construct. The root of the tree is \class{Module} object.
|
|---|
| 104 |
|
|---|
| 105 | The abstract syntax offers a higher level interface to parsed Python
|
|---|
| 106 | source code. The \ulink{\module{parser}}
|
|---|
| 107 | {http://www.python.org/doc/current/lib/module-parser.html}
|
|---|
| 108 | module and the compiler written in C for the Python interpreter use a
|
|---|
| 109 | concrete syntax tree. The concrete syntax is tied closely to the
|
|---|
| 110 | grammar description used for the Python parser. Instead of a single
|
|---|
| 111 | node for a construct, there are often several levels of nested nodes
|
|---|
| 112 | that are introduced by Python's precedence rules.
|
|---|
| 113 |
|
|---|
| 114 | The abstract syntax tree is created by the
|
|---|
| 115 | \module{compiler.transformer} module. The transformer relies on the
|
|---|
| 116 | builtin Python parser to generate a concrete syntax tree. It
|
|---|
| 117 | generates an abstract syntax tree from the concrete tree.
|
|---|
| 118 |
|
|---|
| 119 | The \module{transformer} module was created by Greg
|
|---|
| 120 | Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
|
|---|
| 121 | experimental Python-to-C compiler. The current version contains a
|
|---|
| 122 | number of modifications and improvements, but the basic form of the
|
|---|
| 123 | abstract syntax and of the transformer are due to Stein and Tutt.
|
|---|
| 124 |
|
|---|
| 125 | \subsection{AST Nodes}
|
|---|
| 126 |
|
|---|
| 127 | \declaremodule{}{compiler.ast}
|
|---|
| 128 |
|
|---|
| 129 | The \module{compiler.ast} module is generated from a text file that
|
|---|
| 130 | describes each node type and its elements. Each node type is
|
|---|
| 131 | represented as a class that inherits from the abstract base class
|
|---|
| 132 | \class{compiler.ast.Node} and defines a set of named attributes for
|
|---|
| 133 | child nodes.
|
|---|
| 134 |
|
|---|
| 135 | \begin{classdesc}{Node}{}
|
|---|
| 136 |
|
|---|
| 137 | The \class{Node} instances are created automatically by the parser
|
|---|
| 138 | generator. The recommended interface for specific \class{Node}
|
|---|
| 139 | instances is to use the public attributes to access child nodes. A
|
|---|
| 140 | public attribute may be bound to a single node or to a sequence of
|
|---|
| 141 | nodes, depending on the \class{Node} type. For example, the
|
|---|
| 142 | \member{bases} attribute of the \class{Class} node, is bound to a
|
|---|
| 143 | list of base class nodes, and the \member{doc} attribute is bound to
|
|---|
| 144 | a single node.
|
|---|
| 145 |
|
|---|
| 146 | Each \class{Node} instance has a \member{lineno} attribute which may
|
|---|
| 147 | be \code{None}. XXX Not sure what the rules are for which nodes
|
|---|
| 148 | will have a useful lineno.
|
|---|
| 149 | \end{classdesc}
|
|---|
| 150 |
|
|---|
| 151 | All \class{Node} objects offer the following methods:
|
|---|
| 152 |
|
|---|
| 153 | \begin{methoddesc}{getChildren}{}
|
|---|
| 154 | Returns a flattened list of the child nodes and objects in the
|
|---|
| 155 | order they occur. Specifically, the order of the nodes is the
|
|---|
| 156 | order in which they appear in the Python grammar. Not all of the
|
|---|
| 157 | children are \class{Node} instances. The names of functions and
|
|---|
| 158 | classes, for example, are plain strings.
|
|---|
| 159 | \end{methoddesc}
|
|---|
| 160 |
|
|---|
| 161 | \begin{methoddesc}{getChildNodes}{}
|
|---|
| 162 | Returns a flattened list of the child nodes in the order they
|
|---|
| 163 | occur. This method is like \method{getChildren()}, except that it
|
|---|
| 164 | only returns those children that are \class{Node} instances.
|
|---|
| 165 | \end{methoddesc}
|
|---|
| 166 |
|
|---|
| 167 | Two examples illustrate the general structure of \class{Node}
|
|---|
| 168 | classes. The \keyword{while} statement is defined by the following
|
|---|
| 169 | grammar production:
|
|---|
| 170 |
|
|---|
| 171 | \begin{verbatim}
|
|---|
| 172 | while_stmt: "while" expression ":" suite
|
|---|
| 173 | ["else" ":" suite]
|
|---|
| 174 | \end{verbatim}
|
|---|
| 175 |
|
|---|
| 176 | The \class{While} node has three attributes: \member{test},
|
|---|
| 177 | \member{body}, and \member{else_}. (If the natural name for an
|
|---|
| 178 | attribute is also a Python reserved word, it can't be used as an
|
|---|
| 179 | attribute name. An underscore is appended to the word to make it a
|
|---|
| 180 | legal identifier, hence \member{else_} instead of \keyword{else}.)
|
|---|
| 181 |
|
|---|
| 182 | The \keyword{if} statement is more complicated because it can include
|
|---|
| 183 | several tests.
|
|---|
| 184 |
|
|---|
| 185 | \begin{verbatim}
|
|---|
| 186 | if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
|
|---|
| 187 | \end{verbatim}
|
|---|
| 188 |
|
|---|
| 189 | The \class{If} node only defines two attributes: \member{tests} and
|
|---|
| 190 | \member{else_}. The \member{tests} attribute is a sequence of test
|
|---|
| 191 | expression, consequent body pairs. There is one pair for each
|
|---|
| 192 | \keyword{if}/\keyword{elif} clause. The first element of the pair is
|
|---|
| 193 | the test expression. The second elements is a \class{Stmt} node that
|
|---|
| 194 | contains the code to execute if the test is true.
|
|---|
| 195 |
|
|---|
| 196 | The \method{getChildren()} method of \class{If} returns a flat list of
|
|---|
| 197 | child nodes. If there are three \keyword{if}/\keyword{elif} clauses
|
|---|
| 198 | and no \keyword{else} clause, then \method{getChildren()} will return
|
|---|
| 199 | a list of six elements: the first test expression, the first
|
|---|
| 200 | \class{Stmt}, the second text expression, etc.
|
|---|
| 201 |
|
|---|
| 202 | The following table lists each of the \class{Node} subclasses defined
|
|---|
| 203 | in \module{compiler.ast} and each of the public attributes available
|
|---|
| 204 | on their instances. The values of most of the attributes are
|
|---|
| 205 | themselves \class{Node} instances or sequences of instances. When the
|
|---|
| 206 | value is something other than an instance, the type is noted in the
|
|---|
| 207 | comment. The attributes are listed in the order in which they are
|
|---|
| 208 | returned by \method{getChildren()} and \method{getChildNodes()}.
|
|---|
| 209 |
|
|---|
| 210 | \input{asttable}
|
|---|
| 211 |
|
|---|
| 212 |
|
|---|
| 213 | \subsection{Assignment nodes}
|
|---|
| 214 |
|
|---|
| 215 | There is a collection of nodes used to represent assignments. Each
|
|---|
| 216 | assignment statement in the source code becomes a single
|
|---|
| 217 | \class{Assign} node in the AST. The \member{nodes} attribute is a
|
|---|
| 218 | list that contains a node for each assignment target. This is
|
|---|
| 219 | necessary because assignment can be chained, e.g. \code{a = b = 2}.
|
|---|
| 220 | Each \class{Node} in the list will be one of the following classes:
|
|---|
| 221 | \class{AssAttr}, \class{AssList}, \class{AssName}, or
|
|---|
| 222 | \class{AssTuple}.
|
|---|
| 223 |
|
|---|
| 224 | Each target assignment node will describe the kind of object being
|
|---|
| 225 | assigned to: \class{AssName} for a simple name, e.g. \code{a = 1}.
|
|---|
| 226 | \class{AssAttr} for an attribute assigned, e.g. \code{a.x = 1}.
|
|---|
| 227 | \class{AssList} and \class{AssTuple} for list and tuple expansion
|
|---|
| 228 | respectively, e.g. \code{a, b, c = a_tuple}.
|
|---|
| 229 |
|
|---|
| 230 | The target assignment nodes also have a \member{flags} attribute that
|
|---|
| 231 | indicates whether the node is being used for assignment or in a delete
|
|---|
| 232 | statement. The \class{AssName} is also used to represent a delete
|
|---|
| 233 | statement, e.g. \class{del x}.
|
|---|
| 234 |
|
|---|
| 235 | When an expression contains several attribute references, an
|
|---|
| 236 | assignment or delete statement will contain only one \class{AssAttr}
|
|---|
| 237 | node -- for the final attribute reference. The other attribute
|
|---|
| 238 | references will be represented as \class{Getattr} nodes in the
|
|---|
| 239 | \member{expr} attribute of the \class{AssAttr} instance.
|
|---|
| 240 |
|
|---|
| 241 | \subsection{Examples}
|
|---|
| 242 |
|
|---|
| 243 | This section shows several simple examples of ASTs for Python source
|
|---|
| 244 | code. The examples demonstrate how to use the \function{parse()}
|
|---|
| 245 | function, what the repr of an AST looks like, and how to access
|
|---|
| 246 | attributes of an AST node.
|
|---|
| 247 |
|
|---|
| 248 | The first module defines a single function. Assume it is stored in
|
|---|
| 249 | \file{/tmp/doublelib.py}.
|
|---|
| 250 |
|
|---|
| 251 | \begin{verbatim}
|
|---|
| 252 | """This is an example module.
|
|---|
| 253 |
|
|---|
| 254 | This is the docstring.
|
|---|
| 255 | """
|
|---|
| 256 |
|
|---|
| 257 | def double(x):
|
|---|
| 258 | "Return twice the argument"
|
|---|
| 259 | return x * 2
|
|---|
| 260 | \end{verbatim}
|
|---|
| 261 |
|
|---|
| 262 | In the interactive interpreter session below, I have reformatted the
|
|---|
| 263 | long AST reprs for readability. The AST reprs use unqualified class
|
|---|
| 264 | names. If you want to create an instance from a repr, you must import
|
|---|
| 265 | the class names from the \module{compiler.ast} module.
|
|---|
| 266 |
|
|---|
| 267 | \begin{verbatim}
|
|---|
| 268 | >>> import compiler
|
|---|
| 269 | >>> mod = compiler.parseFile("/tmp/doublelib.py")
|
|---|
| 270 | >>> mod
|
|---|
| 271 | Module('This is an example module.\n\nThis is the docstring.\n',
|
|---|
| 272 | Stmt([Function(None, 'double', ['x'], [], 0,
|
|---|
| 273 | 'Return twice the argument',
|
|---|
| 274 | Stmt([Return(Mul((Name('x'), Const(2))))]))]))
|
|---|
| 275 | >>> from compiler.ast import *
|
|---|
| 276 | >>> Module('This is an example module.\n\nThis is the docstring.\n',
|
|---|
| 277 | ... Stmt([Function(None, 'double', ['x'], [], 0,
|
|---|
| 278 | ... 'Return twice the argument',
|
|---|
| 279 | ... Stmt([Return(Mul((Name('x'), Const(2))))]))]))
|
|---|
| 280 | Module('This is an example module.\n\nThis is the docstring.\n',
|
|---|
| 281 | Stmt([Function(None, 'double', ['x'], [], 0,
|
|---|
| 282 | 'Return twice the argument',
|
|---|
| 283 | Stmt([Return(Mul((Name('x'), Const(2))))]))]))
|
|---|
| 284 | >>> mod.doc
|
|---|
| 285 | 'This is an example module.\n\nThis is the docstring.\n'
|
|---|
| 286 | >>> for node in mod.node.nodes:
|
|---|
| 287 | ... print node
|
|---|
| 288 | ...
|
|---|
| 289 | Function(None, 'double', ['x'], [], 0, 'Return twice the argument',
|
|---|
| 290 | Stmt([Return(Mul((Name('x'), Const(2))))]))
|
|---|
| 291 | >>> func = mod.node.nodes[0]
|
|---|
| 292 | >>> func.code
|
|---|
| 293 | Stmt([Return(Mul((Name('x'), Const(2))))])
|
|---|
| 294 | \end{verbatim}
|
|---|
| 295 |
|
|---|
| 296 | \section{Using Visitors to Walk ASTs}
|
|---|
| 297 |
|
|---|
| 298 | \declaremodule{}{compiler.visitor}
|
|---|
|
|---|