\documentclass{howto}

\title{Idioms and Anti-Idioms in Python}

\release{0.00}

\author{Moshe Zadka}
\authoraddress{[email protected]}

\begin{document}
\maketitle

This document is placed in the public domain.

\begin{abstract}
\noindent
This document can be considered a companion to the tutorial. It
shows how to use Python, and even more importantly, how {\em not}
to use Python.
\end{abstract}

\tableofcontents

\section{Language Constructs You Should Not Use}

While Python has relatively few gotchas compared to other languages, it
still has some constructs which are only useful in corner cases, or are
plain dangerous.

\subsection{from module import *}

\subsubsection{Inside Function Definitions}

\code{from module import *} is {\em invalid} inside function definitions.
While many versions of Python do not check for the invalidity, that does not
make it more valid, any more than having a smart lawyer makes a man innocent.
Do not use it like that, ever. Even in versions where it was accepted, it made
the function execution slower, because the compiler could not be certain
which names are local and which are global. In Python 2.1 this construct
causes warnings, and sometimes even errors.

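The check is easy to observe on later interpreters. The following is a
minimal sketch, assuming Python 3 syntax rather than the 2.1 release
discussed here; \function{compiles} is a helper invented for this example,
not a library function.

\begin{verbatim}
def compiles(source):
    """Return True if the interpreter accepts `source` as a module."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# The same star import, once inside a function body and once at top level.
inside_function = "def f():\n    from math import *\n    return pi\n"
at_module_level = "from math import *\n"

print(compiles(inside_function))
print(compiles(at_module_level))
\end{verbatim}

On Python 3 the first call reports \code{False}, because the compiler
rejects the star import inside the function body, and the second reports
\code{True}.
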
\subsubsection{At Module Level}

While it is valid to use \code{from module import *} at module level, it
is usually a bad idea. For one, this loses an important property Python
otherwise has --- you can know where each toplevel name is defined by
a simple ``search'' function in your favourite editor. You also open yourself
to trouble in the future, if some module grows additional functions or
classes.

One of the most awful questions asked on the newsgroup is why this code:

\begin{verbatim}
f = open("www")
f.read()
\end{verbatim}

does not work. Of course, it works just fine (assuming you have a file
called ``www''). But it does not work if somewhere in the module, the
statement \code{from os import *} is present. The \module{os} module
has a function called \function{open()} which returns an integer. While
it is very useful, shadowing builtins is one of its least useful properties.

Remember, you can never know for sure what names a module exports, so either
take what you need --- \code{from module import name1, name2}, or keep them in
the module and access them on a per-need basis ---
\code{import module;print module.name}.

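The shadowing of \function{open()} can be checked without touching any
file. This is only a sketch, assuming Python 3 syntax; the dictionary
\code{namespace} stands in for the globals of a hypothetical module whose
first statement is the star import.

\begin{verbatim}
import builtins
import os

# Run the star import with an explicit dictionary acting as the
# module's global namespace.
namespace = {}
exec("from os import *", namespace)

shadowed_open = namespace["open"]
print(shadowed_open is os.open)        # the name now refers to os.open
print(shadowed_open is builtins.open)  # it no longer means the builtin
\end{verbatim}

Any later call to \code{open("www")} in such a module would therefore
reach \function{os.open}, which expects flags and returns a file
descriptor rather than a file object.
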
\subsubsection{When It Is Just Fine}

There are situations in which \code{from module import *} is just fine:

\begin{itemize}

\item The interactive prompt. For example, \code{from math import *} makes
Python an amazing scientific calculator.

\item When extending a module in C with a module in Python.

\item When the module advertises itself as \code{from import *} safe.

\end{itemize}

\subsection{Unadorned \keyword{exec}, \function{execfile} and friends}

The word ``unadorned'' refers to the use without an explicit dictionary,
in which case those constructs evaluate code in the {\em current} environment.
This is dangerous for the same reasons \code{from import *} is dangerous ---
it might step over variables you are counting on and mess up things for
the rest of your code. Simply do not do that.

Bad examples:

\begin{verbatim}
>>> for name in sys.argv[1:]:
>>>     exec "%s=1" % name
>>> def func(s, **kw):
>>>     for var, val in kw.items():
>>>         exec "s.%s=val" % var  # invalid!
>>> execfile("handler.py")
>>> handle()
\end{verbatim}

Good examples:

\begin{verbatim}
>>> d = {}
>>> for name in sys.argv[1:]:
>>>     d[name] = 1
>>> def func(s, **kw):
>>>     for var, val in kw.items():
>>>         setattr(s, var, val)
>>> d = {}
>>> execfile("handler.py", d, d)
>>> handle = d['handle']
>>> handle()
\end{verbatim}

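To see what the explicit dictionary buys you, here is a small
self-contained sketch. It assumes Python 3, where \keyword{exec} is a
function and \function{execfile} no longer exists, and it uses a string
(the made-up \code{handler\_source}) in place of a real
\file{handler.py}.

\begin{verbatim}
# Stand-in for the contents of a handler file.
handler_source = """
counter = 99

def handle():
    return "handled"
"""

counter = 0  # a name the surrounding code relies on

namespace = {}
exec(handler_source, namespace)  # definitions land in `namespace` only

handle = namespace["handle"]
print(handle())
print(counter)               # untouched by the executed code
print(namespace["counter"])  # the executed code's own value
\end{verbatim}

The executed code defines its own \code{counter}, but because it ran
inside \code{namespace}, the caller's variable keeps its value.
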
\subsection{from module import name1, name2}

This is a ``don't'' which is much weaker than the previous ``don't''s
but is still something you should not do if you don't have good reasons
to do that. The reason it is usually a bad idea is that you suddenly
have an object which lives in two separate namespaces. When the binding
in one namespace changes, the binding in the other will not, so there
will be a discrepancy between them. This happens when, for example,
one module is reloaded, or changes the definition of a function at runtime.

Bad example:

\begin{verbatim}
# foo.py
a = 1

# bar.py
from foo import a
if something():
    a = 2 # danger: foo.a != a
\end{verbatim}

Good example:

\begin{verbatim}
# foo.py
a = 1

# bar.py
import foo
if something():
    foo.a = 2
\end{verbatim}

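The discrepancy can be reproduced in a single file. The sketch below
assumes Python 3 and builds a throwaway module named \module{foo} in
memory instead of reading a real \file{foo.py}.

\begin{verbatim}
import sys
import types

# Build a stand-in for foo.py so the example is self-contained.
foo = types.ModuleType("foo")
foo.a = 1
sys.modules["foo"] = foo

from foo import a  # copies the current binding of foo.a into this namespace

foo.a = 2          # e.g. foo is reloaded, or rebinds the name at runtime

print(a)           # still the old object
print(foo.a)       # the new binding
\end{verbatim}

After the rebinding, \code{a} and \code{foo.a} no longer agree; code
that always spells the name as \code{foo.a} never sees a stale value.
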
\subsection{except:}

Python has the \code{except:} clause, which catches all exceptions.
Since {\em every} error in Python raises an exception, this makes many
programming errors look like runtime problems, and hinders
the debugging process.

The following code shows a great example of the problem:

\begin{verbatim}
try:
    foo = opne("file") # misspelled "open"
except:
    sys.exit("could not open file!")
\end{verbatim}

The second line triggers a \exception{NameError} which is caught by the
except clause. The program will exit, and you will have no idea that
this has nothing to do with the readability of \code{"file"}.

The example above is better written as:

\begin{verbatim}
try:
    foo = opne("file") # will be changed to "open" as soon as we run it
except IOError:
    sys.exit("could not open file")
\end{verbatim}

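The difference is easy to watch in action. In this sketch (Python 3
syntax assumed; \function{read_first_line} is a name invented for the
example) the narrow clause lets the misspelling surface as a
\exception{NameError} instead of disguising it as an I/O problem.

\begin{verbatim}
def read_first_line(path):
    try:
        handle = opne(path)  # deliberately misspelled "open"
    except IOError:
        return None          # only genuine I/O failures are handled here
    return handle.readline()

try:
    read_first_line("file")
except NameError as error:
    caught = error
    print("programming error surfaced: %s" % error)
\end{verbatim}
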
There are some situations in which the \code{except:} clause is useful:
for example, in a framework when running callbacks, it is good not to
let any callback disturb the framework.

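A callback runner along those lines might look like the following
sketch. It assumes Python 3, the names \function{run_callbacks},
\function{good} and \function{bad} are hypothetical, and it uses
\code{except Exception:} in place of a bare \code{except:} so that
\exception{KeyboardInterrupt} and \exception{SystemExit} still
propagate.

\begin{verbatim}
import sys
import traceback

def run_callbacks(callbacks):
    """Call every callback; report failures instead of propagating them."""
    failures = []
    for callback in callbacks:
        try:
            callback()
        except Exception:
            # Deliberately broad: one bad callback must not stop the others.
            traceback.print_exc(file=sys.stderr)
            failures.append(callback)
    return failures

def good():
    return "ok"

def bad():
    raise ValueError("broken callback")

failed = run_callbacks([good, bad, good])
print(failed == [bad])
\end{verbatim}

The traceback is still printed, so the failure is not hidden from
whoever has to debug the callback.
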
\section{Exceptions}

Exceptions are a useful feature of Python. You should learn to raise
them whenever something unexpected occurs, and catch them only where
you can do something about them.

The following is a very popular anti-idiom:

\begin{verbatim}
def get_status(file):
    if not os.path.exists(file):
        print "file not found"
        sys.exit(1)
    return open(file).readline()
\end{verbatim}

Consider the case where the file gets deleted between the time the call to
\function{os.path.exists} is made and the time \function{open} is called.
That means the last line will throw an \exception{IOError}. The same would
happen if \var{file} exists but has no read permission. Since testing this
on a normal machine with existing and non-existing files makes it seem
bugless, the test results will look fine, and the code will get
shipped. Then an unhandled \exception{IOError} escapes to the user, who
has to watch the ugly traceback.

Here is a better way to do it.

\begin{verbatim}
def get_status(file):
    try:
        return open(file).readline()
    except (IOError, OSError):
        print "file not found"
        sys.exit(1)
\end{verbatim}

In this version, {\em either} the file gets opened and the line is read
(so it works even on flaky NFS or SMB connections), or the message
is printed and the application aborted.

Still, \function{get_status} makes too many assumptions --- that it
will only be used in a short running script, and not, say, in a long
running server. Sure, the caller could do something like

\begin{verbatim}
try:
    status = get_status(log)
except SystemExit:
    status = None
\end{verbatim}

but that is awkward, and every caller has to remember to do it.
Try to keep the number of \code{except} clauses in your code as small as
possible --- those will usually be a catch-all in the \function{main},
or inside calls which should always succeed.

So, the best version is probably

\begin{verbatim}
def get_status(file):
    return open(file).readline()
\end{verbatim}

The caller can deal with the exception if it wants (for example, if it
tries several files in a loop), or just let the exception filter upwards
to {\em its} caller.

The last version is not very good either --- due to implementation details,
the file would not be closed when an exception is raised until the handler
finishes, and perhaps not at all in non-C implementations (e.g., Jython).
Closing the file explicitly avoids the problem:

\begin{verbatim}
def get_status(file):
    fp = open(file)
    try:
        return fp.readline()
    finally:
        fp.close()
\end{verbatim}

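To illustrate the caller that ``tries several files in a loop'', here is
a sketch in Python 3 syntax. \function{first_status} and the file names
are invented for the example; \function{get_status} is the
explicit-close version from this section.

\begin{verbatim}
import os
import tempfile

def get_status(path):
    fp = open(path)
    try:
        return fp.readline()
    finally:
        fp.close()

def first_status(paths):
    """Return the first line of the first readable file, or None."""
    for path in paths:
        try:
            return get_status(path)
        except (IOError, OSError):
            continue  # this caller knows what to do: try the next file
    return None

# Usage: one missing file followed by one real file.
directory = tempfile.mkdtemp()
missing = os.path.join(directory, "missing.log")
present = os.path.join(directory, "present.log")
with open(present, "w") as fp:
    fp.write("all good\n")

status = first_status([missing, present])
print(repr(status))
\end{verbatim}

The decision about what a missing file means stays with the caller,
which is the only place that has enough context to make it.
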
\section{Using the Batteries}

Every so often, people seem to be rewriting things that are already in
the Python library, usually poorly. While the occasional module has a
poor interface, it is usually much better to use the rich standard
library and data types that come with Python than to invent your own.

A useful module very few people know about is \module{os.path}. It
always has the correct path arithmetic for your operating system, and
will usually be much better than whatever you come up with yourself.

Compare:

\begin{verbatim}
# ugh!
return dir+"/"+file
# better
return os.path.join(dir, file)
\end{verbatim}

More useful functions in \module{os.path}: \function{basename},
\function{dirname} and \function{splitext}.

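A short sketch of these functions working together, written in Python 3
syntax; the path components are arbitrary examples.

\begin{verbatim}
import os.path

# Let os.path pick the separator for the current platform.
path = os.path.join("logs", "2001", "server.log")

directory = os.path.dirname(path)   # everything before the last component
filename = os.path.basename(path)   # the last component
stem, extension = os.path.splitext(filename)

print(directory == os.path.join("logs", "2001"))
print(filename)
print(stem, extension)
\end{verbatim}

Because the path is both built and taken apart by \module{os.path}, the
code never has to mention a separator character.
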
There are also many useful builtin functions people seem not to be
aware of for some reason: \function{min()} and \function{max()} can
find the minimum/maximum of any sequence with comparable semantics,
for example, yet many people write their own
\function{max()}/\function{min()}. Another highly useful function is
\function{reduce()}. A classical use of \function{reduce()}
is something like

\begin{verbatim}
import sys, operator
nums = map(float, sys.argv[1:])
print reduce(operator.add, nums)/len(nums)
\end{verbatim}

This cute little script prints the average of all numbers given on the
command line. The \function{reduce()} adds up all the numbers, and
the rest is just some pre- and postprocessing.

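For readers on Python 3, the same idea needs two small adjustments,
sketched below: \function{reduce()} has to be imported from
\module{functools}, and \function{map()} returns an iterator, so the
result is turned into a list before \function{len()} is applied. The
helper \function{average} and the sample input are made up for the
example.

\begin{verbatim}
import operator
from functools import reduce  # reduce() is not a builtin in Python 3

def average(numbers):
    numbers = list(numbers)   # map() yields an iterator in Python 3
    return reduce(operator.add, numbers) / len(numbers)

# Stand-in for sys.argv[1:].
arguments = ["1", "2", "6"]
values = list(map(float, arguments))

print(average(values))
print(min(values), max(values))
\end{verbatim}
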
On the same note, \function{float()}, \function{int()} and
\function{long()} all accept arguments of type string, and so are
suited to parsing --- assuming you are ready to deal with the
\exception{ValueError} they raise.

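Being ``ready to deal with'' the exception can be as simple as the
following sketch. It is written for Python 3, where \function{long()} is
gone, and \function{parse_numbers} is a hypothetical helper, not a
library function.

\begin{verbatim}
def parse_numbers(words):
    """Split `words` into parsed floats and the strings that failed."""
    numbers, rejected = [], []
    for word in words:
        try:
            numbers.append(float(word))
        except ValueError:
            rejected.append(word)  # keep the bad input for reporting
    return numbers, rejected

numbers, rejected = parse_numbers(["3", "4.5", "seven", "1e3"])
print(numbers)
print(rejected)
\end{verbatim}
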
\section{Using Backslash to Continue Statements}

Since Python treats a newline as a statement terminator,
and since statements are often longer than is comfortable to put
on one line, many people do:

\begin{verbatim}
if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
        calculate_number(10, 20) != forbulate(500, 360):
    pass
\end{verbatim}

You should realize that this is dangerous: a stray space after the
\code{\textbackslash} would make this line wrong, and stray spaces are
notoriously hard to see in editors. In this case, at least it would be
a syntax error, but if the code was:

\begin{verbatim}
value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
        + calculate_number(10, 20)*forbulate(500, 360)
\end{verbatim}

then it would just be subtly wrong.

It is usually much better to use the implicit continuation inside
parentheses. This version is bulletproof:

\begin{verbatim}
value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
        + calculate_number(10, 20)*forbulate(500, 360))
\end{verbatim}

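How the two continuation styles react to stray whitespace can be checked
by handing small source strings to the compiler. This is a sketch under
the assumption of Python 3; \function{compiles} is a helper invented for
the example.

\begin{verbatim}
def compiles(source):
    """Return True if the interpreter accepts `source` as a module."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

stray_space = "value = 1 + \\ \n    2\n"    # backslash, then a space
parenthesised = "value = (1 +   \n    2)\n" # trailing spaces are harmless

print(compiles(stray_space))
print(compiles(parenthesised))

namespace = {}
exec(parenthesised, namespace)
print(namespace["value"])
\end{verbatim}

The backslash form is rejected as soon as a space follows the backslash,
while the parenthesised form compiles and evaluates to the intended sum
however much whitespace trails the first line.
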
\end{document}