1\section{\module{pickle} --- Python object serialization}
2
3\declaremodule{standard}{pickle}
4\modulesynopsis{Convert Python objects to streams of bytes and back.}
5% Substantial improvements by Jim Kerr <[email protected]>.
6% Rewritten by Barry Warsaw <[email protected]>
7
8\index{persistence}
9\indexii{persistent}{objects}
10\indexii{serializing}{objects}
11\indexii{marshalling}{objects}
12\indexii{flattening}{objects}
13\indexii{pickling}{objects}
14
15The \module{pickle} module implements a fundamental, but powerful
16algorithm for serializing and de-serializing a Python object
17structure. ``Pickling'' is the process whereby a Python object
18hierarchy is converted into a byte stream, and ``unpickling'' is the
19inverse operation, whereby a byte stream is converted back into an
object hierarchy. Pickling (and unpickling) is alternatively known as
``serialization'', ``marshalling'',\footnote{Don't confuse this with
the \refmodule{marshal} module.} or ``flattening''; however, to avoid
confusion, the terms used here are ``pickling'' and ``unpickling''.
25
26This documentation describes both the \module{pickle} module and the
27\refmodule{cPickle} module.
28
29\subsection{Relationship to other Python modules}
30
31The \module{pickle} module has an optimized cousin called the
32\module{cPickle} module. As its name implies, \module{cPickle} is
33written in C, so it can be up to 1000 times faster than
\module{pickle}. However, it does not support subclassing of the
35\function{Pickler()} and \function{Unpickler()} classes, because in
36\module{cPickle} these are functions, not classes. Most applications
37have no need for this functionality, and can benefit from the improved
38performance of \module{cPickle}. Other than that, the interfaces of
39the two modules are nearly identical; the common interface is
40described in this manual and differences are pointed out where
41necessary. In the following discussions, we use the term ``pickle''
42to collectively describe the \module{pickle} and
43\module{cPickle} modules.
44
45The data streams the two modules produce are guaranteed to be
46interchangeable.
47
48Python has a more primitive serialization module called
49\refmodule{marshal}, but in general
50\module{pickle} should always be the preferred way to serialize Python
51objects. \module{marshal} exists primarily to support Python's
52\file{.pyc} files.
53
The \module{pickle} module differs from \refmodule{marshal} in several
significant ways:
56
57\begin{itemize}
58
59\item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
62
63 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
  to themselves. These are not handled by \module{marshal}, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
73
74\item \module{marshal} cannot be used to serialize user-defined
  classes and their instances. \module{pickle} can save and
  restore class instances transparently; however, the class
  definition must be importable and live in the same module as
  when the object was stored.
79
80\item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
87
88\end{itemize}
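
The object-tracking behaviour described in the first item can be
sketched as follows. This is only an illustration, not part of the
module's interface; the variable names are hypothetical.

\begin{verbatim}
import pickle

# Two references to one list inside the same container.
shared = ['spam']
container = [shared, shared]
restored = pickle.loads(pickle.dumps(container))

# The shared list is stored once; both slots refer to the same copy.
assert restored[0] is restored[1]

# A recursive object: a list that contains itself.
recursive = []
recursive.append(recursive)
copy = pickle.loads(pickle.dumps(recursive))
assert copy[0] is copy
\end{verbatim}

Because the restored container still holds a single shared list,
appending to \code{restored[0]} is also visible through
\code{restored[1]}.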
89
90\begin{notice}[warning]
91The \module{pickle} module is not intended to be secure against
92erroneous or maliciously constructed data. Never unpickle data
93received from an untrusted or unauthenticated source.
94\end{notice}
95
96Note that serialization is a more primitive notion than persistence;
97although
98\module{pickle} reads and writes file objects, it does not handle the
99issue of naming persistent objects, nor the (even more complicated)
100issue of concurrent access to persistent objects. The \module{pickle}
101module can transform a complex object into a byte stream and it can
102transform the byte stream into an object with the same internal
103structure. Perhaps the most obvious thing to do with these byte
104streams is to write them onto a file, but it is also conceivable to
105send them across a network or store them in a database. The module
106\refmodule{shelve} provides a simple interface
107to pickle and unpickle objects on DBM-style database files.
108
109\subsection{Data stream format}
110
111The data format used by \module{pickle} is Python-specific. This has
112the advantage that there are no restrictions imposed by external
113standards such as XDR\index{XDR}\index{External Data Representation}
114(which can't represent pointer sharing); however it means that
115non-Python programs may not be able to reconstruct pickled Python
116objects.
117
118By default, the \module{pickle} data format uses a printable \ASCII{}
119representation. This is slightly more voluminous than a binary
120representation. The big advantage of using printable \ASCII{} (and of
121some other characteristics of \module{pickle}'s representation) is that
122for debugging or recovery purposes it is possible for a human to read
123the pickled file with a standard text editor.
124
125There are currently 3 different protocols which can be used for pickling.
126
127\begin{itemize}
128
129\item Protocol version 0 is the original ASCII protocol and is backwards
130compatible with earlier versions of Python.
131
132\item Protocol version 1 is the old binary format which is also compatible
133with earlier versions of Python.
134
135\item Protocol version 2 was introduced in Python 2.3. It provides
136much more efficient pickling of new-style classes.
137
138\end{itemize}
139
140Refer to PEP 307 for more information.
141
142If a \var{protocol} is not specified, protocol 0 is used.
143If \var{protocol} is specified as a negative value
144or \constant{HIGHEST_PROTOCOL},
145the highest protocol version available will be used.
146
147\versionchanged[Introduced the \var{protocol} parameter]{2.3}
148
149A binary format, which is slightly more efficient, can be chosen by
150specifying a \var{protocol} version >= 1.
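
As a minimal sketch of protocol selection (the sample data is
hypothetical), the same object can be pickled with different
protocols. The unpickling side does not need to be told which one was
used:

\begin{verbatim}
import pickle

data = {'numbers': [1, 2, 3], 'text': 'spam'}   # hypothetical sample data

text_pickle = pickle.dumps(data, 0)                      # ASCII protocol
binary_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
newest_pickle = pickle.dumps(data, -1)   # negative: highest available

# loads() detects the protocol from the data stream itself.
assert pickle.loads(text_pickle) == data
assert pickle.loads(binary_pickle) == data
assert pickle.loads(newest_pickle) == data
\end{verbatim}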
151
152\subsection{Usage}
153
154To serialize an object hierarchy, you first create a pickler, then you
155call the pickler's \method{dump()} method. To de-serialize a data
156stream, you first create an unpickler, then you call the unpickler's
157\method{load()} method. The \module{pickle} module provides the
158following constant:
159
160\begin{datadesc}{HIGHEST_PROTOCOL}
161The highest protocol version available. This value can be passed
162as a \var{protocol} value.
163\versionadded{2.3}
164\end{datadesc}
165
166\note{Be sure to always open pickle files created with protocols >= 1 in
167 binary mode. For the old ASCII-based pickle protocol 0 you can use
168 either text mode or binary mode as long as you stay consistent.
169
170 A pickle file written with protocol 0 in binary mode will contain
171 lone linefeeds as line terminators and therefore will look ``funny''
172 when viewed in Notepad or other editors which do not support this
173 format.}
174
175The \module{pickle} module provides the
176following functions to make the pickling process more convenient:
177
178\begin{funcdesc}{dump}{obj, file\optional{, protocol}}
179Write a pickled representation of \var{obj} to the open file object
180\var{file}. This is equivalent to
181\code{Pickler(\var{file}, \var{protocol}).dump(\var{obj})}.
182
183If the \var{protocol} parameter is omitted, protocol 0 is used.
184If \var{protocol} is specified as a negative value
185or \constant{HIGHEST_PROTOCOL},
186the highest protocol version will be used.
187
188\versionchanged[Introduced the \var{protocol} parameter]{2.3}
189
190\var{file} must have a \method{write()} method that accepts a single
191string argument. It can thus be a file object opened for writing, a
192\refmodule{StringIO} object, or any other custom
193object that meets this interface.
194\end{funcdesc}
195
196\begin{funcdesc}{load}{file}
197Read a string from the open file object \var{file} and interpret it as
198a pickle data stream, reconstructing and returning the original object
199hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
200
201\var{file} must have two methods, a \method{read()} method that takes
202an integer argument, and a \method{readline()} method that requires no
203arguments. Both methods should return a string. Thus \var{file} can
204be a file object opened for reading, a
205\module{StringIO} object, or any other custom
206object that meets this interface.
207
208This function automatically determines whether the data stream was
209written in binary mode or not.
210\end{funcdesc}
211
212\begin{funcdesc}{dumps}{obj\optional{, protocol}}
213Return the pickled representation of the object as a string, instead
214of writing it to a file.
215
216If the \var{protocol} parameter is omitted, protocol 0 is used.
217If \var{protocol} is specified as a negative value
218or \constant{HIGHEST_PROTOCOL},
219the highest protocol version will be used.
220
221\versionchanged[The \var{protocol} parameter was added]{2.3}
222
223\end{funcdesc}
224
225\begin{funcdesc}{loads}{string}
226Read a pickled object hierarchy from a string. Characters in the
227string past the pickled object's representation are ignored.
228\end{funcdesc}
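
The convenience functions might be used as in the following sketch.
The temporary file and the sample data are assumptions made for the
illustration; the file is opened in binary mode because a binary
protocol is requested.

\begin{verbatim}
import os
import pickle
import tempfile

data = {'name': 'spam', 'values': [1, 2, 3]}   # hypothetical sample data

# Create a scratch file to hold the pickle.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    f = open(path, 'wb')            # binary mode for protocols >= 1
    try:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    finally:
        f.close()

    f = open(path, 'rb')
    try:
        from_file = pickle.load(f)
    finally:
        f.close()
finally:
    os.remove(path)

# The string-based pair needs no file object at all.
from_string = pickle.loads(pickle.dumps(data))

assert from_file == data
assert from_string == data
\end{verbatim}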
229
230The \module{pickle} module also defines three exceptions:
231
232\begin{excdesc}{PickleError}
233A common base class for the other exceptions defined below. This
234inherits from \exception{Exception}.
235\end{excdesc}
236
237\begin{excdesc}{PicklingError}
238This exception is raised when an unpicklable object is passed to
239the \method{dump()} method.
240\end{excdesc}
241
242\begin{excdesc}{UnpicklingError}
243This exception is raised when there is a problem unpickling an object.
244Note that other exceptions may also be raised during unpickling,
245including (but not necessarily limited to) \exception{AttributeError},
246\exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
247\end{excdesc}
248
249The \module{pickle} module also exports two callables\footnote{In the
250\module{pickle} module these callables are classes, which you could
251subclass to customize the behavior. However, in the \refmodule{cPickle}
252module these callables are factory functions and so cannot be
253subclassed. One common reason to subclass is to control what
254objects can actually be unpickled. See section~\ref{pickle-sub} for
255more details.}, \class{Pickler} and \class{Unpickler}:
256
257\begin{classdesc}{Pickler}{file\optional{, protocol}}
258This takes a file-like object to which it will write a pickle data
259stream.
260
261If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.
264
265\versionchanged[Introduced the \var{protocol} parameter]{2.3}
266
267\var{file} must have a \method{write()} method that accepts a single
268string argument. It can thus be an open file object, a
269\module{StringIO} object, or any other custom
270object that meets this interface.
271\end{classdesc}
272
273\class{Pickler} objects define one (or two) public methods:
274
275\begin{methoddesc}[Pickler]{dump}{obj}
276Write a pickled representation of \var{obj} to the open file object
277given in the constructor. Either the binary or \ASCII{} format will
278be used, depending on the value of the \var{protocol} argument passed to the
279constructor.
280\end{methoddesc}
281
282\begin{methoddesc}[Pickler]{clear_memo}{}
Clears the pickler's ``memo''. The memo is the data structure that
remembers which objects the pickler has already seen, so that shared
or recursive objects are pickled by reference and not by value. This
method is useful when re-using picklers.
287
288\begin{notice}
289Prior to Python 2.3, \method{clear_memo()} was only available on the
290picklers created by \refmodule{cPickle}. In the \module{pickle} module,
291picklers have an instance variable called \member{memo} which is a
292Python dictionary. So to clear the memo for a \module{pickle} module
293pickler, you could do the following:
294
\begin{verbatim}
mypickler.memo.clear()
\end{verbatim}
298
299Code that does not need to support older versions of Python should
300simply use \method{clear_memo()}.
301\end{notice}
302\end{methoddesc}
303
304It is possible to make multiple calls to the \method{dump()} method of
305the same \class{Pickler} instance. These must then be matched to the
306same number of calls to the \method{load()} method of the
307corresponding \class{Unpickler} instance. If the same object is
308pickled by multiple \method{dump()} calls, the \method{load()} will
309all yield references to the same object.\footnote{\emph{Warning}: this
310is intended for pickling multiple objects without intervening
311modifications to the objects or their parts. If you modify an object
312and then pickle it again using the same \class{Pickler} instance, the
313object is not pickled again --- a reference to it is pickled and the
314\class{Unpickler} will return the old value, not the modified one.
315There are two problems here: (1) detecting changes, and (2)
316marshalling a minimal set of changes. Garbage Collection may also
317become a problem here.}
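
The following sketch illustrates both the matched
\method{dump()}/\method{load()} calls and the memo behaviour from the
warning above. The in-memory buffer and the variable names are
assumptions made for the example, and the import fallback is only
there so that the sketch also runs on interpreters that lack
\module{cStringIO}.

\begin{verbatim}
import pickle
try:
    from cStringIO import StringIO as BytesIO   # Python 2
except ImportError:
    from io import BytesIO                      # later versions

buf = BytesIO()
pickler = pickle.Pickler(buf, 2)

item = ['first']
pickler.dump(item)
item.append('second')
pickler.dump(item)      # already in the memo: only a reference is written
pickler.clear_memo()
pickler.dump(item)      # memo cleared: the current contents are written

buf.seek(0)
unpickler = pickle.Unpickler(buf)
a = unpickler.load()
b = unpickler.load()
c = unpickler.load()

assert a == ['first']
assert b is a                     # the old value, not the modified one
assert c == ['first', 'second']
\end{verbatim}

Calling \method{clear_memo()} between dumps trades the sharing
guarantee for up-to-date contents: \code{c} is a new list and no
longer the same object as \code{a}.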
318
319\class{Unpickler} objects are defined as:
320
321\begin{classdesc}{Unpickler}{file}
322This takes a file-like object from which it will read a pickle data
323stream. This class automatically determines whether the data stream
324was written in binary mode or not, so it does not need a flag as in
325the \class{Pickler} factory.
326
327\var{file} must have two methods, a \method{read()} method that takes
328an integer argument, and a \method{readline()} method that requires no
329arguments. Both methods should return a string. Thus \var{file} can
330be a file object opened for reading, a
331\module{StringIO} object, or any other custom
332object that meets this interface.
333\end{classdesc}
334
335\class{Unpickler} objects have one (or two) public methods:
336
337\begin{methoddesc}[Unpickler]{load}{}
338Read a pickled object representation from the open file object given
339in the constructor, and return the reconstituted object hierarchy
340specified therein.
341\end{methoddesc}
342
343\begin{methoddesc}[Unpickler]{noload}{}
344This is just like \method{load()} except that it doesn't actually
345create any objects. This is useful primarily for finding what's
346called ``persistent ids'' that may be referenced in a pickle data
347stream. See section~\ref{pickle-protocol} below for more details.
348
349\strong{Note:} the \method{noload()} method is currently only
350available on \class{Unpickler} objects created with the
351\module{cPickle} module. \module{pickle} module \class{Unpickler}s do
352not have the \method{noload()} method.
353\end{methoddesc}
354
355\subsection{What can be pickled and unpickled?}
356
357The following types can be pickled:
358
359\begin{itemize}
360
361\item \code{None}, \code{True}, and \code{False}
362
363\item integers, long integers, floating point numbers, complex numbers
364
365\item normal and Unicode strings
366
367\item tuples, lists, sets, and dictionaries containing only picklable objects
368
369\item functions defined at the top level of a module
370
371\item built-in functions defined at the top level of a module
372
373\item classes that are defined at the top level of a module
374
\item instances of such classes whose \member{__dict__} or the result
of calling \method{__getstate__()} is picklable (see
section~\ref{pickle-protocol} for details)
378
379\end{itemize}
380
381Attempts to pickle unpicklable objects will raise the
382\exception{PicklingError} exception; when this happens, an unspecified
383number of bytes may have already been written to the underlying file.
Trying to pickle a highly recursive data structure may exceed the
maximum recursion depth; a \exception{RuntimeError} will be raised
386in this case. You can carefully raise this limit with
387\function{sys.setrecursionlimit()}.
388
389Note that functions (built-in and user-defined) are pickled by ``fully
390qualified'' name reference, not by value. This means that only the
function name is pickled, along with the name of the module the function
392is defined in. Neither the function's code, nor any of its function
393attributes are pickled. Thus the defining module must be importable
394in the unpickling environment, and the module must contain the named
395object, otherwise an exception will be raised.\footnote{The exception
396raised will likely be an \exception{ImportError} or an
397\exception{AttributeError} but it could be something else.}
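
A small sketch of pickling by name reference, using the built-in
function \function{len()} as a stand-in for any importable function:
the pickle contains the function's name rather than its code, and
unpickling simply looks that name up again.

\begin{verbatim}
import pickle

payload = pickle.dumps(len, 0)

# The stream names the function; encode() keeps the comparison valid
# whether the payload is a str or a bytes object.
assert 'len'.encode('ascii') in payload

# Unpickling resolves the name to the very same function object.
assert pickle.loads(payload) is len
\end{verbatim}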
398
399Similarly, classes are pickled by named reference, so the same
400restrictions in the unpickling environment apply. Note that none of
401the class's code or data is pickled, so in the following example the
402class attribute \code{attr} is not restored in the unpickling
403environment:
404
\begin{verbatim}
import pickle

class Foo:
    attr = 'a class attr'

# Only a reference to Foo (module and class name) is pickled, not attr.
picklestring = pickle.dumps(Foo)
\end{verbatim}
411
412These restrictions are why picklable functions and classes must be
413defined in the top level of a module.
414
415Similarly, when class instances are pickled, their class's code and
416data are not pickled along with them. Only the instance data are
417pickled. This is done on purpose, so you can fix bugs in a class or
418add methods to the class and still load objects that were created with
419an earlier version of the class. If you plan to have long-lived
420objects that will see many versions of a class, it may be worthwhile
421to put a version number in the objects so that suitable conversions
422can be made by the class's \method{__setstate__()} method.
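
One way such a version number might be used is sketched below. The
\class{Account} class, its attributes, and the migration rule are all
hypothetical; the sketch assumes it is run as a top-level script so
that the class can be found again at unpickling time.

\begin{verbatim}
import pickle

class Account:
    VERSION = 2     # hypothetical current layout of the instance data

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self.version = self.VERSION

    def __setstate__(self, state):
        # Hypothetical migration: version 1 instances had no 'balance'.
        if state.get('version', 1) < 2:
            state['balance'] = 0
            state['version'] = 2
        self.__dict__.update(state)

# Simulate an instance that was created by the older class version.
old = Account('alice', 10)
del old.__dict__['balance']
old.version = 1

new = pickle.loads(pickle.dumps(old))
assert new.owner == 'alice'
assert new.balance == 0 and new.version == 2
\end{verbatim}

Only the instance data travel in the pickle, so the conversion runs in
whichever version of the class is installed when the object is loaded.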
423
424\subsection{The pickle protocol
425\label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
426
427This section describes the ``pickling protocol'' that defines the
428interface between the pickler/unpickler and the objects that are being
429serialized. This protocol provides a standard way for you to define,
430customize, and control how your objects are serialized and