\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{[email protected]}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.  There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it.  The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications.  These differences can make it annoying to process CSV files
from multiple sources.  Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format.  It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel.  Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences.  Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
  This version of the \module{csv} module doesn't support Unicode
  input.  Also, there are currently some issues regarding \ASCII{} NUL
  characters.  Accordingly, all input should be UTF-8 or printable
  \ASCII{} to be safe; see the examples in section~\ref{csv-examples}.
  These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}.  \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called; file objects and list objects are both suitable.
If \var{csvfile} is a file object, it must be opened with
the \code{'b'} flag on platforms where that makes a difference.  An optional
{}\var{dialect} parameter can be given to define a set of parameters
specific to a particular CSV dialect.  It may be an instance of a subclass
of the \class{Dialect} class or one of the strings returned by the
\function{list_dialects} function.  The other optional {}\var{fmtparam}
keyword arguments can be given to override individual formatting parameters
in the current dialect.  For details of the dialect and formatting
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings.  No automatic data type
conversion is performed.

\versionchanged[
The parser is now stricter with respect to multi-line quoted
fields. Previously, if a line ended within a quoted field without a
terminating newline character, a newline would be inserted into the
returned field. This behavior caused problems when reading files
which contained carriage return characters within fields.  The
behavior was changed to return the field without inserting newlines. As
a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters]{2.5}

\end{funcdesc}

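As a minimal sketch of the \function{reader()} behavior described above, the
following passes a list of strings to the function and converts one column by
hand, since every field comes back as a string.  The list is a made-up
stand-in for an open file, and the sample data and variable names are
assumptions chosen for illustration.

```python
import csv

# Hypothetical sample data: a list of lines stands in for an open file.
lines = ['name,qty,price', 'apple,3,0.50', '"pear, green",10,1.25']

rows = list(csv.reader(lines))

# Every field is returned as a string, so numeric columns need an
# explicit conversion.
header, records = rows[0], rows[1:]
total_qty = sum(int(record[1]) for record in records)
```

The quoted field in the last line keeps its embedded comma, which is why
\code{rows[2][0]} is a single value.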
\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object.  \var{csvfile} can be any
object with a \function{write} method.  If \var{csvfile} is a file object,
it must be opened with the \code{'b'} flag on platforms where that makes a
difference.  An optional
{}\var{dialect} parameter can be given to define a set of
parameters specific to a particular CSV dialect.  It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function.  The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect.  For details of the dialect
and formatting parameters, see section~\ref{csv-fmt-params}, ``Dialects
and Formatting Parameters''.  To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string.  While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call.  All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}

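The sketch below illustrates two points from the \function{writer()}
description: any object with a \method{write} method is accepted, and
\constant{None} is written as an empty field.  The \class{LineCollector}
class is a hypothetical helper invented for this example, not part of the
module.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: it only offers write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = LineCollector()
writer = csv.writer(sink)
writer.writerow(['id', 'name', 'score'])
writer.writerow([1, None, 2.5])   # None becomes an empty field

output = ''.join(sink.chunks)
```

Reading \code{output} back yields an empty string for the second field, so
the original \constant{None} cannot be recovered from the file alone.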
\begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}}
Associate \var{dialect} with \var{name}.  \var{name} must be a string
or Unicode object.  The dialect can be specified either by passing a
subclass of \class{Dialect}, or by \var{fmtparam} keyword arguments,
or both, with keyword arguments overriding parameters of the dialect.
For details of the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry.  An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}.  An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}

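The four registry functions can be exercised together as in the following
sketch.  The dialect name \code{'pipes'} and its parameters are assumptions
made up for the example.

```python
import csv

# Register a hypothetical pipe-delimited dialect by keyword arguments.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)

registered = 'pipes' in csv.list_dialects()
delimiter = csv.get_dialect('pipes').delimiter
rows = list(csv.reader(['a|b|c'], 'pipes'))

# After unregistering, looking the name up again raises csv.Error.
csv.unregister_dialect('pipes')
try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    lookup_failed = True
```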
\begin{funcdesc}{field_size_limit}{\optional{new_limit}}
Return the current maximum field size allowed by the parser.  If
\var{new_limit} is given, this becomes the new limit.
\versionadded{2.5}
\end{funcdesc}


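A short sketch of \function{field_size_limit()} follows.  It assumes, as the
CPython implementation does, that the call returns the limit that was in
force before any new value was applied; the limit of 10 characters is an
arbitrary value picked for the demonstration.

```python
import csv

old_limit = csv.field_size_limit()    # query only, nothing changes
previous = csv.field_size_limit(10)   # install a deliberately tiny limit

# The second field is 50 characters long, which exceeds the limit.
try:
    list(csv.reader(['short,' + 'x' * 50]))
    too_long = False
except csv.Error:
    too_long = True

csv.field_size_limit(old_limit)       # the limit is global, so restore it
```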
The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
                              fieldnames=\constant{None}\optional{,
                              restkey=\constant{None}\optional{,
                              restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
{}\var{fieldnames} parameter.  If the \var{fieldnames} parameter is
omitted, the values in the first row of the \var{csvfile} will be used as
the fieldnames.  If the row read has more fields than the fieldnames
sequence, the remaining data is added as a sequence keyed by the value of
\var{restkey}.  If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter.  Any other optional or keyword arguments are passed to the
underlying \class{reader} instance.
\end{classdesc}


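The handling of short and long rows by \class{DictReader} can be sketched as
follows.  The sample lines and the \code{'overflow'} and \code{'n/a'} values
are assumptions chosen for illustration.

```python
import csv

# The first line supplies the fieldnames because none are passed in.
lines = ['id,name', '1,ann', '2', '3,cy,extra1,extra2']

reader = csv.DictReader(lines, restkey='overflow', restval='n/a')
records = [dict(row) for row in reader]

short_row = records[1]   # 'name' is missing, so restval fills it in
long_row = records[2]    # surplus fields are collected under restkey
```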
\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows.  The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}.  The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}.  If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take.  If it is set
to \code{'raise'}, a \exception{ValueError} is raised.  If it is set to
\code{'ignore'}, extra values in the dictionary are ignored.  Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} is not optional.  Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.

\end{classdesc}

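The following sketch contrasts the two \var{extrasaction} settings of
\class{DictWriter} and shows \var{restval} filling in a missing key.  The
\class{LineCollector} class is a hypothetical helper standing in for a file,
and the field names are made up.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: it only offers write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = LineCollector()
lenient = csv.DictWriter(sink, fieldnames=['id', 'name'],
                         restval='?', extrasaction='ignore')
lenient.writerow({'id': 1, 'name': 'ann'})
lenient.writerow({'id': 2})                            # 'name' -> restval
lenient.writerow({'id': 3, 'name': 'cy', 'age': 40})   # 'age' is dropped
output = ''.join(sink.chunks)

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(LineCollector(), fieldnames=['id', 'name'])
try:
    strict.writerow({'id': 4, 'age': 7})
    rejected = False
except ValueError:
    rejected = True
```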
\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{excel}{}
The \class{excel} class defines the usual properties of an Excel-generated
CSV file.
\end{classdesc}

\begin{classdesc}{excel_tab}{}
The \class{excel_tab} class defines the usual properties of an
Excel-generated TAB-delimited file.
\end{classdesc}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found.  If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}


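A sketch of both \class{Sniffer} methods follows.  The detection is
heuristic, so results on other data may differ; the semicolon-separated
sample here is an assumption chosen to be unambiguous.

```python
import csv

# Hypothetical sample: semicolon-delimited, text header, numeric columns.
sample = 'name;qty;price\napple;3;0.50\npear;10;1.25\nplum;7;2.00\n'

sniffer = csv.Sniffer()

# Restrict the candidate delimiters to semicolon and comma.
dialect = sniffer.sniff(sample, delimiters=';,')
rows = list(csv.reader(sample.splitlines(), dialect))

# The first row is text over numeric columns, which suggests a header.
looks_like_header = sniffer.has_header(sample)
```

The returned dialect can be passed straight to \function{reader()}, as done
for \code{rows} above.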
The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
special characters such as \var{delimiter}, \var{quotechar} or any of the
characters in \var{lineterminator}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric
fields.

Instructs \class{reader} objects to convert all non-quoted fields to type
\class{float}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields.  When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character.  If \var{escapechar} is not set, the writer
will raise \exception{Error} if any characters that require escaping
are encountered.

Instructs \class{reader} objects to perform no special processing of quote
characters.
\end{datadesc}


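The effect of each quoting constant on one row can be compared with the
sketch below.  The \function{render} helper and the \class{LineCollector}
class are hypothetical names introduced only for this example.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: it only offers write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Write one row with the given formatting parameters, return the text."""
    sink = LineCollector()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['plain', 'has,comma', 42]
minimal = render(row, quoting=csv.QUOTE_MINIMAL)
everything = render(row, quoting=csv.QUOTE_ALL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
escaped = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')

# QUOTE_NONE without an escapechar cannot represent the embedded comma.
try:
    render(row, quoting=csv.QUOTE_NONE)
    unescapable = False
except csv.Error:
    unescapable = True

# On input, QUOTE_NONNUMERIC converts unquoted fields to float.
parsed = list(csv.reader(['"a",1.5'], quoting=csv.QUOTE_NONNUMERIC))
```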
The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects.  A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method.  When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass of the \class{Dialect} class as the dialect parameter.
In addition to, or instead of, the \var{dialect} parameter, the programmer
can also specify individual formatting parameters, which have the same
names as the attributes defined below for the \class{Dialect} class.

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields.  It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted.  When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} is used as a prefix to the
\var{quotechar}.  It defaults to \constant{True}.

On output, if \var{doublequote} is \constant{False} and no
\var{escapechar} is set, \exception{Error} is raised if a \var{quotechar}
is found in a field.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used by the writer to escape the \var{delimiter} if
\var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar}
if \var{doublequote} is \constant{False}.  On reading, the \var{escapechar}
removes any special meaning from the following character.  It defaults
to \constant{None}, which disables escaping.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines produced by the \class{writer}.
It defaults to \code{'\e r\e n'}.

\note{The \class{reader} is hard-coded to recognize either \code{'\e r'}
or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}.  This
behavior may change in the future.}
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote fields containing special characters,
such as the \var{delimiter} or \var{quotechar}, or which contain new-line
characters.  It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer and recognized
by the reader.  It can take on any of the \constant{QUOTE_*} constants
(see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored.  The default is \constant{False}.
\end{memberdesc}


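The attributes above can be set on a \class{Dialect} subclass, as in this
sketch.  The \class{ColonDialect} class and the input line are hypothetical;
the example reads one line that relies on \var{escapechar} and
\var{skipinitialspace}, then a second line using the default
\var{doublequote} behavior.

```python
import csv

class ColonDialect(csv.Dialect):
    """Hypothetical dialect, defined only for this illustration."""
    delimiter = ':'
    quotechar = "'"
    doublequote = False      # quotes inside fields are escaped, not doubled
    escapechar = '\\'
    skipinitialspace = True  # drop the space that follows each delimiter
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL

# The raw line is:  a: 'it\'s': c
escaped_rows = list(csv.reader(["a: 'it\\'s': c"], dialect=ColonDialect()))

# With the default 'excel' dialect, a doubled quotechar stands for one quote.
doubled_rows = list(csv.reader(['"say ""hi""",x']))
```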
\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public methods:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}

Reader objects have the following public attributes:

\begin{memberdesc}[csv reader]{dialect}
A read-only description of the dialect in use by the parser.
\end{memberdesc}

\begin{memberdesc}[csv reader]{line_num}
 The number of lines read from the source iterator.  This is not the same
 as the number of records returned, as records can span multiple lines.
\end{memberdesc}


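The difference between \member{line_num} and the number of records returned
shows up as soon as a quoted field spans two input lines, as in this sketch.
The sample lines are made up, and each one keeps its trailing newline so
that the embedded line break is preserved.

```python
import csv

# The second record is split across two input lines inside its quotes.
lines = ['id,comment\n', '1,"first line\n', 'second line"\n', '2,short\n']

reader = csv.reader(lines)
seen = []
for row in reader:
    # Record how many source lines had been consumed when the row arrived.
    seen.append((reader.line_num, row))
```

Three records are produced from four lines, so \member{line_num} jumps from
1 to 3 for the second record.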
\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods.  A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers for
{}\class{DictWriter} objects; numbers are passed through \function{str()}
before being written.  Note that complex numbers are written out surrounded
by parentheses.  This may cause some problems for other programs which read
CSV files (assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write all of the \var{rows} (a list of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

Writer objects have the following public attribute:

\begin{memberdesc}[csv writer]{dialect}
A read-only description of the dialect in use by the writer.
\end{memberdesc}



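The two writer methods and the \member{dialect} attribute can be seen
together in the sketch below, which also shows how a complex number is
written.  The \class{LineCollector} class is a hypothetical stand-in for a
file, and the tab delimiter is an arbitrary choice.

```python
import csv

class LineCollector(object):
    """Hypothetical stand-in for a file: it only offers write()."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = LineCollector()
writer = csv.writer(sink, delimiter='\t')

writer.writerow(['x', 'y'])             # a single row
writer.writerows([(1, 2), (3, 4.5)])    # several rows at once
writer.writerow([1 + 2j])               # written with its parentheses

output = ''.join(sink.chunks)
delimiter_in_use = writer.dialect.delimiter
```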
\subsection{Examples\label{csv-examples}}

The simplest example of reading a CSV file:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

Reading a file with an alternate format:

\begin{verbatim}
import csv
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is:

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
writer.writerows(someiterable)
\end{verbatim}

Registering a new dialect:

\begin{verbatim}
import csv

csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

reader = csv.reader(open("passwd", "rb"), 'unixpwd')
\end{verbatim}

A slightly more advanced use of the reader, catching and reporting errors:

\begin{verbatim}
import csv, sys
filename = "some.csv"
reader = csv.reader(open(filename, "rb"))
try:
    for row in reader:
        print row
except csv.Error, e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
\end{verbatim}

And while the module doesn't directly support parsing strings, it can
easily be done:

\begin{verbatim}
import csv
for row in csv.reader(['one,two,three']):
    print row
\end{verbatim}

The \module{csv} module doesn't directly support reading and writing
Unicode, but it is 8-bit-clean save for some problems with \ASCII{} NUL
characters.  So you can write functions or classes that handle the
encoding and decoding for you as long as you avoid encodings like
UTF-16 that use NULs.  UTF-8 is recommended.

\function{unicode_csv_reader} below is a generator that wraps
\class{csv.reader} to handle Unicode CSV data (a list of Unicode
strings).  \function{utf_8_encoder} is a generator that encodes the
Unicode strings as UTF-8, one string (or row) at a time.  The encoded
strings are parsed by the CSV reader, and
\function{unicode_csv_reader} decodes the UTF-8-encoded cells back
into Unicode:

\begin{verbatim}
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
\end{verbatim}

For all other encodings the following \class{UnicodeReader} and
\class{UnicodeWriter} classes can be used.  They take an additional
\var{encoding} parameter in their constructor and make sure that the data
passes through the real reader or writer encoded as UTF-8:

\begin{verbatim}
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
\end{verbatim}