ref2.tex@ 3391

Last change on this file since 3391 was 3225, checked in by bird, 19 years ago
Python 2.5
File size: 28.1 KB

1	\chapter{Lexical analysis\label{lexical}}
2
3	A Python program is read by a \emph{parser}. Input to the parser is a
4	stream of \emph{tokens}, generated by the \emph{lexical analyzer}. This
5	chapter describes how the lexical analyzer breaks a file into tokens.
6	\index{lexical analysis}
7	\index{parser}
8	\index{token}
9
10	Python uses the 7-bit \ASCII{} character set for program text.
11	\versionadded[An encoding declaration can be used to indicate that
12	string literals and comments use an encoding different from ASCII]{2.3}
13	For compatibility with older versions, Python only warns if it finds
14	8-bit characters; those warnings should be corrected by either declaring
15	an explicit encoding, or using escape sequences if those bytes are binary
16	data, instead of characters.
17
18
19	The run-time character set depends on the I/O devices connected to the
20	program but is generally a superset of \ASCII.
21
22	\strong{Future compatibility note:} It may be tempting to assume that the
23	character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
24	superset that covers most western languages that use the Latin
25	alphabet), but it is possible that in the future Unicode text editors
26	will become common. These generally use the UTF-8 encoding, which is
27	also an \ASCII{} superset, but with very different use for the
28	characters with ordinals 128-255. While there is no consensus on this
29	subject yet, it is unwise to assume either Latin-1 or UTF-8, even
30	though the current implementation appears to favor Latin-1. This
31	applies both to the source character set and the run-time character
32	set.
33
34
35	\section{Line structure\label{line-structure}}
36
37	A Python program is divided into a number of \emph{logical lines}.
38	\index{line structure}
39
40
41	\subsection{Logical lines\label{logical}}
42
43	The end of
44	a logical line is represented by the token NEWLINE. Statements cannot
45	cross logical line boundaries except where NEWLINE is allowed by the
46	syntax (e.g., between statements in compound statements).
47	A logical line is constructed from one or more \emph{physical lines}
48	by following the explicit or implicit \emph{line joining} rules.
49	\index{logical line}
50	\index{physical line}
51	\index{line joining}
52	\index{NEWLINE token}
53
54
55	\subsection{Physical lines\label{physical}}
56
57	A physical line is a sequence of characters terminated by an end-of-line
58	sequence. In source files, any of the standard platform line
59	termination sequences can be used - the \UNIX{} form using \ASCII{} LF
60	(linefeed), the Windows form using the \ASCII{} sequence CR LF (return
61	followed by linefeed), or the Macintosh form using the \ASCII{} CR
62	(return) character. All of these forms can be used equally, regardless
63	of platform.
64
65	When embedding Python, source code strings should be passed to Python
66	APIs using the standard C conventions for newline characters (the
67	\code{\e n} character, representing \ASCII{} LF, is the line
68	terminator).
69
70
71	\subsection{Comments\label{comments}}
72
73	A comment starts with a hash character (\code{\#}) that is not part of
74	a string literal, and ends at the end of the physical line. A comment
75	signifies the end of the logical line unless the implicit line joining
76	rules are invoked.
77	Comments are ignored by the syntax; they are not tokens.
78	\index{comment}
79	\index{hash character}
80
81
82	\subsection{Encoding declarations\label{encodings}}
83	\index{source character set}
84	\index{encodings}
85
86	If a comment in the first or second line of the Python script matches
87	the regular expression \regexp{coding[=:]\e s*([-\e w.]+)}, this comment is
88	processed as an encoding declaration; the first group of this
89	expression names the encoding of the source code file. The recommended
90	forms of this expression are
91
92	\begin{verbatim}
93	# -*- coding: <encoding-name> -*-
94	\end{verbatim}
95
96	which is also recognized by GNU Emacs, and
97
98	\begin{verbatim}
99	# vim:fileencoding=<encoding-name>
100	\end{verbatim}
101
102	which is recognized by Bram Moolenaar's VIM. In addition, if the first
103	bytes of the file are the UTF-8 byte-order mark
104	(\code{'\e xef\e xbb\e xbf'}), the declared file encoding is UTF-8
105	(this is supported, among others, by Microsoft's \program{notepad}).
106
107	If an encoding is declared, the encoding name must be recognized by
108	Python. % XXX there should be a list of supported encodings.
109	The encoding is used for all lexical analysis, in particular to find
110	the end of a string, and to interpret the contents of Unicode literals.
111	String literals are converted to Unicode for syntactical analysis,
112	then converted back to their original encoding before interpretation
113	starts. The encoding declaration must appear on a line of its own.
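
The matching rule above can be sketched in Python. This is an
illustration only, not the interpreter's own implementation: the helper
name \code{declared_encoding} is hypothetical, the sketch ignores the
UTF-8 byte-order mark, and it does not check that the name found is an
encoding Python recognizes.

```python
import re

# Regular expression quoted from the text above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(lines):
    """Return the encoding named on line 1 or 2 of a source file, or None.

    Hypothetical helper: only comment-only lines are examined, because the
    declaration must appear on a line of its own.
    """
    for line in lines[:2]:
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue
        match = CODING_RE.search(stripped)
        if match:
            return match.group(1)
    return None


emacs_style = ["#!/usr/bin/env python", "# -*- coding: latin-1 -*-"]
vim_style = ["# vim:fileencoding=utf-8"]
too_late = ["import os", "import sys", "# -*- coding: latin-1 -*-"]
```

With these sample inputs, the first two lists yield \code{latin-1} and
\code{utf-8}; the third yields no encoding, because the comment is on
the third line.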
114
115	\subsection{Explicit line joining\label{explicit-joining}}
116
117	Two or more physical lines may be joined into logical lines using
118	backslash characters (\code{\e}), as follows: when a physical line ends
119	in a backslash that is not part of a string literal or comment, it is
120	joined with the following line to form a single logical line, deleting the
121	backslash and the following end-of-line character. For example:
122	\index{physical line}
123	\index{line joining}
124	\index{line continuation}
125	\index{backslash character}
126	%
127	\begin{verbatim}
128	if 1900 < year < 2100 and 1 <= month <= 12 \
129	   and 1 <= day <= 31 and 0 <= hour < 24 \
130	   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
131	        return 1
132	\end{verbatim}
133
134	A line ending in a backslash cannot carry a comment. A backslash does
135	not continue a comment. A backslash does not continue a token except
136	for string literals (i.e., tokens other than string literals cannot be
137	split across physical lines using a backslash). A backslash is
138	illegal elsewhere on a line outside a string literal.
139
140
141	\subsection{Implicit line joining\label{implicit-joining}}
142
143	Expressions in parentheses, square brackets or curly braces can be
144	split over more than one physical line without using backslashes.
145	For example:
146
147	\begin{verbatim}
148	month_names = ['Januari', 'Februari', 'Maart',      # These are the
149	               'April',   'Mei',      'Juni',       # Dutch names
150	               'Juli',    'Augustus', 'September',  # for the months
151	               'Oktober', 'November', 'December']   # of the year
152	\end{verbatim}
153
154	Implicitly continued lines can carry comments. The indentation of the
155	continuation lines is not important. Blank continuation lines are
156	allowed. There is no NEWLINE token between implicit continuation
157	lines. Implicitly continued lines can also occur within triple-quoted
158	strings (see below); in that case they cannot carry comments.
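
Both joining rules can be observed by counting NEWLINE tokens. The
sketch below uses the \module{tokenize} module of a current Python 3
standard library, on the assumption that it applies the same
line-joining rules as described here; the helper name
\code{count_logical_lines} is hypothetical.

```python
import io
import tokenize


def count_logical_lines(source):
    """Count NEWLINE tokens, one per logical line in source."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.NEWLINE)


# Explicit joining: the backslash merges two physical lines.
explicit = "total = 1 + \\\n        2\n"
# Implicit joining: brackets span three physical lines, one of them blank.
implicit = "names = ['a',  # comment\n\n         'b']\n"
# No joining: two physical lines, two logical lines.
separate = "x = 1\ny = 2\n"
```

Each of the first two sources forms a single logical line, while the
third forms two.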
159
160
161	\subsection{Blank lines \label{blank-lines}}
162
163	\index{blank line}
164	A logical line that contains only spaces, tabs, formfeeds and possibly
165	a comment, is ignored (i.e., no NEWLINE token is generated). During
166	interactive input of statements, handling of a blank line may differ
167	depending on the implementation of the read-eval-print loop. In the
168	standard implementation, an entirely blank logical line (i.e.\ one
169	containing not even whitespace or a comment) terminates a multi-line
170	statement.
171
172
173	\subsection{Indentation\label{indentation}}
174
175	Leading whitespace (spaces and tabs) at the beginning of a logical
176	line is used to compute the indentation level of the line, which in
177	turn is used to determine the grouping of statements.
178	\index{indentation}
179	\index{whitespace}
180	\index{leading whitespace}
181	\index{space}
182	\index{tab}
183	\index{grouping}
184	\index{statement grouping}
185
186	First, tabs are replaced (from left to right) by one to eight spaces
187	such that the total number of characters up to and including the
188	replacement is a multiple of
189	eight (this is intended to be the same rule as used by \UNIX). The
190	total number of spaces preceding the first non-blank character then
191	determines the line's indentation. Indentation cannot be split over
192	multiple physical lines using backslashes; the whitespace up to the
193	first backslash determines the indentation.
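
The tab replacement rule can be written out as a short function. This
is a sketch under stated assumptions: the name
\code{indentation_width} is hypothetical, and formfeed characters are
not handled.

```python
def indentation_width(line, tabsize=8):
    """Return the indentation of line after expanding leading tabs.

    Each tab advances the running count to the next multiple of tabsize,
    so it stands for between one and tabsize spaces.
    """
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width = (width // tabsize + 1) * tabsize
        else:
            break  # first non-blank character ends the indentation
    return width


widths = [indentation_width(text) for text in
          ["    x", "\tx", "  \tx", "\t  x"]]
```

Two spaces followed by a tab give the same indentation as a single tab
(8), which is why mixing the two is easy to get wrong.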
194
195	\strong{Cross-platform compatibility note:} because of the nature of
196	text editors on non-UNIX platforms, it is unwise to use a mixture of
197	spaces and tabs for the indentation in a single source file. It
198	should also be noted that different platforms may explicitly limit the
199	maximum indentation level.
200
201	A formfeed character may be present at the start of the line; it will
202	be ignored for the indentation calculations above. Formfeed
203	characters occurring elsewhere in the leading whitespace have an
204	undefined effect (for instance, they may reset the space count to
205	zero).
206
207	The indentation levels of consecutive lines are used to generate
208	INDENT and DEDENT tokens, using a stack, as follows.
209	\index{INDENT token}
210	\index{DEDENT token}
211
212	Before the first line of the file is read, a single zero is pushed on
213	the stack; this will never be popped off again. The numbers pushed on
214	the stack will always be strictly increasing from bottom to top. At
215	the beginning of each logical line, the line's indentation level is
216	compared to the top of the stack. If it is equal, nothing happens.
217	If it is larger, it is pushed on the stack, and one INDENT token is
218	generated. If it is smaller, it \emph{must} be one of the numbers
219	occurring on the stack; all numbers on the stack that are larger are
220	popped off, and for each number popped off a DEDENT token is
221	generated. At the end of the file, a DEDENT token is generated for
222	each number remaining on the stack that is larger than zero.
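
The stack algorithm can be sketched as follows. The function name
\code{indent_tokens} and the placeholder \code{"LINE"} (standing for the
tokens of one logical line) are hypothetical; the input is the list of
indentation levels of consecutive logical lines.

```python
def indent_tokens(levels):
    """Turn the indentation level of each logical line into a token list."""
    stack = [0]                      # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # A smaller level must already be on the stack.
            if level not in stack:
                raise IndentationError("dedent does not match any outer level")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
    # End of file: one DEDENT for each remaining level larger than zero.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


nested = indent_tokens([0, 4, 8, 0])
```

Here the last line returns from level 8 to level 0, so two DEDENT tokens
precede it. A sequence such as \code{[0, 8, 4]} is rejected, because 4
was never pushed on the stack.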
223
224	Here is an example of a correctly (though confusingly) indented piece
225	of Python code:
226
227	\begin{verbatim}
228	def perm(l):
229	        # Compute the list of all permutations of l
230	    if len(l) <= 1:
231	                  return [l]
232	    r = []
233	    for i in range(len(l)):
234	             s = l[:i] + l[i+1:]
235	             p = perm(s)
236	             for x in p:
237	              r.append(l[i:i+1] + x)
238	    return r
239	\end{verbatim}
240
241	The following example shows various indentation errors:
242
243	\begin{verbatim}
244	 def perm(l):                       # error: first line indented
245	for i in range(len(l)):             # error: not indented
246	    s = l[:i] + l[i+1:]
247	        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
248	        for x in p:
249	                r.append(l[i:i+1] + x)
250	            return r                # error: inconsistent dedent
251	\end{verbatim}
252
253	(Actually, the first three errors are detected by the parser; only the
254	last error is found by the lexical analyzer --- the indentation of
255	\code{return r} does not match a level popped off the stack.)
256
257
258	\subsection{Whitespace between tokens\label{whitespace}}
259
260	Except at the beginning of a logical line or in string literals, the
261	whitespace characters space, tab and formfeed can be used
262	interchangeably to separate tokens. Whitespace is needed between two
263	tokens only if their concatenation could otherwise be interpreted as a
264	different token (e.g., ab is one token, but a b is two tokens).
265
266
267	\section{Other tokens\label{other-tokens}}
268
269	Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
270	exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
271	\emph{operators}, and \emph{delimiters}.
272	Whitespace characters (other than line terminators, discussed earlier)
273	are not tokens, but serve to delimit tokens.
274	Where
275	ambiguity exists, a token comprises the longest possible string that
276	forms a legal token, when read from left to right.
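
The longest-match rule can be observed with the \module{tokenize} module
of a current Python 3 standard library, assumed here to follow the same
rule. The helper name \code{token_strings} is hypothetical.

```python
import io
import tokenize


def token_strings(source):
    """Return the text of each operator, name and number token in source."""
    wanted = (tokenize.OP, tokenize.NAME, tokenize.NUMBER)
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]


shift_assign = token_strings("a<<=b\n")   # one '<<=' token, not '<' '<='
power = token_strings("x**2\n")           # one '**' token, not '*' '*'
```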
277
278
279	\section{Identifiers and keywords\label{identifiers}}
280
281	Identifiers (also referred to as \emph{names}) are described by the following
282	lexical definitions:
283	\index{identifier}
284	\index{name}
285
286	\begin{productionlist}
287	\production{identifier}
288	{(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*}
289	\production{letter}
290	{\token{lowercase} | \token{uppercase}}
291	\production{lowercase}
292	{"a"..."z"}
293	\production{uppercase}
294	{"A"..."Z"}
295	\production{digit}
296	{"0"..."9"}
297	\end{productionlist}
298
299	Identifiers are unlimited in length. Case is significant.
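
The productions above translate directly into a regular expression. The
sketch below is illustrative only: the name \code{is_identifier} is
hypothetical, and the check does not exclude keywords, which match the
same production.

```python
import re

# identifier ::= (letter | "_") (letter | digit | "_")*
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if the whole of text matches the identifier production."""
    return IDENTIFIER_RE.fullmatch(text) is not None


checks = [is_identifier(name) for name in
          ["spam", "Spam", "_private1", "2fast", "a-b", ""]]
```

Since case is significant, \code{spam} and \code{Spam} are both valid
and denote different names.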
300
301
302	\subsection{Keywords\label{keywords}}
303
304	The following identifiers are used as reserved words, or
305	\emph{keywords} of the language, and cannot be used as ordinary
306	identifiers. They must be spelled exactly as written here:%
307	\index{keyword}%
308	\index{reserved word}
309
310	\begin{verbatim}
311	and       del       from      not       while
312	as        elif      global    or        with
313	assert    else      if        pass      yield
314	break     except    import    print
315	class     exec      in        raise
316	continue  finally   is        return
317	def       for       lambda    try
318	\end{verbatim}
319
320	% When adding keywords, use reswords.py for reformatting
321
322	\versionchanged[\constant{None} became a constant and is now
323	recognized by the compiler as a name for the built-in object
324	\constant{None}. Although it is not a keyword, you cannot assign
325	a different object to it]{2.4}
326
327	\versionchanged[Both \keyword{as} and \keyword{with} are only recognized
328	when the \code{with_statement} future feature has been enabled.
329	It will always be enabled in Python 2.6. See section~\ref{with} for
330	details. Note that using \keyword{as} and \keyword{with} as identifiers
331	will always issue a warning, even when the \code{with_statement} future
332	directive is not in effect]{2.5}
333
334
335	\subsection{Reserved classes of identifiers\label{id-classes}}
336
337	Certain classes of identifiers (besides keywords) have special
338	meanings. These classes are identified by the patterns of leading and
339	trailing underscore characters:
340
341	\begin{description}
342
343	\item[\code{_*}]
344	Not imported by \samp{from \var{module} import *}. The special
345	identifier \samp{_} is used in the interactive interpreter to store
346	the result of the last evaluation; it is stored in the
347	\module{__builtin__} module. When not in interactive mode, \samp{_}
348	has no special meaning and is not defined.
349	See section~\ref{import}, ``The \keyword{import} statement.''
350
351	\note{The name \samp{_} is often used in conjunction with
352	internationalization; refer to the documentation for the
353	\ulink{\module{gettext} module}{../lib/module-gettext.html} for more
354	information on this convention.}
355
356	\item[\code{__*__}]
357	System-defined names. These names are defined by the interpreter
358	and its implementation (including the standard library);
359	applications should not expect to define additional names using this
360	convention. The set of names of this class defined by Python may be
361	extended in future versions.
362	See section~\ref{specialnames}, ``Special method names.''
363
364	\item[\code{__*}]
365	Class-private names. Names in this category, when used within the
366	context of a class definition, are re-written to use a mangled form
367	to help avoid name clashes between ``private'' attributes of base
368	and derived classes.
369	See section~\ref{atom-identifiers}, ``Identifiers (Names).''
370
371	\end{description}
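
The effect of class-private names can be seen by listing the attributes
of an instance. The class \code{Account} and its attributes are
hypothetical names chosen for illustration; the mangled form itself is
defined in the section referenced above.

```python
class Account(object):
    def __init__(self):
        self.__balance = 0      # class-private: stored under a mangled name
        self.currency = "EUR"   # ordinary name: stored unchanged


# Names actually present in the instance dictionary.
attribute_names = sorted(vars(Account()))
```

The instance holds \code{_Account__balance} rather than
\code{__balance}, so a derived class that defines its own
\code{__balance} does not clash with it.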
372
373
374	\section{Literals\label{literals}}
375
376	Literals are notations for constant values of some built-in types.
377	\index{literal}
378	\index{constant}
379
380
381	\subsection{String literals\label{strings}}
382
383	String literals are described by the following lexical definitions:
384	\index{string literal}
385
386	\index{ASCII@\ASCII}
387	\begin{productionlist}
388	\production{stringliteral}
389	{[\token{stringprefix}](\token{shortstring} | \token{longstring})}
390	\production{stringprefix}
391	{"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"}
392	\production{shortstring}
393	{"'" \token{shortstringitem}* "'"
394	| '"' \token{shortstringitem}* '"'}
395	\production{longstring}
396	{"'''" \token{longstringitem}* "'''"}
397	\productioncont{| '"""' \token{longstringitem}* '"""'}
398	\production{shortstringitem}
399	{\token{shortstringchar} | \token{escapeseq}}
400	\production{longstringitem}
401	{\token{longstringchar} | \token{escapeseq}}
402	\production{shortstringchar}
403	{<any source character except "\e" or newline or the quote>}
404	\production{longstringchar}
405	{<any source character except "\e">}
406	\production{escapeseq}
407	{"\e" <any ASCII character>}
408	\end{productionlist}
409
410	One syntactic restriction not indicated by these productions is that
411	whitespace is not allowed between the \grammartoken{stringprefix} and
412	the rest of the string literal. The source character set is defined
413	by the encoding declaration; it is \ASCII{} if no encoding declaration
414	is given in the source file; see section~\ref{encodings}.
415
416	\index{triple-quoted string}
417	\index{Unicode Consortium}
418	\index{string!Unicode}
419	In plain English: String literals can be enclosed in matching single
420	quotes (\code{'}) or double quotes (\code{"}). They can also be
421	enclosed in matching groups of three single or double quotes (these
422	are generally referred to as \emph{triple-quoted strings}). The
423	backslash (\code{\e}) character is used to escape characters that
424	otherwise have a special meaning, such as newline, backslash itself,
425	or the quote character. String literals may optionally be prefixed
426	with a letter \character{r} or \character{R}; such strings are called
427	\dfn{raw strings}\index{raw string} and use different rules for interpreting
428	backslash escape sequences. A prefix of \character{u} or \character{U}
429	makes the string a Unicode string. Unicode strings use the Unicode character
430	set as defined by the Unicode Consortium and ISO~10646. Some additional
431	escape sequences, described below, are available in Unicode strings.
432	The two prefix characters may be combined; in this case, \character{u} must
433	appear before \character{r}.
434
435	In triple-quoted strings,
436	unescaped newlines and quotes are allowed (and are retained), except
437	that three unescaped quotes in a row terminate the string. (A
438	``quote'' is the character used to open the string, i.e. either
439	\code{'} or \code{"}.)
440
441	Unless an \character{r} or \character{R} prefix is present, escape
442	sequences in strings are interpreted according to rules similar
443	to those used by Standard C. The recognized escape sequences are:
444	\index{physical line}
445	\index{escape sequence}
446	\index{Standard C}
447	\index{C}
448
449	\begin{tableiii}{l|l|c}{code}{Escape Sequence}{Meaning}{Notes}
450	\lineiii{\e\var{newline}} {Ignored}{}
451	\lineiii{\e\e} {Backslash (\code{\e})}{}
452	\lineiii{\e'} {Single quote (\code{'})}{}
453	\lineiii{\e"} {Double quote (\code{"})}{}
454	\lineiii{\e a} {\ASCII{} Bell (BEL)}{}
455	\lineiii{\e b} {\ASCII{} Backspace (BS)}{}
456	\lineiii{\e f} {\ASCII{} Formfeed (FF)}{}
457	\lineiii{\e n} {\ASCII{} Linefeed (LF)}{}
458	\lineiii{\e N\{\var{name}\}}
459	{Character named \var{name} in the Unicode database (Unicode only)}{}
460	\lineiii{\e r} {\ASCII{} Carriage Return (CR)}{}
461	\lineiii{\e t} {\ASCII{} Horizontal Tab (TAB)}{}
462	\lineiii{\e u\var{xxxx}}
463	{Character with 16-bit hex value \var{xxxx} (Unicode only)}{(1)}
464	\lineiii{\e U\var{xxxxxxxx}}
465	{Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}{(2)}
466	\lineiii{\e v} {\ASCII{} Vertical Tab (VT)}{}
467	\lineiii{\e\var{ooo}} {Character with octal value \var{ooo}}{(3,5)}
468	\lineiii{\e x\var{hh}} {Character with hex value \var{hh}}{(4,5)}
469	\end{tableiii}
470	\index{ASCII@\ASCII}
471
472	\noindent
473	Notes:
474
475	\begin{itemize}
476	\item[(1)]
477	Individual code units which form parts of a surrogate pair can be
478	encoded using this escape sequence.
479	\item[(2)]
480	Any Unicode character can be encoded this way, but characters
481	outside the Basic Multilingual Plane (BMP) will be encoded using a
482	surrogate pair if Python is compiled to use 16-bit code units (the
483	default). Individual code units which form parts of a surrogate
484	pair can be encoded using this escape sequence.
485	\item[(3)]
486	As in Standard C, up to three octal digits are accepted.
487	\item[(4)]
488	Unlike in Standard C, at most two hex digits are accepted.
489	\item[(5)]
490	In a string literal, hexadecimal and octal escapes denote the
491	byte with the given value; it is not necessary that the byte
492	encodes a character in the source character set. In a Unicode
493	literal, these escapes denote a Unicode character with the given
494	value.
495	\end{itemize}
496
497
498	Unlike Standard \index{unrecognized escape sequence}C,
499	all unrecognized escape sequences are left in the string unchanged,
500	i.e., \emph{the backslash is left in the string}. (This behavior is
501	useful when debugging: if an escape sequence is mistyped, the
502	resulting output is more easily recognized as broken.) It is also
503	important to note that the escape sequences marked as ``(Unicode
504	only)'' in the table above fall into the category of unrecognized
505	escapes for non-Unicode string literals.
506
507	When an \character{r} or \character{R} prefix is present, a character
508	following a backslash is included in the string without change, and \emph{all
509	backslashes are left in the string}. For example, the string literal
510	\code{r"\e n"} consists of two characters: a backslash and a lowercase
511	\character{n}. String quotes can be escaped with a backslash, but the
512	backslash remains in the string; for example, \code{r"\e""} is a valid string
513	literal consisting of two characters: a backslash and a double quote;
514	\code{r"\e"} is not a valid string literal (even a raw string cannot
515	end in an odd number of backslashes). Specifically, \emph{a raw
516	string cannot end in a single backslash} (since the backslash would
517	escape the following quote character). Note also that a single
518	backslash followed by a newline is interpreted as those two characters
519	as part of the string, \emph{not} as a line continuation.
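
A brief illustration of the raw string rules follows. The variable
names are arbitrary, and the snippet is written so that it also runs on
a current Python 3 interpreter.

```python
cooked = "\n"          # escape sequence processed: one linefeed character
raw = r"\n"            # raw string: a backslash followed by the letter n
raw_quote = r"\""      # the backslash escapes the quote but stays in the string

lengths = (len(cooked), len(raw), len(raw_quote))
```

The lengths are 1, 2 and 2 respectively. Writing \code{r"\e"} instead
would be a syntax error, since the backslash would escape the closing
quote.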
520
521	When an \character{r} or \character{R} prefix is used in conjunction
522	with a \character{u} or \character{U} prefix, then the \code{\e uXXXX}
523	and \code{\e UXXXXXXXX} escape sequences are processed while
524	\emph{all other backslashes are left in the string}.
525	For example, the string literal
526	\code{ur"\e{}u0062\e n"} consists of three Unicode characters: `LATIN
527	SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'.
528	Backslashes can be escaped with a preceding backslash; however, both
529	remain in the string. As a result, \code{\e uXXXX} escape sequences
530	are only recognized when there is an odd number of backslashes.
531
532	\subsection{String literal concatenation\label{string-catenation}}
533
534	Multiple adjacent string literals (delimited by whitespace), possibly
535	using different quoting conventions, are allowed, and their meaning is
536	the same as their concatenation. Thus, \code{"hello" 'world'} is
537	equivalent to \code{"helloworld"}. This feature can be used to reduce
538	the number of backslashes needed, to split long strings conveniently
539	across long lines, or even to add comments to parts of strings, for
540	example:
541
542	\begin{verbatim}
543	re.compile("[A-Za-z_]"       # letter or underscore
544	           "[A-Za-z0-9_]*"   # letter, digit or underscore
545	          )
546	\end{verbatim}
547
548	Note that this feature is defined at the syntactical level, but
549	implemented at compile time. The `+' operator must be used to
550	concatenate string expressions at run time. Also note that literal
551	concatenation can use different quoting styles for each component
552	(even mixing raw strings and triple quoted strings).
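
A small sketch of mixed quoting styles follows. The variable names are
arbitrary, and the pattern is only an example value.

```python
greeting = "hello" 'world'

# Raw, triple-quoted and plain pieces can be mixed, with comments between.
pattern = (r"\d+"        # one or more digits (raw string)
           """-"""       # a literal hyphen (triple-quoted string)
           '[a-z]')      # one lowercase letter

joined_at_run_time = "hello" + "world"   # '+' is needed for expressions
```

The adjacent literals are merged when the code is compiled, so
\code{greeting} and \code{joined_at_run_time} hold equal strings.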
553
554
555	\subsection{Numeric literals\label{numbers}}
556
557	There are four types of numeric literals: plain integers, long
558	integers, floating point numbers, and imaginary numbers. There are no
559	complex literals (complex numbers can be formed by adding a real
560	number and an imaginary number).
561	\index{number}
562	\index{numeric literal}
563	\index{integer literal}
564	\index{plain integer literal}
565	\index{long integer literal}
566	\index{floating point literal}
567	\index{hexadecimal literal}
568	\index{octal literal}
569	\index{decimal literal}
570	\index{imaginary literal}
571	\index{complex!literal}
572
573	Note that numeric literals do not include a sign; a phrase like
574	\code{-1} is actually an expression composed of the unary operator
575	`\code{-}' and the literal \code{1}.
576
577
578	\subsection{Integer and long integer literals\label{integers}}
579
580	Integer and long integer literals are described by the following
581	lexical definitions:
582
583	\begin{productionlist}
584	\production{longinteger}
585	{\token{integer} ("l" | "L")}
586	\production{integer}
587	{\token{decimalinteger} | \token{octinteger} | \token{hexinteger}}
588	\production{decimalinteger}
589	{\token{nonzerodigit} \token{digit}* | "0"}
590	\production{octinteger}
591	{"0" \token{octdigit}+}
592	\production{hexinteger}
593	{"0" ("x" | "X") \token{hexdigit}+}
594	\production{nonzerodigit}
595	{"1"..."9"}
596	\production{octdigit}
597	{"0"..."7"}
598	\production{hexdigit}
599	{\token{digit} | "a"..."f" | "A"..."F"}
600	\end{productionlist}
601
602	Although both lower case \character{l} and upper case \character{L} are
603	allowed as a suffix for long integers, it is strongly recommended to always
604	use \character{L}, since the letter \character{l} looks too much like the
605	digit \character{1}.
606
607	Plain integer literals that are above the largest representable plain
608	integer (e.g., 2147483647 when using 32-bit arithmetic) are accepted
609	as if they were long integers instead.\footnote{In versions of Python
610	prior to 2.4, octal and hexadecimal literals in the range just above
611	the largest representable plain integer but not exceeding the largest unsigned
612	32-bit number (on a machine using 32-bit arithmetic), 4294967295, were
613	taken as the negative plain integer obtained by subtracting 4294967296
614	from their unsigned value.} There is no limit for long integer
615	literals apart from what can be stored in available memory.
616
617	Some examples of plain integer literals (first row) and long integer
618	literals (second and third rows):
619
620	\begin{verbatim}
621	7     2147483647                        0177
622	3L    79228162514264337593543950336L    0377L   0x100000000L
623	      79228162514264337593543950336             0xdeadbeef
624	\end{verbatim}
625
626
627	\subsection{Floating point literals\label{floating}}
628
629	Floating point literals are described by the following lexical
630	definitions:
631
632	\begin{productionlist}
633	\production{floatnumber}
634	{\token{pointfloat} | \token{exponentfloat}}
635	\production{pointfloat}
636	{[\token{intpart}] \token{fraction} | \token{intpart} "."}
637	\production{exponentfloat}
638	{(\token{intpart} | \token{pointfloat})
639	\token{exponent}}
640	\production{intpart}
641	{\token{digit}+}
642	\production{fraction}
643	{"." \token{digit}+}
644	\production{exponent}
645	{("e" | "E") ["+" | "-"] \token{digit}+}
646	\end{productionlist}
647
648	Note that the integer and exponent parts of floating point numbers
649	can look like octal integers, but are interpreted using radix 10. For
650	example, \samp{077e010} is legal, and denotes the same number
651	as \samp{77e10}.
652	The allowed range of floating point literals is
653	implementation-dependent.
654	Some examples of floating point literals:
655
656	\begin{verbatim}
657	3.14 10. .001 1e100 3.14e-10 0e0
658	\end{verbatim}
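
The radix rule stated above can be checked on a current Python 3
interpreter, assuming it keeps this behavior for floating point
literals. The sketch uses \code{ast.literal_eval} so that the literal
text passes through the interpreter's own tokenizer.

```python
import ast

# Leading zeros do not switch a floating point literal to octal.
leading_zeros = ast.literal_eval("077e010")
plain = ast.literal_eval("77e10")
```

Both names are bound to the same floating point value.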
659
660	Note that numeric literals do not include a sign; a phrase like
661	\code{-1} is actually an expression composed of the unary operator
662	\code{-} and the literal \code{1}.
663
664
665	\subsection{Imaginary literals\label{imaginary}}
666
667	Imaginary literals are described by the following lexical definitions:
668
669	\begin{productionlist}
670	\production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")}
671	\end{productionlist}
672
673	An imaginary literal yields a complex number with a real part of
674	0.0. Complex numbers are represented as a pair of floating point
675	numbers and have the same restrictions on their range. To create a
676	complex number with a nonzero real part, add a floating point number
677	to it, e.g., \code{(3+4j)}. Some examples of imaginary literals:
678
679	\begin{verbatim}
680	3.14j 10.j 10j .001j 1e100j 3.14e-10j
681	\end{verbatim}
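
A short sketch of how imaginary literals combine into complex numbers
follows. The variable names are arbitrary.

```python
pure = 4j            # imaginary literal: the real part is 0.0
mixed = 3 + 4j       # an expression: a number plus an imaginary literal

parts = (pure.real, pure.imag, mixed.real, mixed.imag)
```

Both parts of each value are floating point numbers, so \code{parts} is
\code{(0.0, 4.0, 3.0, 4.0)}.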
682
683
684	\section{Operators\label{operators}}
685
686	The following tokens are operators:
687	\index{operators}
688
689	\begin{verbatim}
690	+ - * ** / // %
691	<< >> & | ^ ~
692	< > <= >= == != <>
693	\end{verbatim}
694
695	The comparison operators \code{<>} and \code{!=} are alternate
696	spellings of the same operator. \code{!=} is the preferred spelling;
697	\code{<>} is obsolescent.
698
699
700	\section{Delimiters\label{delimiters}}
701
702	The following tokens serve as delimiters in the grammar:
703	\index{delimiters}
704
705	\begin{verbatim}
706	( ) [ ] { } @
707	, : . ` = ;
708	+= -= *= /= //= %=
709	&= |= ^= >>= <<= **=
710	\end{verbatim}
711
712	The period can also occur in floating-point and imaginary literals. A
713	sequence of three periods has a special meaning as an ellipsis in slices.
714	The second half of the list, the augmented assignment operators, serve
715	lexically as delimiters, but also perform an operation.
716
717	The following printing \ASCII{} characters have special meaning as part
718	of other tokens or are otherwise significant to the lexical analyzer:
719
720	\begin{verbatim}
721	' " # \
722	\end{verbatim}
723
724	The following printing \ASCII{} characters are not used in Python. Their
725	occurrence outside string literals and comments is an unconditional
726	error:
727	\index{ASCII@\ASCII}
728
729	\begin{verbatim}
730	$ ?
731	\end{verbatim}

source: trunk/essentials/dev-lang/python/Doc/ref/ref2.tex@ 3391
