Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

flex.info-3@ 3032

Visit:

Last change on this file since 3032 was 3031, checked in by bird, 19 years ago
flex 2.5.33.
File size: 49.5 KB

Line
1	This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3	INFO-DIR-SECTION Programming
4	START-INFO-DIR-ENTRY
5	* flex: (flex). Fast lexical analyzer generator (lex replacement).
6	END-INFO-DIR-ENTRY
7
8
9	The flex manual is placed under the same licensing conditions as the
10	rest of flex:
11
12	Copyright (C) 1990, 1997 The Regents of the University of California.
13	All rights reserved.
14
15	This code is derived from software contributed to Berkeley by Vern
16	Paxson.
17
18	The United States Government has rights in this work pursuant to
19	contract no. DE-AC03-76SF00098 between the United States Department of
20	Energy and the University of California.
21
22	Redistribution and use in source and binary forms, with or without
23	modification, are permitted provided that the following conditions are
24	met:
25
26	1. Redistributions of source code must retain the above copyright
27	notice, this list of conditions and the following disclaimer.
28
29	2. Redistributions in binary form must reproduce the above copyright
30	notice, this list of conditions and the following disclaimer in the
31	documentation and/or other materials provided with the
32	distribution.
33	Neither the name of the University nor the names of its contributors
34	may be used to endorse or promote products derived from this software
35	without specific prior written permission.
36
37	THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38	WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39	MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41	File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options
42
43	Debugging Options
44	=================
45
46	`-b, --backup, `%option backup''
47	Generate backing-up information to `lex.backup'. This is a list of
48	scanner states which require backing up and the input characters on
49	which they do so. By adding rules one can remove backing-up
50	states. If _all_ backing-up states are eliminated and `-Cf' or
51	`-CF' is used, the generated scanner will run faster (see the
52	`--perf-report' flag). Only users who wish to squeeze every last
53	cycle out of their scanners need worry about this option. (*note
54	Performance::).
55
56	`-d, --debug, `%option debug''
57	makes the generated scanner run in "debug" mode. Whenever a
58	pattern is recognized and the global variable `yy_flex_debug' is
59	non-zero (which is the default), the scanner will write to
60	`stderr' a line of the form:
61
62
63	-accepting rule at line 53 ("the matched text")
64
65	The line number refers to the location of the rule in the file
66	defining the scanner (i.e., the file that was fed to flex).
67	Messages are also generated when the scanner backs up, accepts the
68	default rule, reaches the end of its input buffer (or encounters a
69	NUL; at this point, the two look the same as far as the scanner's
70	concerned), or reaches an end-of-file.
71
72	`-p, --perf-report, `%option perf-report''
73	generates a performance report to `stderr'. The report consists of
74	comments regarding features of the `flex' input file which will
75	cause a serious loss of performance in the resulting scanner. If
76	you give the flag twice, you will also get comments regarding
77	features that lead to minor performance losses.
78
79	Note that the use of `REJECT', and variable trailing context
80	(*note Limitations::) entails a substantial performance penalty;
81	use of `yymore()', the `^' operator, and the `--interactive' flag
82	entail minor performance penalties.
83
84	`-s, --nodefault, `%option nodefault''
85	causes the _default rule_ (that unmatched scanner input is echoed
86	to `stdout)' to be suppressed. If the scanner encounters input
87	that does not match any of its rules, it aborts with an error.
88	This option is useful for finding holes in a scanner's rule set.
89
90	`-T, --trace, `%option trace''
91	makes `flex' run in "trace" mode. It will generate a lot of
92	messages to `stderr' concerning the form of the input and the
93	resultant non-deterministic and deterministic finite automata.
94	This option is mostly for use in maintaining `flex'.
95
96	`-w, --nowarn, `%option nowarn''
97	suppresses warning messages.
98
99	`-v, --verbose, `%option verbose''
100	specifies that `flex' should write to `stderr' a summary of
101	statistics regarding the scanner it generates. Most of the
102	statistics are meaningless to the casual `flex' user, but the
103	first line identifies the version of `flex' (same as reported by
104	`--version'), and the next line the flags used when generating the
105	scanner, including those that are on by default.
106
107	`--warn, `%option warn''
108	warn about certain things. In particular, if the default rule can
109	be matched but no defualt rule has been given, the flex will warn
110	you. We recommend using this option always.
111
112
113
114	File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options
115
116	Miscellaneous Options
117	=====================
118
119	`-c'
120	is a do-nothing option included for POSIX compliance.
121
122	generates
123
124	`-h, -?, --help'
125	generates a "help" summary of `flex''s options to `stdout' and
126	then exits.
127
128	`-n'
129	is another do-nothing option included only for POSIX compliance.
130
131	`-V, --version'
132	prints the version number to `stdout' and exits.
133
134
135
136	File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top
137
138	Performance Considerations
139	**************************
140
141	The main design goal of `flex' is that it generate high-performance
142	scanners. It has been optimized for dealing well with large sets of
143	rules. Aside from the effects on scanner speed of the table compression
144	`-C' options outlined above, there are a number of options/actions
145	which degrade performance. These are, from most expensive to least:
146
147
148	REJECT
149	arbitrary trailing context
150
151	pattern sets that require backing up
152	%option yylineno
153	%array
154
155	%option interactive
156	%option always-interactive
157
158	@samp{^} beginning-of-line operator
159	yymore()
160
161	with the first two all being quite expensive and the last two being
162	quite cheap. Note also that `unput()' is implemented as a routine call
163	that potentially does quite a bit of work, while `yyless()' is a
164	quite-cheap macro. So if you are just putting back some excess text you
165	scanned, use `ss()'.
166
167	`REJECT' should be avoided at all costs when performance is
168	important. It is a particularly expensive option.
169
170	There is one case when `%option yylineno' can be expensive. That is
171	when your patterns match long tokens that could _possibly_ contain a
172	newline character. There is no performance penalty for rules that can
173	not possibly match newlines, since flex does not need to check them for
174	newlines. In general, you should avoid rules such as `[^f]+', which
175	match very long tokens, including newlines, and may possibly match your
176	entire file! A better approach is to separate `[^f]+' into two rules:
177
178
179	%option yylineno
180	%%
181	[^f\n]+
182	\n+
183
184	The above scanner does not incur a performance penalty.
185
186	Getting rid of backing up is messy and often may be an enormous
187	amount of work for a complicated scanner. In principal, one begins by
188	using the `-b' flag to generate a `lex.backup' file. For example, on
189	the input:
190
191
192	%%
193	foo return TOK_KEYWORD;
194	foobar return TOK_KEYWORD;
195
196	the file looks like:
197
198
199	State #6 is non-accepting -
200	associated rule line numbers:
201	2 3
202	out-transitions: [ o ]
203	jam-transitions: EOF [ \001-n p-\177 ]
204
205	State #8 is non-accepting -
206	associated rule line numbers:
207	3
208	out-transitions: [ a ]
209	jam-transitions: EOF [ \001-` b-\177 ]
210
211	State #9 is non-accepting -
212	associated rule line numbers:
213	3
214	out-transitions: [ r ]
215	jam-transitions: EOF [ \001-q s-\177 ]
216
217	Compressed tables always back up.
218
219	The first few lines tell us that there's a scanner state in which it
220	can make a transition on an 'o' but not on any other character, and
221	that in that state the currently scanned text does not match any rule.
222	The state occurs when trying to match the rules found at lines 2 and 3
223	in the input file. If the scanner is in that state and then reads
224	something other than an 'o', it will have to back up to find a rule
225	which is matched. With a bit of headscratching one can see that this
226	must be the state it's in when it has seen `fo'. When this has
227	happened, if anything other than another `o' is seen, the scanner will
228	have to back up to simply match the `f' (by the default rule).
229
230	The comment regarding State #8 indicates there's a problem when
231	`foob' has been scanned. Indeed, on any character other than an `a',
232	the scanner will have to back up to accept "foo". Similarly, the
233	comment for State #9 concerns when `fooba' has been scanned and an `r'
234	does not follow.
235
236	The final comment reminds us that there's no point going to all the
237	trouble of removing backing up from the rules unless we're using `-Cf'
238	or `-CF', since there's no performance gain doing so with compressed
239	scanners.
240
241	The way to remove the backing up is to add "error" rules:
242
243
244	%%
245	foo return TOK_KEYWORD;
246	foobar return TOK_KEYWORD;
247
248	fooba \|
249	foob \|
250	fo {
251	/* false alarm, not really a keyword */
252	return TOK_ID;
253	}
254
255	Eliminating backing up among a list of keywords can also be done
256	using a "catch-all" rule:
257
258
259	%%
260	foo return TOK_KEYWORD;
261	foobar return TOK_KEYWORD;
262
263	[a-z]+ return TOK_ID;
264
265	This is usually the best solution when appropriate.
266
267	Backing up messages tend to cascade. With a complicated set of rules
268	it's not uncommon to get hundreds of messages. If one can decipher
269	them, though, it often only takes a dozen or so rules to eliminate the
270	backing up (though it's easy to make a mistake and have an error rule
271	accidentally match a valid token. A possible future `flex' feature
272	will be to automatically add rules to eliminate backing up).
273
274	It's important to keep in mind that you gain the benefits of
275	eliminating backing up only if you eliminate _every_ instance of
276	backing up. Leaving just one means you gain nothing.
277
278	_Variable_ trailing context (where both the leading and trailing
279	parts do not have a fixed length) entails almost the same performance
280	loss as `REJECT' (i.e., substantial). So when possible a rule like:
281
282
283	%%
284	mouse\|rat/(cat\|dog) run();
285
286	is better written:
287
288
289	%%
290	mouse/cat\|dog run();
291	rat/cat\|dog run();
292
293	or as
294
295
296	%%
297	mouse\|rat/cat run();
298	mouse\|rat/dog run();
299
300	Note that here the special '\|' action does _not_ provide any
301	savings, and can even make things worse (*note Limitations::).
302
303	Another area where the user can increase a scanner's performance (and
304	one that's easier to implement) arises from the fact that the longer the