source: trunk/flex/doc/flex.info-3@ 3032

Last change on this file since 3032 was 3031, checked in by bird, 19 years ago

flex 2.5.33.

File size: 49.5 KB
Line 
1This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3INFO-DIR-SECTION Programming
4START-INFO-DIR-ENTRY
5* flex: (flex). Fast lexical analyzer generator (lex replacement).
6END-INFO-DIR-ENTRY
7
8
9 The flex manual is placed under the same licensing conditions as the
10rest of flex:
11
12 Copyright (C) 1990, 1997 The Regents of the University of California.
13All rights reserved.
14
15 This code is derived from software contributed to Berkeley by Vern
16Paxson.
17
18 The United States Government has rights in this work pursuant to
19contract no. DE-AC03-76SF00098 between the United States Department of
20Energy and the University of California.
21
22 Redistribution and use in source and binary forms, with or without
23modification, are permitted provided that the following conditions are
24met:
25
26 1. Redistributions of source code must retain the above copyright
27 notice, this list of conditions and the following disclaimer.
28
29 2. Redistributions in binary form must reproduce the above copyright
30 notice, this list of conditions and the following disclaimer in the
31 documentation and/or other materials provided with the
32 distribution.
33 Neither the name of the University nor the names of its contributors
34may be used to endorse or promote products derived from this software
35without specific prior written permission.
36
37 THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options
42
43Debugging Options
44=================
45
46`-b, --backup, `%option backup''
47 Generate backing-up information to `lex.backup'. This is a list of
48 scanner states which require backing up and the input characters on
49 which they do so. By adding rules one can remove backing-up
50 states. If _all_ backing-up states are eliminated and `-Cf' or
51 `-CF' is used, the generated scanner will run faster (see the
52 `--perf-report' flag). Only users who wish to squeeze every last
53 cycle out of their scanners need worry about this option. (*note
54 Performance::).
55
56`-d, --debug, `%option debug''
57 makes the generated scanner run in "debug" mode. Whenever a
58 pattern is recognized and the global variable `yy_flex_debug' is
59 non-zero (which is the default), the scanner will write to
60 `stderr' a line of the form:
61
62
63 -accepting rule at line 53 ("the matched text")
64
65 The line number refers to the location of the rule in the file
66 defining the scanner (i.e., the file that was fed to flex).
67 Messages are also generated when the scanner backs up, accepts the
68 default rule, reaches the end of its input buffer (or encounters a
69 NUL; at this point, the two look the same as far as the scanner's
70 concerned), or reaches an end-of-file.
71
72`-p, --perf-report, `%option perf-report''
73 generates a performance report to `stderr'. The report consists of
74 comments regarding features of the `flex' input file which will
75 cause a serious loss of performance in the resulting scanner. If
76 you give the flag twice, you will also get comments regarding
77 features that lead to minor performance losses.
78
79 Note that the use of `REJECT', and variable trailing context
80 (*note Limitations::) entails a substantial performance penalty;
81 use of `yymore()', the `^' operator, and the `--interactive' flag
82 entail minor performance penalties.
83
84`-s, --nodefault, `%option nodefault''
85 causes the _default rule_ (that unmatched scanner input is echoed
86 to `stdout)' to be suppressed. If the scanner encounters input
87 that does not match any of its rules, it aborts with an error.
88 This option is useful for finding holes in a scanner's rule set.
89
90`-T, --trace, `%option trace''
91 makes `flex' run in "trace" mode. It will generate a lot of
92 messages to `stderr' concerning the form of the input and the
93 resultant non-deterministic and deterministic finite automata.
94 This option is mostly for use in maintaining `flex'.
95
96`-w, --nowarn, `%option nowarn''
97 suppresses warning messages.
98
99`-v, --verbose, `%option verbose''
100 specifies that `flex' should write to `stderr' a summary of
101 statistics regarding the scanner it generates. Most of the
102 statistics are meaningless to the casual `flex' user, but the
103 first line identifies the version of `flex' (same as reported by
104 `--version'), and the next line the flags used when generating the
105 scanner, including those that are on by default.
106
107`--warn, `%option warn''
108 warn about certain things. In particular, if the default rule can
109 be matched but no defualt rule has been given, the flex will warn
110 you. We recommend using this option always.
111
112
113
114File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options
115
116Miscellaneous Options
117=====================
118
119`-c'
120 is a do-nothing option included for POSIX compliance.
121
122 generates
123
124`-h, -?, --help'
125 generates a "help" summary of `flex''s options to `stdout' and
126 then exits.
127
128`-n'
129 is another do-nothing option included only for POSIX compliance.
130
131`-V, --version'
132 prints the version number to `stdout' and exits.
133
134
135
136File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top
137
138Performance Considerations
139**************************
140
141 The main design goal of `flex' is that it generate high-performance
142scanners. It has been optimized for dealing well with large sets of
143rules. Aside from the effects on scanner speed of the table compression
144`-C' options outlined above, there are a number of options/actions
145which degrade performance. These are, from most expensive to least:
146
147
148 REJECT
149 arbitrary trailing context
150
151 pattern sets that require backing up
152 %option yylineno
153 %array
154
155 %option interactive
156 %option always-interactive
157
158 @samp{^} beginning-of-line operator
159 yymore()
160
161 with the first two all being quite expensive and the last two being
162quite cheap. Note also that `unput()' is implemented as a routine call
163that potentially does quite a bit of work, while `yyless()' is a
164quite-cheap macro. So if you are just putting back some excess text you
165scanned, use `ss()'.
166
167 `REJECT' should be avoided at all costs when performance is
168important. It is a particularly expensive option.
169
170 There is one case when `%option yylineno' can be expensive. That is
171when your patterns match long tokens that could _possibly_ contain a
172newline character. There is no performance penalty for rules that can
173not possibly match newlines, since flex does not need to check them for
174newlines. In general, you should avoid rules such as `[^f]+', which
175match very long tokens, including newlines, and may possibly match your
176entire file! A better approach is to separate `[^f]+' into two rules:
177
178
179 %option yylineno
180 %%
181 [^f\n]+
182 \n+
183
184 The above scanner does not incur a performance penalty.
185
186 Getting rid of backing up is messy and often may be an enormous
187amount of work for a complicated scanner. In principal, one begins by
188using the `-b' flag to generate a `lex.backup' file. For example, on
189the input:
190
191
192 %%
193 foo return TOK_KEYWORD;
194 foobar return TOK_KEYWORD;
195
196 the file looks like:
197
198
199 State #6 is non-accepting -
200 associated rule line numbers:
201 2 3
202 out-transitions: [ o ]
203 jam-transitions: EOF [ \001-n p-\177 ]
204
205 State #8 is non-accepting -
206 associated rule line numbers:
207 3
208 out-transitions: [ a ]
209 jam-transitions: EOF [ \001-` b-\177 ]
210
211 State #9 is non-accepting -
212 associated rule line numbers:
213 3
214 out-transitions: [ r ]
215 jam-transitions: EOF [ \001-q s-\177 ]
216
217 Compressed tables always back up.
218
219 The first few lines tell us that there's a scanner state in which it
220can make a transition on an 'o' but not on any other character, and
221that in that state the currently scanned text does not match any rule.
222The state occurs when trying to match the rules found at lines 2 and 3
223in the input file. If the scanner is in that state and then reads
224something other than an 'o', it will have to back up to find a rule
225which is matched. With a bit of headscratching one can see that this
226must be the state it's in when it has seen `fo'. When this has
227happened, if anything other than another `o' is seen, the scanner will
228have to back up to simply match the `f' (by the default rule).
229
230 The comment regarding State #8 indicates there's a problem when
231`foob' has been scanned. Indeed, on any character other than an `a',
232the scanner will have to back up to accept "foo". Similarly, the
233comment for State #9 concerns when `fooba' has been scanned and an `r'
234does not follow.
235
236 The final comment reminds us that there's no point going to all the
237trouble of removing backing up from the rules unless we're using `-Cf'
238or `-CF', since there's no performance gain doing so with compressed
239scanners.
240
241 The way to remove the backing up is to add "error" rules:
242
243
244 %%
245 foo return TOK_KEYWORD;
246 foobar return TOK_KEYWORD;
247
248 fooba |
249 foob |
250 fo {
251 /* false alarm, not really a keyword */
252 return TOK_ID;
253 }
254
255 Eliminating backing up among a list of keywords can also be done
256using a "catch-all" rule:
257
258
259 %%
260 foo return TOK_KEYWORD;
261 foobar return TOK_KEYWORD;
262
263 [a-z]+ return TOK_ID;
264
265 This is usually the best solution when appropriate.
266
267 Backing up messages tend to cascade. With a complicated set of rules
268it's not uncommon to get hundreds of messages. If one can decipher
269them, though, it often only takes a dozen or so rules to eliminate the
270backing up (though it's easy to make a mistake and have an error rule
271accidentally match a valid token. A possible future `flex' feature
272will be to automatically add rules to eliminate backing up).
273
274 It's important to keep in mind that you gain the benefits of
275eliminating backing up only if you eliminate _every_ instance of
276backing up. Leaving just one means you gain nothing.
277
278 _Variable_ trailing context (where both the leading and trailing
279parts do not have a fixed length) entails almost the same performance
280loss as `REJECT' (i.e., substantial). So when possible a rule like:
281
282
283 %%
284 mouse|rat/(cat|dog) run();
285
286 is better written:
287
288
289 %%
290 mouse/cat|dog run();
291 rat/cat|dog run();
292
293 or as
294
295
296 %%
297 mouse|rat/cat run();
298 mouse|rat/dog run();
299
300 Note that here the special '|' action does _not_ provide any
301savings, and can even make things worse (*note Limitations::).
302
303 Another area where the user can increase a scanner's performance (and
304one that's easier to implement) arises from the fact that the longer the