| 1 | This is flex.info, produced by makeinfo version 4.5 from flex.texi.
|
|---|
| 2 |
|
|---|
| 3 | INFO-DIR-SECTION Programming
|
|---|
| 4 | START-INFO-DIR-ENTRY
|
|---|
| 5 | * flex: (flex). Fast lexical analyzer generator (lex replacement).
|
|---|
| 6 | END-INFO-DIR-ENTRY
|
|---|
| 7 |
|
|---|
| 8 |
|
|---|
| 9 | The flex manual is placed under the same licensing conditions as the
|
|---|
| 10 | rest of flex:
|
|---|
| 11 |
|
|---|
| 12 | Copyright (C) 1990, 1997 The Regents of the University of California.
|
|---|
| 13 | All rights reserved.
|
|---|
| 14 |
|
|---|
| 15 | This code is derived from software contributed to Berkeley by Vern
|
|---|
| 16 | Paxson.
|
|---|
| 17 |
|
|---|
| 18 | The United States Government has rights in this work pursuant to
|
|---|
| 19 | contract no. DE-AC03-76SF00098 between the United States Department of
|
|---|
| 20 | Energy and the University of California.
|
|---|
| 21 |
|
|---|
| 22 | Redistribution and use in source and binary forms, with or without
|
|---|
| 23 | modification, are permitted provided that the following conditions are
|
|---|
| 24 | met:
|
|---|
| 25 |
|
|---|
| 26 | 1. Redistributions of source code must retain the above copyright
|
|---|
| 27 | notice, this list of conditions and the following disclaimer.
|
|---|
| 28 |
|
|---|
| 29 | 2. Redistributions in binary form must reproduce the above copyright
|
|---|
| 30 | notice, this list of conditions and the following disclaimer in the
|
|---|
| 31 | documentation and/or other materials provided with the
|
|---|
| 32 | distribution.
|
|---|
| 33 | Neither the name of the University nor the names of its contributors
|
|---|
| 34 | may be used to endorse or promote products derived from this software
|
|---|
| 35 | without specific prior written permission.
|
|---|
| 36 |
|
|---|
| 37 | THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
|
|---|
| 38 | WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
|---|
| 39 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
|---|
| 40 |
|
|---|
| 41 | File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options
|
|---|
| 42 |
|
|---|
| 43 | Debugging Options
|
|---|
| 44 | =================
|
|---|
| 45 |
|
|---|
| 46 | `-b, --backup, `%option backup''
|
|---|
| 47 | Generate backing-up information to `lex.backup'. This is a list of
|
|---|
| 48 | scanner states which require backing up and the input characters on
|
|---|
| 49 | which they do so. By adding rules one can remove backing-up
|
|---|
| 50 | states. If _all_ backing-up states are eliminated and `-Cf' or
|
|---|
| 51 | `-CF' is used, the generated scanner will run faster (see the
|
|---|
| 52 | `--perf-report' flag). Only users who wish to squeeze every last
|
|---|
| 53 | cycle out of their scanners need worry about this option. (*note
|
|---|
| 54 | Performance::).
|
|---|
| 55 |
|
|---|
| 56 | `-d, --debug, `%option debug''
|
|---|
| 57 | makes the generated scanner run in "debug" mode. Whenever a
|
|---|
| 58 | pattern is recognized and the global variable `yy_flex_debug' is
|
|---|
| 59 | non-zero (which is the default), the scanner will write to
|
|---|
| 60 | `stderr' a line of the form:
|
|---|
| 61 |
|
|---|
| 62 |
|
|---|
| 63 | -accepting rule at line 53 ("the matched text")
|
|---|
| 64 |
|
|---|
| 65 | The line number refers to the location of the rule in the file
|
|---|
| 66 | defining the scanner (i.e., the file that was fed to flex).
|
|---|
| 67 | Messages are also generated when the scanner backs up, accepts the
|
|---|
| 68 | default rule, reaches the end of its input buffer (or encounters a
|
|---|
| 69 | NUL; at this point, the two look the same as far as the scanner's
|
|---|
| 70 | concerned), or reaches an end-of-file.
|
|---|
| 71 |
|
|---|
| 72 | `-p, --perf-report, `%option perf-report''
|
|---|
| 73 | generates a performance report to `stderr'. The report consists of
|
|---|
| 74 | comments regarding features of the `flex' input file which will
|
|---|
| 75 | cause a serious loss of performance in the resulting scanner. If
|
|---|
| 76 | you give the flag twice, you will also get comments regarding
|
|---|
| 77 | features that lead to minor performance losses.
|
|---|
| 78 |
|
|---|
| 79 | Note that the use of `REJECT', and variable trailing context
|
|---|
| 80 | (*note Limitations::) entails a substantial performance penalty;
|
|---|
| 81 | use of `yymore()', the `^' operator, and the `--interactive' flag
|
|---|
| 82 | entail minor performance penalties.
|
|---|
| 83 |
|
|---|
| 84 | `-s, --nodefault, `%option nodefault''
|
|---|
| 85 | causes the _default rule_ (that unmatched scanner input is echoed
|
|---|
| 86 | to `stdout)' to be suppressed. If the scanner encounters input
|
|---|
| 87 | that does not match any of its rules, it aborts with an error.
|
|---|
| 88 | This option is useful for finding holes in a scanner's rule set.
|
|---|
| 89 |
|
|---|
| 90 | `-T, --trace, `%option trace''
|
|---|
| 91 | makes `flex' run in "trace" mode. It will generate a lot of
|
|---|
| 92 | messages to `stderr' concerning the form of the input and the
|
|---|
| 93 | resultant non-deterministic and deterministic finite automata.
|
|---|
| 94 | This option is mostly for use in maintaining `flex'.
|
|---|
| 95 |
|
|---|
| 96 | `-w, --nowarn, `%option nowarn''
|
|---|
| 97 | suppresses warning messages.
|
|---|
| 98 |
|
|---|
| 99 | `-v, --verbose, `%option verbose''
|
|---|
| 100 | specifies that `flex' should write to `stderr' a summary of
|
|---|
| 101 | statistics regarding the scanner it generates. Most of the
|
|---|
| 102 | statistics are meaningless to the casual `flex' user, but the
|
|---|
| 103 | first line identifies the version of `flex' (same as reported by
|
|---|
| 104 | `--version'), and the next line the flags used when generating the
|
|---|
| 105 | scanner, including those that are on by default.
|
|---|
| 106 |
|
|---|
| 107 | `--warn, `%option warn''
|
|---|
| 108 | warn about certain things. In particular, if the default rule can
|
|---|
| 109 | be matched but no defualt rule has been given, the flex will warn
|
|---|
| 110 | you. We recommend using this option always.
|
|---|
| 111 |
|
|---|
| 112 |
|
|---|
| 113 |
|
|---|
| 114 | File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options
|
|---|
| 115 |
|
|---|
| 116 | Miscellaneous Options
|
|---|
| 117 | =====================
|
|---|
| 118 |
|
|---|
| 119 | `-c'
|
|---|
| 120 | is a do-nothing option included for POSIX compliance.
|
|---|
| 121 |
|
|---|
| 122 | generates
|
|---|
| 123 |
|
|---|
| 124 | `-h, -?, --help'
|
|---|
| 125 | generates a "help" summary of `flex''s options to `stdout' and
|
|---|
| 126 | then exits.
|
|---|
| 127 |
|
|---|
| 128 | `-n'
|
|---|
| 129 | is another do-nothing option included only for POSIX compliance.
|
|---|
| 130 |
|
|---|
| 131 | `-V, --version'
|
|---|
| 132 | prints the version number to `stdout' and exits.
|
|---|
| 133 |
|
|---|
| 134 |
|
|---|
| 135 |
|
|---|
| 136 | File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top
|
|---|
| 137 |
|
|---|
| 138 | Performance Considerations
|
|---|
| 139 | **************************
|
|---|
| 140 |
|
|---|
| 141 | The main design goal of `flex' is that it generate high-performance
|
|---|
| 142 | scanners. It has been optimized for dealing well with large sets of
|
|---|
| 143 | rules. Aside from the effects on scanner speed of the table compression
|
|---|
| 144 | `-C' options outlined above, there are a number of options/actions
|
|---|
| 145 | which degrade performance. These are, from most expensive to least:
|
|---|
| 146 |
|
|---|
| 147 |
|
|---|
| 148 | REJECT
|
|---|
| 149 | arbitrary trailing context
|
|---|
| 150 |
|
|---|
| 151 | pattern sets that require backing up
|
|---|
| 152 | %option yylineno
|
|---|
| 153 | %array
|
|---|
| 154 |
|
|---|
| 155 | %option interactive
|
|---|
| 156 | %option always-interactive
|
|---|
| 157 |
|
|---|
| 158 | @samp{^} beginning-of-line operator
|
|---|
| 159 | yymore()
|
|---|
| 160 |
|
|---|
| 161 | with the first two all being quite expensive and the last two being
|
|---|
| 162 | quite cheap. Note also that `unput()' is implemented as a routine call
|
|---|
| 163 | that potentially does quite a bit of work, while `yyless()' is a
|
|---|
| 164 | quite-cheap macro. So if you are just putting back some excess text you
|
|---|
| 165 | scanned, use `ss()'.
|
|---|
| 166 |
|
|---|
| 167 | `REJECT' should be avoided at all costs when performance is
|
|---|
| 168 | important. It is a particularly expensive option.
|
|---|
| 169 |
|
|---|
| 170 | There is one case when `%option yylineno' can be expensive. That is
|
|---|
| 171 | when your patterns match long tokens that could _possibly_ contain a
|
|---|
| 172 | newline character. There is no performance penalty for rules that can
|
|---|
| 173 | not possibly match newlines, since flex does not need to check them for
|
|---|
| 174 | newlines. In general, you should avoid rules such as `[^f]+', which
|
|---|
| 175 | match very long tokens, including newlines, and may possibly match your
|
|---|
| 176 | entire file! A better approach is to separate `[^f]+' into two rules:
|
|---|
| 177 |
|
|---|
| 178 |
|
|---|
| 179 | %option yylineno
|
|---|
| 180 | %%
|
|---|
| 181 | [^f\n]+
|
|---|
| 182 | \n+
|
|---|
| 183 |
|
|---|
| 184 | The above scanner does not incur a performance penalty.
|
|---|
| 185 |
|
|---|
| 186 | Getting rid of backing up is messy and often may be an enormous
|
|---|
| 187 | amount of work for a complicated scanner. In principal, one begins by
|
|---|
| 188 | using the `-b' flag to generate a `lex.backup' file. For example, on
|
|---|
| 189 | the input:
|
|---|
| 190 |
|
|---|
| 191 |
|
|---|
| 192 | %%
|
|---|
| 193 | foo return TOK_KEYWORD;
|
|---|
| 194 | foobar return TOK_KEYWORD;
|
|---|
| 195 |
|
|---|
| 196 | the file looks like:
|
|---|
| 197 |
|
|---|
| 198 |
|
|---|
| 199 | State #6 is non-accepting -
|
|---|
| 200 | associated rule line numbers:
|
|---|
| 201 | 2 3
|
|---|
| 202 | out-transitions: [ o ]
|
|---|
| 203 | jam-transitions: EOF [ \001-n p-\177 ]
|
|---|
| 204 |
|
|---|
| 205 | State #8 is non-accepting -
|
|---|
| 206 | associated rule line numbers:
|
|---|
| 207 | 3
|
|---|
| 208 | out-transitions: [ a ]
|
|---|
| 209 | jam-transitions: EOF [ \001-` b-\177 ]
|
|---|
| 210 |
|
|---|
| 211 | State #9 is non-accepting -
|
|---|
| 212 | associated rule line numbers:
|
|---|
| 213 | 3
|
|---|
| 214 | out-transitions: [ r ]
|
|---|
| 215 | jam-transitions: EOF [ \001-q s-\177 ]
|
|---|
| 216 |
|
|---|
| 217 | Compressed tables always back up.
|
|---|
| 218 |
|
|---|
| 219 | The first few lines tell us that there's a scanner state in which it
|
|---|
| 220 | can make a transition on an 'o' but not on any other character, and
|
|---|
| 221 | that in that state the currently scanned text does not match any rule.
|
|---|
| 222 | The state occurs when trying to match the rules found at lines 2 and 3
|
|---|
| 223 | in the input file. If the scanner is in that state and then reads
|
|---|
| 224 | something other than an 'o', it will have to back up to find a rule
|
|---|
| 225 | which is matched. With a bit of headscratching one can see that this
|
|---|
| 226 | must be the state it's in when it has seen `fo'. When this has
|
|---|
| 227 | happened, if anything other than another `o' is seen, the scanner will
|
|---|
| 228 | have to back up to simply match the `f' (by the default rule).
|
|---|
| 229 |
|
|---|
| 230 | The comment regarding State #8 indicates there's a problem when
|
|---|
| 231 | `foob' has been scanned. Indeed, on any character other than an `a',
|
|---|
| 232 | the scanner will have to back up to accept "foo". Similarly, the
|
|---|
| 233 | comment for State #9 concerns when `fooba' has been scanned and an `r'
|
|---|
| 234 | does not follow.
|
|---|
| 235 |
|
|---|
| 236 | The final comment reminds us that there's no point going to all the
|
|---|
| 237 | trouble of removing backing up from the rules unless we're using `-Cf'
|
|---|
| 238 | or `-CF', since there's no performance gain doing so with compressed
|
|---|
| 239 | scanners.
|
|---|
| 240 |
|
|---|
| 241 | The way to remove the backing up is to add "error" rules:
|
|---|
| 242 |
|
|---|
| 243 |
|
|---|
| 244 | %%
|
|---|
| 245 | foo return TOK_KEYWORD;
|
|---|
| 246 | foobar return TOK_KEYWORD;
|
|---|
| 247 |
|
|---|
| 248 | fooba |
|
|---|
| 249 | foob |
|
|---|
| 250 | fo {
|
|---|
| 251 | /* false alarm, not really a keyword */
|
|---|
| 252 | return TOK_ID;
|
|---|
| 253 | }
|
|---|
| 254 |
|
|---|
| 255 | Eliminating backing up among a list of keywords can also be done
|
|---|
| 256 | using a "catch-all" rule:
|
|---|
| 257 |
|
|---|
| 258 |
|
|---|
| 259 | %%
|
|---|
| 260 | foo return TOK_KEYWORD;
|
|---|
| 261 | foobar return TOK_KEYWORD;
|
|---|
| 262 |
|
|---|
| 263 | [a-z]+ return TOK_ID;
|
|---|
| 264 |
|
|---|
| 265 | This is usually the best solution when appropriate.
|
|---|
| 266 |
|
|---|
| 267 | Backing up messages tend to cascade. With a complicated set of rules
|
|---|
| 268 | it's not uncommon to get hundreds of messages. If one can decipher
|
|---|
| 269 | them, though, it often only takes a dozen or so rules to eliminate the
|
|---|
| 270 | backing up (though it's easy to make a mistake and have an error rule
|
|---|
| 271 | accidentally match a valid token. A possible future `flex' feature
|
|---|
| 272 | will be to automatically add rules to eliminate backing up).
|
|---|
| 273 |
|
|---|
| 274 | It's important to keep in mind that you gain the benefits of
|
|---|
| 275 | eliminating backing up only if you eliminate _every_ instance of
|
|---|
| 276 | backing up. Leaving just one means you gain nothing.
|
|---|
| 277 |
|
|---|
| 278 | _Variable_ trailing context (where both the leading and trailing
|
|---|
| 279 | parts do not have a fixed length) entails almost the same performance
|
|---|
| 280 | loss as `REJECT' (i.e., substantial). So when possible a rule like:
|
|---|
| 281 |
|
|---|
| 282 |
|
|---|
| 283 | %%
|
|---|
| 284 | mouse|rat/(cat|dog) run();
|
|---|
| 285 |
|
|---|
| 286 | is better written:
|
|---|
| 287 |
|
|---|
| 288 |
|
|---|
| 289 | %%
|
|---|
| 290 | mouse/cat|dog run();
|
|---|
| 291 | rat/cat|dog run();
|
|---|
| 292 |
|
|---|
| 293 | or as
|
|---|
| 294 |
|
|---|
| 295 |
|
|---|
| 296 | %%
|
|---|
| 297 | mouse|rat/cat run();
|
|---|
| 298 | mouse|rat/dog run();
|
|---|
| 299 |
|
|---|
| 300 | Note that here the special '|' action does _not_ provide any
|
|---|
| 301 | savings, and can even make things worse (*note Limitations::).
|
|---|
| 302 |
|
|---|
| 303 | Another area where the user can increase a scanner's performance (and
|
|---|
| 304 | one that's easier to implement) arises from the fact that the longer the
|
|---|
|
|---|