This is flex.info, produced by makeinfo version 4.5 from flex.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex). Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


The flex manual is placed under the same licensing conditions as the
rest of flex:

Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

This code is derived from software contributed to Berkeley by Vern
Paxson.

The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ

How do I match any string not matched in the preceding rules?
=============================================================

One way to assign precedence is to place the more specific rules
first. If two rules would match the same input (the same sequence of
characters), then the first rule listed in the `flex' input wins. For
example:

     %%
     foo[a-zA-Z_]+    return FOO_ID;
     bar[a-zA-Z_]+    return BAR_ID;
     [a-zA-Z_]+       return GENERIC_ID;

Note that the rule `[a-zA-Z_]+' must come *after* the others. It
matches the same amount of text as the more specific rules, and in that
case the `flex' scanner picks the rule listed first in your scanner.

File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ

I am trying to port code from AT&T lex that uses yysptr and yysbuf.
===================================================================

Those are internal variables pointing into the AT&T scanner's input
buffer. I imagine they're being manipulated in user versions of the
`input()' and `unput()' functions. If so, what you need to do is
analyze those functions to figure out what they're doing, and then
replace `input()' with an appropriate definition of `YY_INPUT'. You
shouldn't need to (and must not) replace `flex''s `unput()' function.

File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ

Is there a way to make flex treat NULL like a regular character?
================================================================

Yes, `\0' and `\x00' should both do the trick. Perhaps you have an
ancient version of `flex'. At the time of writing, the latest release
was version 2.5.33.

File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ

Whenever flex can not match the input it says "flex scanner jammed".
====================================================================

You need to add a rule that matches the otherwise-unmatched text. For
example:

     %option yylineno
     %%
     [[a bunch of rules here]]

     .    printf("bad input character '%s' at line %d\n", yytext, yylineno);

Note that `.' does not match a newline, so one of your other rules must
handle `\n'. See `%option default' for more information.

File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ

Why doesn't flex have non-greedy operators like perl does?
==========================================================

A DFA can do a non-greedy match by stopping the first time it enters
an accepting state, instead of consuming input until it determines that
no further matching is possible (a "jam" state). This is actually
easier to implement than longest leftmost match (which flex does).

But it's also much less useful than longest leftmost match. In
general, when you find yourself wishing for non-greedy matching, that's
usually a sign that you're trying to make the scanner do some parsing.
That's generally the wrong approach, since a scanner lacks the power to
do a decent job of parsing. It is better either to introduce a separate
parser or to split the scanner into multiple scanners using (exclusive)
start conditions.

For example, you might have a separate start state once you've seen
the `BEGIN'. In that state, you might then have a regex that will match
`END' (to kick you out of the state), and perhaps `(.|\n)' to get a
single character within the chunk ...

This approach also has much better error-reporting properties.

File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ

Memory leak - 16386 bytes allocated by malloc.
==============================================

UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
you did not call `yylex_destroy()'. If you are using an earlier version
of `flex', then read on.

The leak is about 16426 bytes: 16386 bytes (8192 * 2 + 2) for the
read buffer, plus about 40 for `struct yy_buffer_state' (depending upon
alignment). The leak is in the non-reentrant C scanner only (NOT in the
reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
when you are done, the buffer is never freed.

However, the leak won't grow, since the buffer is reused no matter how
many times you call `yylex()'.

If you want to reclaim the memory when you are completely done
scanning, then you might try this:

     /* For non-reentrant C scanner only. */
     yy_delete_buffer(YY_CURRENT_BUFFER);
     yy_init = 1;

Note: `yy_init' is an "internal variable", and hasn't been tested in
this situation. It is possible that some other globals may need
resetting as well.

File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ

How do I track the byte offset for lseek()?
===========================================

> We thought that it would be possible to have this number through the
> evaluation of the following expression:
>
> seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf

While this is the right idea, it has two problems. The first is that
`flex' may request fewer than `YY_READ_BUF_SIZE' bytes during an
invocation of `YY_INPUT' (or your input source may return fewer, even
though `YY_READ_BUF_SIZE' bytes were requested). The second problem is
that when refilling its internal buffer, `flex' keeps some characters
from the previous buffer (because usually it's in the middle of a
match, and needs those characters to construct `yytext' for the match
once it's done). Because of this, `yy_c_buf_p -
YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
already read from the current buffer.

An alternative solution is to count the number of characters you've
matched since starting to scan. This can be done by using
`YY_USER_ACTION'. For example,

     #define YY_USER_ACTION num_chars += yyleng;

(You need to be careful to update your bookkeeping if you use
`yymore()', `yyless()', `unput()', or `input()'.)

File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ

How do I use my own I/O classes in a C++ scanner?
=================================================

When the flex C++ scanner class is finally rewritten, this sort of
thing should become much easier.

Until then, you can do this by passing the various functions (such as
`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then
dealing with your own I/O classes surreptitiously (i.e., stashing them
in special member variables). This works because the only assumption
the lexer makes about the iostreams is that they're ultimately passed
to `LexerInput()' and `LexerOutput()', which then do whatever is
necessary with them.

File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ

How do I skip as many chars as possible?
========================================

How do I skip as many chars as possible, without interfering with
the other patterns?

In the example below, we want to skip over characters until we see
the phrase "endskip". The following will _NOT_ work correctly (do you
see why not?)

     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip  BEGIN(SKIP);
     ...
     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>.*            ;

The problem is that the pattern .* will eat up the word "endskip":
`.*' matches the rest of the line, which is longer than "endskip", so
it wins. The simplest (but slow) fix is:

     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>.             ;

A faster fix makes the second rule match more, without letting it
match "endskip" plus something else. For example:

     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>[^e]+         ;
     <SKIP>.             ; /* so you eat up e's, too */

File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ

deleteme00
==========

QUESTION:
When was flex born?

Vern Paxson took over the Software Tools lex project from Jef Poskanzer
in 1982. At that point it was written in Ratfor. Around 1987 or so,
Paxson translated it into C, and a legend was born :-).

File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ

Are certain equivalent patterns faster than others?
===================================================

To: Adoram Rogel <[email protected]>
Subject: Re: Flex 2.5.2 performance questions
In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
Date: Wed, 18 Sep 96 10:51:02 PDT
From: Vern Paxson <vern>

[Note, the most recent flex release is 2.5.4, which you can get from
ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]

> 1. Using the pattern
> ([Ff](oot)?)?[Nn](ote)?(\.)?
> instead of
> (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
> (in a very complicated flex program) caused the program to slow from
> 300K+/min to 100K/min (no other changes were done).

These two are not equivalent. For example, the first can match
"footnote." but the second can only match "footnote". This is almost
certainly the cause of the discrepancy - the slower scanner run is
matching more tokens, and/or having to do more backing up.

> 2. Which of these two are better: [Ff]oot or (F|f)oot ?

From a performance point of view, they're equivalent (modulo presumably
minor effects such as memory cache hit rates; and the presence of
trailing context, see below). From a space point of view, the first is
slightly preferable.

> 3. I have a pattern that look like this:
> pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
>
> running yet another complicated program that includes the following rule:
> <snext>{and}/{no4}{bb}{pats}
>
> gets me to "too complicated - over 32,000 states"...

I can't tell from this example whether the trailing context is
variable-length or fixed-length (it could be the latter if {and} is
fixed-length). If it's variable length, which flex -p will tell you,
then this reflects a basic performance problem, and if you can
eliminate it by restructuring your scanner, you will see significant
improvement.

> so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
> 10 patterns and changed the rule to be 5 rules.
> This did compile, but what is the rule of thumb here ?

The rule is to avoid trailing context other than fixed-length, that is,
use a/b only when either the 'a' pattern or the 'b' pattern has a fixed
length. Use of the '|' operator automatically makes the pattern
variable length, so in this case '[Ff]oot' is preferred to '(F|f)oot'.

> 4. I changed a rule that looked like this:
> <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
>
> to the next 2 rules:
> <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
> <snext8>{and}{bb}/{ROMAN} { BEGIN...
>
> Again, I understand the using [^...] will cause a great performance loss

Actually, it doesn't cause any sort of performance loss. It's a
surprising fact about regular expressions that (once compiled into a
DFA, as flex does) they always match in linear time regardless of how
complex they are.

> but are there any specific rules about it ?

See the "Performance Considerations" section of the man page, and also
the example in MISC/fastwc/.

Vern

File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ

Is backing up a big deal?
=========================

To: Adoram Rogel <[email protected]>
Subject: Re: Flex 2.5.2 performance questions
In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
Date: Thu, 19 Sep 96 09:58:00 PDT
From: Vern Paxson <vern>

> a lot about the backing up problem.
> I believe that there lies my biggest problem, and I'll try to improve
> it.

Since you have variable trailing context, that is a bigger performance
problem than backing up. Fixing it is usually easier than fixing backing
up, which in a complicated scanner (yours seems to fit the bill) can be
extremely difficult to do correctly.

You also don't mention what flags you are using for your scanner.
-f makes a large speed difference, and -Cfe buys you nearly as much
speed but the resulting scanner is considerably smaller.

> I have an | operator in {and} and in {pats} so both of them are variable
> length.

-p should have reported this.

> Is changing one of them to fixed-length is enough ?

Yes.

> Is it possible to change the 32,000 states limit ?

Yes. I've appended instructions on how. Before you make this change,
though, you should think about whether there are ways to fundamentally
simplify your scanner - those are certainly preferable!

Vern

To increase the 32K limit (on a machine with 32 bit integers), you
increase the magnitude of the following in flexdef.h:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767
     #define MAX_SHORT 32700

Adding a 0 or two after each should do the trick.

File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ

Can I fake multi-byte character support?
========================================

To: [email protected]
Subject: Re: flex - multi-byte support?
In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
Date: Fri, 04 Oct 1996 11:42:18 PDT
From: Vern Paxson <vern>

> I assume as long as my *.l file defines the
> range of expected character code values (in octal format), flex will
> scan the file and read multi-byte characters correctly. But I have no
> confidence in this assumption.

Your lack of confidence is justified - this won't work.

Flex has in it a widespread assumption that the input is processed
one byte at a time. Fixing this is on the to-do list, but is involved,
so it won't happen any time soon. In the interim, the best I can suggest
(unless you want to try fixing it yourself) is to write your rules in
terms of pairs of bytes, using definitions in the first section:

     X       \xfe\xc2
     ...
     %%
     foo{X}bar       found_foo_fe_c2_bar();

etc. Definitely a pain - sorry about that.

By the way, the email address you used for me is ancient, indicating you
have a very old version of flex. You can get the most recent, 2.5.4, from
ftp.ee.lbl.gov.

Vern

File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ

deleteme01
==========

To: [email protected]
Subject: Re: Flex / Unicode compatibility question
In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
Date: Tue, 22 Oct 1996 11:06:13 PDT
From: Vern Paxson <vern>

Unfortunately flex at the moment has a widespread assumption within it
that characters are processed 8 bits at a time. I don't see any easy
fix for this (other than writing your rules in terms of double
characters - a pain). I also don't know of a wider lex, though you might
try surfing the Plan 9 stuff because I know it's a Unicode system, and
also the PCCT toolkit (try searching say Alta Vista for "Purdue Compiler
Construction Toolkit").

Fixing flex to handle wider characters is on the long-term to-do list.
But since flex is a strictly spare-time project these days, this
probably won't happen for quite a while, unless someone else does it
first.

Vern

File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ

Can you discuss some flex internals?
====================================

To: Johan Linde <[email protected]>
Subject: Re: translation of flex
In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
Date: Mon, 11 Nov 1996 10:33:50 PST
From: Vern Paxson <vern>

> I'm working for the Swedish team translating GNU program, and I'm currently
> working with flex. I have a few questions about some of the messages which
> I hope you can answer.

All of the things you're wondering about, by the way, concern flex
internals - probably the only person who understands what they mean in
English is me! So I wouldn't worry too much about getting them right.
That said ...

> #: main.c:545
> msgid " %d protos created\n"
>
> Does proto mean prototype?

Yes - prototypes of state compression tables.

> #: main.c:539
> msgid " %d/%d (peak %d) template nxt-chk entries created\n"
>
> Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
> However, 'template next-check entries' doesn't make much sense to me. To be
> able to find a good translation I need to know a little bit more about it.

There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
scanner tables. It involves creating two pairs of tables. The first has
"base" and "default" entries, the second has "next" and "check" entries.
The "base" entry is indexed by the current state and yields an index into
the next/check table. The "default" entry gives what to do if the state
transition isn't found in next/check. The "next" entry gives the next
state to enter, but only if the "check" entry verifies that this entry is
correct for the current state. Flex creates templates of series of
next/check entries and then encodes differences from these templates as a
way to compress the tables.

> #: main.c:533
> msgid " %d/%d base-def entries created\n"
>
> The same problem here for 'base-def'.

See above.

Vern

File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ

unput() messes up yy_at_bol
===========================

To: Xinying Li <[email protected]>
Subject: Re: FLEX ?
In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
Date: Wed, 13 Nov 1996 19:51:54 PST
From: Vern Paxson <vern>

> "unput()" them to input flow, question occurs. If I do this after I scan
> a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
> means the carriage flag has gone.

You can control this by calling yy_set_bol(). It's described in the manual.

> And if in pre-reading it goes to the end of file, is anything done
> to control the end of curren buffer and end of file?

No, there's no way to put back an end-of-file.

> By the way I am using flex 2.5.2 and using the "-l".

The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
2.5.3. You can get it from ftp.ee.lbl.gov.

Vern

File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ

The | operator is not doing what I want
=======================================

To: [email protected]
Subject: Re: Start condition with FLEX
In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
Date: Mon, 18 Nov 1996 10:41:34 PST
From: Vern Paxson <vern>

> I am not able to use the start condition scope and to use the | (OR) with
> rules having start conditions.

The problem is that if you use '|' as a regular expression operator, for
example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
any blanks around it. If you instead want the special '|' *action* (which
from your scanner appears to be the case), which is a way of giving two
different rules the same action:

     foo |
     bar    matched_foo_or_bar();

then '|' *must* be separated from the first rule by whitespace and *must*
be followed by a newline. You *cannot* write it as:

     foo | bar    matched_foo_or_bar();

even though you might think you could because yacc supports this syntax.
The reason for this unfortunate incompatibility is historical, but it's
unlikely to be changed.

Your problems with start condition scope are simply due to syntax errors
from your use of '|' later confusing flex.

Let me know if you still have problems.

Vern

File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ

Why can't flex understand this variable trailing context pattern?
=================================================================

To: Gregory Margo <[email protected]>
Subject: Re: flex-2.5.3 bug report
In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
Date: Sat, 23 Nov 1996 17:07:32 PST
From: Vern Paxson <vern>

> Enclosed is a lex file that "real" lex will process, but I cannot get
> flex to process it. Could you try it and maybe point me in the right direction?

Your problem is that some of the definitions in the scanner use the '/'
trailing context operator, and have it enclosed in ()'s. Flex does not
allow this operator to be enclosed in ()'s because doing so allows
undefined regular expressions such as "(a/b)+". So the solution is to
remove the parentheses. Note that you must also be building the scanner
with the -l option for AT&T lex compatibility. Without this option, flex
automatically encloses the definitions in parentheses.

Vern

| 612 | File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
|
|---|
| 613 |
|
|---|
| 614 | The ^ operator isn't working
|
|---|
| 615 | ============================
|
|---|
| 616 |
|
|---|
| 617 |
|
|---|
| 618 | To: Thomas Hadig <[email protected]>
|
|---|
| 619 | Subject: Re: Flex Bug ?
|
|---|
| 620 | In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
|
|---|
| 621 | Date: Tue, 26 Nov 1996 11:15:05 PST
|
|---|
| 622 | From: Vern Paxson <vern>
|
|---|
| 623 |
|
|---|
| 624 | > In my lexer code, i have the line :
|
|---|
| 625 | > ^\*.* { }
|
|---|
| 626 | >
|
|---|
| 627 | > Thus all lines starting with an astrix (*) are comment lines.
|
|---|
| 628 | > This does not work !
|
|---|
| 629 |
|
|---|
| 630 | I can't get this problem to reproduce - it works fine for me. Note
|
|---|
| 631 | though that if what you have is slightly different:
|
|---|
| 632 |
|
|---|
| 633 | COMMENT ^\*.*
|
|---|
| 634 | %%
|
|---|
| 635 | {COMMENT} { }
|
|---|
| 636 |
|
|---|
| 637 | then it won't work, because flex pushes back macro definitions enclosed
|
|---|
| 638 | in ()'s, so the rule becomes
|
|---|
| 639 |
|
|---|
| 640 | (^\*.*) { }
|
|---|
| 641 |
|
|---|
| 642 | and now that the '^' operator is not at the immediate beginning of the
|
|---|
| 643 | line, it's interpreted as just a regular character. You can avoid this
|
|---|
| 644 | behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
|
|---|
| 645 |
|
|---|
| 646 | Vern
|
|---|
| 647 |
|
|---|
| 648 |
|
|---|
| 649 | File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
|
|---|
| 650 |
|
|---|
| 651 | Trailing context is getting confused with trailing optional patterns
|
|---|
| 652 | ====================================================================
|
|---|
| 653 |
|
|---|
| 654 |
|
|---|
| 655 | To: Adoram Rogel <[email protected]>
|
|---|
| 656 | Subject: Re: Flex 2.5.4 BOF ???
|
|---|
| 657 | In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
|
|---|
| 658 | Date: Wed, 27 Nov 1996 10:56:25 PST
|
|---|
| 659 | From: Vern Paxson <vern>
|
|---|
| 660 |
|
|---|
| 661 | > Organization(s)?/[a-z]
|
|---|
| 662 | >
|
|---|
| 663 | > This matched "Organizations" (looking in debug mode, the trailing s
|
|---|
| 664 | > was matched with trailing context instead of the optional (s) in the
|
|---|
| 665 | > end of the word.
|
|---|
| 666 |
|
|---|
| 667 | That should only happen with lex. Flex can properly match this pattern.
|
|---|
| 668 | (That might be what you're saying, I'm just not sure.)
|
|---|
| 669 |
|
|---|
| 670 | > Is there a way to avoid this dangerous trailing context problem ?
|
|---|
| 671 |
|
|---|
| 672 | Unfortunately, there's no easy way. On the other hand, I don't see why
|
|---|
| 673 | it should be a problem. Lex's matching is clearly wrong, and I'd hope
|
|---|
| 674 | that usually the intent remains the same as expressed with the pattern,
|
|---|
| 675 | so flex's matching will be correct.
|
|---|
| 676 |
|
|---|
| 677 | Vern
|
|---|
| 678 |
|
|---|
| 679 |
|
|---|
| 680 | File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
|
|---|
| 681 |
|
|---|
| 682 | Is flex GNU or not?
|
|---|
| 683 | ===================
|
|---|
| 684 |
|
|---|
| 685 |
|
|---|
| 686 | To: Cameron MacKinnon <[email protected]>
|
|---|
| 687 | Subject: Re: Flex documentation bug
|
|---|
| 688 | In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
|
|---|
| 689 | Date: Sun, 01 Dec 1996 22:29:39 PST
|
|---|
| 690 | From: Vern Paxson <vern>
|
|---|
| 691 |
|
|---|
| 692 | > I'm not sure how or where to submit bug reports (documentation or
|
|---|
| 693 | > otherwise) for the GNU project stuff ...
|
|---|
| 694 |
|
|---|
| 695 | Well, strictly speaking flex isn't part of the GNU project. They just
|
|---|
| 696 | distribute it because no one's written a decent GPL'd lex replacement.
|
|---|
| 697 | So you should send bugs directly to me. Those sent to the GNU folks
|
|---|
| 698 | sometimes find their way to me, but some may slip through the cracks.
|
|---|
| 699 |
|
|---|
| 700 | > In GNU Info, under the section 'Start Conditions', and also in the man
|
|---|
| 701 | > page (mine's dated April '95) is a nice little snippet showing how to
|
|---|
| 702 | > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
|
|---|
| 703 | > size. Unfortunately, no overflow checking is ever done ...
|
|---|
| 704 |
|
|---|
| 705 | This is already mentioned in the manual:
|
|---|
| 706 |
|
|---|
| 707 | Finally, here's an example of how to match C-style quoted
|
|---|
| 708 | strings using exclusive start conditions, including expanded
|
|---|
| 709 | escape sequences (but not including checking for a string
|
|---|
| 710 | that's too long):
|
|---|
| 711 |
|
|---|
| 712 | The reason for not doing the overflow checking is that it will needlessly
|
|---|
| 713 | clutter up an example whose main purpose is just to demonstrate how to
|
|---|
| 714 | use flex.
|
|---|
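For readers who do want the overflow check, here is a minimal sketch of the append step with the check in place. It reuses the MAX_STR_CONST name from the message. The buffer variables and the helpers string_buf_reset(), string_buf_append() and string_buf_finish() are hypothetical names, and the buffer size is deliberately tiny for illustration.

```c
#define MAX_STR_CONST 16   /* deliberately small, for illustration only */

static char string_buf[MAX_STR_CONST];
static char *string_buf_ptr = string_buf;

/* Call when the opening quote is matched. */
static void string_buf_reset(void)
{
    string_buf_ptr = string_buf;
}

/* Append one character, always leaving room for the terminating NUL.
   Returns 0 on success, -1 if the string would overflow the buffer. */
static int string_buf_append(char c)
{
    if (string_buf_ptr >= string_buf + MAX_STR_CONST - 1)
        return -1;
    *string_buf_ptr++ = c;
    return 0;
}

/* Call when the closing quote is matched; the result is NUL-terminated. */
static const char *string_buf_finish(void)
{
    *string_buf_ptr = '\0';
    return string_buf;
}
```

In a scanner, each rule inside the string start condition would call string_buf_append() for the characters it matched. It would report a "string too long" error from its action when the helper returns -1.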
| 715 |
|
|---|
| 716 | The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
|
|---|
| 717 |
|
|---|
| 718 | Vern
|
|---|
| 719 |
|
|---|
| 720 |
|
|---|
| 721 | File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
|
|---|
| 722 |
|
|---|
| 723 | ERASEME53
|
|---|
| 724 | =========
|
|---|
| 725 |
|
|---|
| 726 |
|
|---|
| 727 | To: [email protected]
|
|---|
| 728 | Subject: Re: Flex (reg)..
|
|---|
| 729 | In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
|
|---|
| 730 | Date: Thu, 06 Mar 1997 15:54:19 PST
|
|---|
| 731 | From: Vern Paxson <vern>
|
|---|
| 732 |
|
|---|
| 733 | > [:alpha:] ([:alnum:] | \\_)*
|
|---|
| 734 |
|
|---|
| 735 | If your rule really has embedded blanks as shown above, then it won't
|
|---|
| 736 | work, as the first blank delimits the rule from the action. (It wouldn't
|
|---|
| 737 | even compile ...) You need instead:
|
|---|
| 738 |
|
|---|
| 739 | [[:alpha:]]([[:alnum:]]|_)*
|
|---|
| 740 |
|
|---|
| 741 | and that should work fine - there's no restriction on what can go inside
|
|---|
| 742 | of ()'s except for the trailing context operator, '/'.
|
|---|
| 743 |
|
|---|
| 744 | Vern
|
|---|
| 745 |
|
|---|
| 746 |
|
|---|
| 747 | File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
|
|---|
| 748 |
|
|---|
| 749 | I need to scan if-then-else blocks and while loops
|
|---|
| 750 | ==================================================
|
|---|
| 751 |
|
|---|
| 752 |
|
|---|
| 753 | To: "Mike Stolnicki" <[email protected]>
|
|---|
| 754 | Subject: Re: FLEX help
|
|---|
| 755 | In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
|
|---|
| 756 | Date: Fri, 30 May 1997 10:46:35 PDT
|
|---|
| 757 | From: Vern Paxson <vern>
|
|---|
| 758 |
|
|---|
| 759 | > We'd like to add "if-then-else", "while", and "for" statements to our
|
|---|
| 760 | > language ...
|
|---|
| 761 | > We've investigated many possible solutions. The one solution that seems
|
|---|
| 762 | > the most reasonable involves knowing the position of a TOKEN in yyin.
|
|---|
| 763 |
|
|---|
| 764 | I strongly advise you to build a parse tree (abstract syntax tree)
|
|---|
| 765 | and loop over that instead. You'll find this has major benefits in keeping
|
|---|
| 766 | your interpreter simple and extensible.
|
|---|
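To make the parse-tree suggestion concrete, here is a minimal sketch of a tree-walking interpreter in C. A while loop runs by re-evaluating its condition and body subtrees, so nothing ever needs to reposition the scanner in yyin. The node kinds, mk(), eval() and the fixed-size variable store are all hypothetical. In a real interpreter, the parser's actions would build these nodes.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a toy language. */
enum node_kind { N_NUM, N_VAR, N_LT, N_ADD, N_ASSIGN, N_SEQ, N_WHILE };

struct node {
    enum node_kind kind;
    int value;            /* N_NUM: literal; N_VAR/N_ASSIGN: variable index */
    struct node *a, *b;   /* children (unused ones are NULL) */
};

static int vars[8];       /* toy variable store */

static struct node *mk(enum node_kind k, int v, struct node *a, struct node *b)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = k;
    n->value = v;
    n->a = a;
    n->b = b;
    return n;
}

static int eval(const struct node *n)
{
    switch (n->kind) {
    case N_NUM:    return n->value;
    case N_VAR:    return vars[n->value];
    case N_LT:     return eval(n->a) < eval(n->b);
    case N_ADD:    return eval(n->a) + eval(n->b);
    case N_ASSIGN: return vars[n->value] = eval(n->a);
    case N_SEQ:    eval(n->a); return eval(n->b);
    case N_WHILE:
        /* Loop by walking the same subtrees again; the source text
           is never rescanned. */
        while (eval(n->a))
            eval(n->b);
        return 0;
    }
    return 0;
}
```

The same approach extends to if-then-else and for loops: each becomes another node kind whose eval() case decides which children to walk, and how often.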
| 767 |
|
|---|
| 768 | That said, the functionality you mention for get_position and set_position
|
|---|
| 769 | has been on the to-do list for a while. As flex is a purely spare-time
|
|---|
| 770 | project for me, no guarantees when this will be added (in particular, it
|
|---|
| 771 | certainly won't be for many months to come).
|
|---|
| 772 |
|
|---|
| 773 | Vern
|
|---|
| 774 |
|
|---|
| 775 |
|
|---|
| 776 | File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
|
|---|
| 777 |
|
|---|
| 778 | ERASEME55
|
|---|
| 779 | =========
|
|---|
| 780 |
|
|---|
| 781 |
|
|---|
| 782 | To: Colin Paul Adams <[email protected]>
|
|---|
| 783 | Subject: Re: Flex C++ classes and Bison
|
|---|
| 784 | In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
|
|---|
| 785 | Date: Fri, 15 Aug 1997 10:48:19 PDT
|
|---|
| 786 | From: Vern Paxson <vern>
|
|---|
| 787 |
|
|---|
| 788 | > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
|
|---|
| 789 | > *parm)
|
|---|
| 790 | >
|
|---|
| 791 | > I have been trying to get this to work as a C++ scanner, but it does
|
|---|
| 792 | > not appear to be possible (warning that it matches no declarations in
|
|---|
| 793 | > yyFlexLexer, or something like that).
|
|---|
| 794 | >
|
|---|
| 795 | > Is this supposed to be possible, or is it being worked on (I DID
|
|---|
| 796 | > notice the comment that scanner classes are still experimental, so I'm
|
|---|
| 797 | > not too hopeful)?
|
|---|
| 798 |
|
|---|
| 799 | What you need to do is derive a subclass from yyFlexLexer that provides
|
|---|
| 800 | the above yylex() method, squirrels away lvalp and parm into member
|
|---|
| 801 | variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
|
|---|
| 802 |
|
|---|
| 803 | Vern
|
|---|
| 804 |
|
|---|
| 805 |
|
|---|
| 806 | File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
|
|---|
| 807 |
|
|---|
| 808 | ERASEME56
|
|---|
| 809 | =========
|
|---|
| 810 |
|
|---|
| 811 |
|
|---|
| 812 | To: [email protected]
|
|---|
| 813 | Subject: Re: Possible mistake in Flex v2.5 document
|
|---|
| 814 | In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
|
|---|
| 815 | Date: Fri, 05 Sep 1997 10:01:54 PDT
|
|---|
| 816 | From: Vern Paxson <vern>
|
|---|
| 817 |
|
|---|
| 818 | > In that example you show how to count comment lines when using
|
|---|
| 819 | > C style /* ... */ comments. My question is, shouldn't you take into
|
|---|
| 820 | > account a scenario where end of a comment marker occurs inside
|
|---|
| 821 | > character or string literals?
|
|---|
| 822 |
|
|---|
| 823 | The scanner certainly needs to also scan character and string literals.
|
|---|
| 824 | However it does that (there's an example in the man page for strings), the
|
|---|
| 825 | lexer will recognize the beginning of the literal before it runs across the
|
|---|
| 826 | embedded "/*". Consequently, it will finish scanning the literal before it
|
|---|
| 827 | even considers the possibility of matching "/*".
|
|---|
| 828 |
|
|---|
| 829 | Example:
|
|---|
| 830 |
|
|---|
| 831 | '([^'\\]|{ESCAPE_SEQUENCE})*'
|
|---|
| 832 |
|
|---|
| 833 | will match all the text between the ''s (inclusive). So the lexer
|
|---|
| 834 | considers this as a token beginning at the first ', and doesn't even
|
|---|
| 835 | attempt to match other tokens inside it.
|
|---|
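The same left-to-right behavior can be seen in a hand-written scanner. The sketch below is not flex output: count_comments() is a hypothetical function that consumes a whole string or character literal as soon as it sees the opening quote, so a "/*" inside the literal is never treated as a comment opener. Conversely, once a comment has started, quote characters inside it are ignored.

```c
/* Count C comments in src, consuming string and character literals
   whole once their opening quote is seen. */
static int count_comments(const char *s)
{
    int comments = 0;

    while (*s) {
        if (*s == '"' || *s == '\'') {
            char quote = *s++;
            while (*s && *s != quote) {
                if (*s == '\\' && s[1])
                    s++;              /* skip the backslash of an escape */
                s++;
            }
            if (*s)
                s++;                  /* closing quote */
        } else if (s[0] == '/' && s[1] == '*') {
            comments++;
            s += 2;
            while (*s && !(s[0] == '*' && s[1] == '/'))
                s++;
            if (*s)
                s += 2;               /* closing star-slash */
        } else {
            s++;
        }
    }
    return comments;
}
```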
| 836 |
|
|---|
| 837 | I think this subtlety is not worth putting in the manual, as I suspect
|
|---|
| 838 | it would confuse more people than it would enlighten.
|
|---|
| 839 |
|
|---|
| 840 | Vern
|
|---|
| 841 |
|
|---|
| 842 |
|
|---|
| 843 | File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
|
|---|
| 844 |
|
|---|
| 845 | ERASEME57
|
|---|
| 846 | =========
|
|---|
| 847 |
|
|---|
| 848 |
|
|---|
| 849 | To: "Marty Leisner" <[email protected]>
|
|---|
| 850 | Subject: Re: flex limitations
|
|---|
| 851 | In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
|
|---|
| 852 | Date: Mon, 08 Sep 1997 11:38:08 PDT
|
|---|
| 853 | From: Vern Paxson <vern>
|
|---|
| 854 |
|
|---|
| 855 | > %%
|
|---|
| 856 | > [a-zA-Z]+ /* skip a line */
|
|---|
| 857 | > { printf("got %s\n", yytext); }
|
|---|
| 858 | > %%
|
|---|
| 859 |
|
|---|
| 860 | What version of flex are you using? If I feed this to 2.5.4, it complains:
|
|---|
| 861 |
|
|---|
| 862 | "bug.l", line 5: EOF encountered inside an action
|
|---|
| 863 | "bug.l", line 5: unrecognized rule
|
|---|
| 864 | "bug.l", line 5: fatal parse error
|
|---|
| 865 |
|
|---|
| 866 | Not the world's greatest error message, but it manages to flag the problem.
|
|---|
| 867 |
|
|---|
| 868 | (With the introduction of start condition scopes, flex can't accommodate
|
|---|
| 869 | an action on a separate line, since it's ambiguous with an indented rule.)
|
|---|
| 870 |
|
|---|
| 871 | You can get 2.5.4 from ftp.ee.lbl.gov.
|
|---|
| 872 |
|
|---|
| 873 | Vern
|
|---|
| 874 |
|
|---|
| 875 |
|
|---|
| 876 | File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
|
|---|
| 877 |
|
|---|
| 878 | Is there a repository for flex scanners?
|
|---|
| 879 | ========================================
|
|---|
| 880 |
|
|---|
| 881 | Not that we know of. You might try asking on comp.compilers.
|
|---|
| 882 |
|
|---|
| 883 |
|
|---|
| 884 | File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
|
|---|
| 885 |
|
|---|
| 886 | How can I conditionally compile or preprocess my flex input file?
|
|---|
| 887 | =================================================================
|
|---|
| 888 |
|
|---|
| 889 | Flex doesn't have a preprocessor like C does. You might try using
|
|---|
| 890 | m4, or the C preprocessor plus a sed script to clean up the result.
|
|---|
| 891 |
|
|---|
| 892 |
|
|---|
| 893 | File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
|
|---|
| 894 |
|
|---|
| 895 | Where can I find grammars for lex and yacc?
|
|---|
| 896 | ===========================================
|
|---|
| 897 |
|
|---|
| 898 | In the sources for flex and bison.
|
|---|
| 899 |
|
|---|
| 900 |
|
|---|
| 901 | File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
|
|---|
| 902 |
|
|---|
| 903 | I get an end-of-buffer message for each character scanned.
|
|---|
| 904 | ==========================================================
|
|---|
| 905 |
|
|---|
| 906 | This will happen if your LexerInput() function returns only one
|
|---|
| 907 | character at a time, which can happen either if your scanner is
|
|---|
| 908 | "interactive", or if the streams library on your platform always
|
|---|
| 909 | returns 1 for yyin->gcount().
|
|---|
| 910 |
|
|---|
| 911 | Solution: override LexerInput() with a version that returns whole
|
|---|
| 912 | buffers.
|
|---|
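The advice above concerns the C++ LexerInput(). In a C scanner, the corresponding hook is the YY_INPUT macro. Here is a minimal sketch of a fill function that hands the scanner as many bytes per call as fread() can supply, rather than one character at a time. The name fill_buffer() is hypothetical, and the YY_INPUT definition in the comment is only one possible way to hook it in.

```c
#include <stdio.h>

/* Fill buf with up to max_size bytes in one call.
   Returns the number of bytes read; 0 means end of input.
   A production version should also check ferror(in). */
static size_t fill_buffer(FILE *in, char *buf, size_t max_size)
{
    return fread(buf, 1, max_size, in);
}

/* Possible hookup in the definitions section of a C scanner:
   #define YY_INPUT(buf, result, max_size) \
       ((result) = (int) fill_buffer(yyin, (buf), (size_t) (max_size)))
*/
```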
| 913 |
|
|---|
| 914 |
|
|---|
| 915 | File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
|
|---|
| 916 |
|
|---|
| 917 | unnamed-faq-62
|
|---|
| 918 | ==============
|
|---|
| 919 |
|
|---|
| 920 |
|
|---|
| 921 | To: [email protected]
|
|---|
| 922 | Subject: Re: Flex maximums
|
|---|
| 923 | In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
|
|---|
| 924 | Date: Mon, 17 Nov 1997 17:16:15 PST
|
|---|
| 925 | From: Vern Paxson <vern>
|
|---|
| 926 |
|
|---|
| 927 | > I took a quick look into the flex-sources and altered some #defines in
|
|---|
| 928 | > flexdefs.h:
|
|---|
| 929 | >
|
|---|
| 930 | > #define INITIAL_MNS 64000
|
|---|
| 931 | > #define MNS_INCREMENT 1024000
|
|---|
| 932 | > #define MAXIMUM_MNS 64000
|
|---|
| 933 |
|
|---|
| 934 | The things to fix are to add a couple of zeroes to:
|
|---|
| 935 |
|
|---|
| 936 | #define JAMSTATE -32766 /* marks a reference to the state that always jams */
|
|---|
| 937 | #define MAXIMUM_MNS 31999
|
|---|
| 938 | #define BAD_SUBSCRIPT -32767
|
|---|
| 939 | #define MAX_SHORT 32700
|
|---|
| 940 |
|
|---|
| 941 | and, if you get complaints about too many rules, make the following change too:
|
|---|
| 942 |
|
|---|
| 943 | #define YY_TRAILING_MASK 0x200000
|
|---|
| 944 | #define YY_TRAILING_HEAD_MASK 0x400000
|
|---|
| 945 |
|
|---|
| 946 | - Vern
|
|---|
| 947 |
|
|---|
| 948 |
|
|---|
| 949 | File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
|
|---|
| 950 |
|
|---|
| 951 | unnamed-faq-63
|
|---|
| 952 | ==============
|
|---|
| 953 |
|
|---|
| 954 |
|
|---|
| 955 | To: [email protected] (Jimmey Todd)
|
|---|
| 956 | Subject: Re: FLEX question regarding istream vs ifstream
|
|---|
| 957 | In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
|
|---|
| 958 | Date: Mon, 15 Dec 1997 13:21:35 PST
|
|---|
| 959 | From: Vern Paxson <vern>
|
|---|
| 960 |
|
|---|
| 961 | > stdin_handle = YY_CURRENT_BUFFER;
|
|---|
| 962 | > ifstream fin( "aFile" );
|
|---|
| 963 | > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
|
|---|
| 964 | >
|
|---|
| 965 | > What I'm wanting to do, is pass the contents of a file thru one set
|
|---|
| 966 | > of rules and then pass stdin thru another set... It works great if, I
|
|---|
| 967 | > don't use the C++ classes. But since everything else that I'm doing is
|
|---|
| 968 | > in C++, I thought I'd be consistent.
|
|---|
| 969 | >
|
|---|
| 970 | > The problem is that 'yy_create_buffer' is expecting an istream* as it's
|
|---|
| 971 | > first argument (as stated in the man page). However, fin is a ifstream
|
|---|
| 972 | > object. Any ideas on what I might be doing wrong? Any help would be
|
|---|
| 973 | > appreciated. Thanks!!
|
|---|
| 974 |
|
|---|
| 975 | You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
|
|---|
| 976 | Then its type will be compatible with the expected istream*, because ifstream
|
|---|
| 977 | is derived from istream.
|
|---|
| 978 |
|
|---|
| 979 | Vern
|
|---|
| 980 |
|
|---|
| 981 |
|
|---|
| 982 | File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
|
|---|
| 983 |
|
|---|
| 984 | unnamed-faq-64
|
|---|
| 985 | ==============
|
|---|
| 986 |
|
|---|
| 987 |
|
|---|
| 988 | To: Enda Fadian <[email protected]>
|
|---|
| 989 | Subject: Re: Question related to Flex man page?
|
|---|
| 990 | In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
|
|---|
| 991 | Date: Tue, 16 Dec 1997 14:17:09 PST
|
|---|
| 992 | From: Vern Paxson <vern>
|
|---|
| 993 |
|
|---|
| 994 | > Can you explain to me what is ment by a long-jump in relation to flex?
|
|---|
| 995 |
|
|---|
| 996 | Using the longjmp() function while inside yylex() or a routine called by it.
|
|---|
| 997 |
|
|---|
| 998 | > what is the flex activation frame.
|
|---|
| 999 |
|
|---|
| 1000 | Just yylex()'s stack frame.
|
|---|
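Here is a minimal plain-C illustration of what a long jump out of the scanner means. scan_error() is a hypothetical routine called from inside a stand-in for yylex(). It transfers control straight back to the setjmp() point, discarding the scanner's stack frame without yylex() ever returning. Apart from setjmp() and longjmp() themselves, every name here is hypothetical.

```c
#include <setjmp.h>

static jmp_buf scan_env;      /* where to resume after an error */
static int scan_error_code;   /* set just before jumping */

/* Hypothetical error routine called from inside the scanner. */
static void scan_error(int code)
{
    scan_error_code = code;
    longjmp(scan_env, 1);     /* never returns: unwinds past the scanner's frame */
}

/* Stand-in for yylex(): treats a negative "token" as a fatal error. */
static int fake_yylex(const int *tokens, int n)
{
    for (int i = 0; i < n; i++)
        if (tokens[i] < 0)
            scan_error(-tokens[i]);
    return 0;
}

/* Returns 0 if scanning finished normally, else the error code. */
static int run_scan(const int *tokens, int n)
{
    if (setjmp(scan_env) == 0)
        return fake_yylex(tokens, n);   /* normal call */
    /* Reached only via longjmp(): fake_yylex()'s frame no longer exists.
       With a real flex scanner, buffered input would still be pending,
       so resetting it (for example with yyrestart()) before scanning
       again seems prudent; this is an assumption, not flex documentation. */
    return scan_error_code;
}
```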
| 1001 |
|
|---|
| 1002 | > As far as I can see yyrestart will bring me back to the sart of the input
|
|---|
| 1003 | > file and using flex++ isnot really an option!
|
|---|
| 1004 |
|
|---|
| 1005 | No, yyrestart() doesn't imply a rewind, even though its name might sound
|
|---|
| 1006 | like it does. It tells the scanner to flush its internal buffers and
|
|---|
| 1007 | start reading from the given file at its present location.
|
|---|
| 1008 |
|
|---|
| 1009 | Vern
|
|---|
| 1010 |
|
|---|
| 1011 |
|
|---|
| 1012 | File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
|
|---|
| 1013 |
|
|---|
| 1014 | unnamed-faq-65
|
|---|
| 1015 | ==============
|
|---|
| 1016 |
|
|---|
| 1017 |
|
|---|
| 1018 | To: [email protected] (Hassan Alaoui)
|
|---|
| 1019 | Subject: Re: Need urgent Help
|
|---|
| 1020 | In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
|
|---|
| 1021 | Date: Sun, 21 Dec 1997 21:30:46 PST
|
|---|
| 1022 | From: Vern Paxson <vern>
|
|---|
| 1023 |
|
|---|
| 1024 | > /usr/lib/yaccpar: In function `int yyparse()':
|
|---|
| 1025 | > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
|
|---|
| 1026 | >
|
|---|
| 1027 | > ld: Undefined symbol
|
|---|
| 1028 | > _yylex
|
|---|
| 1029 | > _yyparse
|
|---|
| 1030 | > _yyin
|
|---|
| 1031 |
|
|---|
| 1032 | This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
|
|---|
| 1033 | the fix is to explicitly insert some 'extern "C"' statements for the
|
|---|
| 1034 | corresponding routines/symbols.
|
|---|
| 1035 |
|
|---|
| 1036 | Vern
|
|---|
| 1037 |
|
|---|
| 1038 |
|
|---|
| 1039 | File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
|
|---|
| 1040 |
|
|---|
| 1041 | unnamed-faq-66
|
|---|
| 1042 | ==============
|
|---|
| 1043 |
|
|---|
| 1044 |
|
|---|
| 1045 | To: [email protected]
|
|---|
| 1046 | Cc: [email protected]
|
|---|
| 1047 | Subject: Re: [[email protected]: Help request]
|
|---|
| 1048 | In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
|
|---|
| 1049 | Date: Sun, 21 Dec 1997 22:33:37 PST
|
|---|
| 1050 | From: Vern Paxson <vern>
|
|---|
| 1051 |
|
|---|
| 1052 | > This is my definition for float and integer types:
|
|---|
| 1053 | > . . .
|
|---|
| 1054 | > NZD [1-9]
|
|---|
| 1055 | > ...
|
|---|
| 1056 | > I've tested my program on other lex version (on UNIX Sun Solaris an HP
|
|---|
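Here is a minimal sketch of that bookkeeping, assuming a C scanner. track_token(), input_offset and token_start are hypothetical names, and the YY_USER_ACTION line in the comment shows one way to invoke the function for every match. This simple count assumes each matched character is consumed exactly once. Actions that give text back with yyless() or unput(), or that use yymore(), would need adjustments.

```c
/* Hypothetical position bookkeeping for a flex scanner. */
static long input_offset = 0;   /* characters consumed so far */
static long token_start = 0;    /* offset at which the latest token began */

/* Meant to run once per match, before the rule's action, e.g. via a
   line like this in the definitions section of the .l file:
       #define YY_USER_ACTION  track_token(yyleng);
   where yyleng is the length of the current match. */
static void track_token(int leng)
{
    token_start = input_offset;
    input_offset += leng;
}
```

A rule's action can then read token_start to learn the character position at which the token it just matched begins.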
| 1057 | > UNIX) and it work well, so I think that my definitions are correct.
|
|---|
| 1058 | > There are any differences between Lex and Flex?
|
|---|
| 1059 |
|
|---|
| 1060 | There are indeed differences, as discussed in the man page. The one
|
|---|
| 1061 | you are probably running into is that when flex expands a name definition,
|
|---|
| 1062 | it puts parentheses around the expansion, while lex does not. There's
|
|---|
| 1063 | an example in the man page of how this can lead to different matching.
|
|---|
| 1064 | Flex's behavior complies with the POSIX standard (or at least with the
|
|---|
| 1065 | last POSIX draft I saw).
|
|---|
| 1066 |
|
|---|
| 1067 | Vern
|
|---|
| 1068 |
|
|---|
| 1069 |
|
|---|
| 1070 | File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
|
|---|
| 1071 |
|
|---|
| 1072 | unnamed-faq-67
|
|---|
| 1073 | ==============
|
|---|
| 1074 |
|
|---|
| 1075 |
|
|---|
| 1076 | To: [email protected] (Hassan Alaoui)
|
|---|
| 1077 | Subject: Re: Thanks
|
|---|
| 1078 | In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
|
|---|
| 1079 | Date: Mon, 22 Dec 1997 14:35:05 PST
|
|---|
| 1080 | From: Vern Paxson <vern>
|
|---|
| 1081 |
|
|---|
| 1082 | > Thank you very much for your help. I compile and link well with C++ while
|
|---|
| 1083 | > declaring 'yylex ...' extern, But a little problem remains. I get a
|
|---|
| 1084 | > segmentation default when executing ( I linked with lfl library) while it
|
|---|
| 1085 | > works well when using LEX instead of flex. Do you have some ideas about the
|
|---|
| 1086 | > reason for this ?
|
|---|
| 1087 |
|
|---|
| 1088 | The one possible reason for this that comes to mind is if you've defined
|
|---|
| 1089 | yytext as "extern char yytext[]" (which is what lex uses) instead of
|
|---|
| 1090 | "extern char *yytext" (which is what flex uses). If it's not that, then
|
|---|
| 1091 | I'm afraid I don't know what the problem might be.
|
|---|
| 1092 |
|
|---|
| 1093 | Vern
|
|---|
| 1094 |
|
|---|
| 1095 |
|
|---|
| 1096 | File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
|
|---|
| 1097 |
|
|---|
| 1098 | unnamed-faq-68
|
|---|
| 1099 | ==============
|
|---|
| 1100 |
|
|---|
| 1101 |
|
|---|
| 1102 | To: "Bart Niswonger" <[email protected]>
|
|---|
| 1103 | Subject: Re: flex 2.5: c++ scanners & start conditions
|
|---|
| 1104 | In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
|
|---|
| 1105 | Date: Tue, 06 Jan 1998 19:19:30 PST
|
|---|
| 1106 | From: Vern Paxson <vern>
|
|---|
| 1107 |
|
|---|
| 1108 | > The problem is that when I do this (using %option c++) start
|
|---|
| 1109 | > conditions seem to not apply.
|
|---|
| 1110 |
|
|---|
| 1111 | The BEGIN macro modifies the yy_start variable. For C scanners, this
|
|---|
| 1112 | is a static with scope visible through the whole file. For C++ scanners,
|
|---|
| 1113 | it's a member variable, so it only has visible scope within a member
|
|---|
| 1114 | function. Your lexbegin() routine is not a member function when you
|
|---|
| 1115 | build a C++ scanner, so it's not modifying the correct yy_start. The
|
|---|
| 1116 | diagnostic that indicates this is that you found you needed to add
|
|---|
| 1117 | a declaration of yy_start in order to get your scanner to compile when
|
|---|
| 1118 | using C++; instead, the correct fix is to make lexbegin() a member
|
|---|
| 1119 | function (by deriving from yyFlexLexer).
|
|---|
| 1120 |
|
|---|
| 1121 | Vern
|
|---|
| 1122 |
|
|---|
| 1123 |
|
|---|
| 1124 | File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
|
|---|
| 1125 |
|
|---|
| 1126 | unnamed-faq-69
|
|---|
| 1127 | ==============
|
|---|
| 1128 |
|
|---|
| 1129 |
|
|---|
| 1130 | To: "Boris Zinin" <[email protected]>
|
|---|
| 1131 | Subject: Re: current position in flex buffer
|
|---|
| 1132 | In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
|
|---|
| 1133 | Date: Mon, 12 Jan 1998 12:03:15 PST
|
|---|
| 1134 | From: Vern Paxson <vern>
|
|---|
| 1135 |
|
|---|
| 1136 | > The problem is how to determine the current position in flex active
|
|---|
| 1137 | > buffer when a rule is matched....
|
|---|
| 1138 |
|
|---|
| 1139 | You will need to keep track of this explicitly, such as by redefining
|
|---|
| 1140 | YY_USER_ACTION to count the number of characters matched.
|
|---|
| 1141 |
|
|---|
| 1142 | The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
|
|---|
| 1143 |
|
|---|
| 1144 | Vern
|
|---|
| 1145 |
|
|---|
| 1146 |
|
|---|
| 1147 | File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
|
|---|
| 1148 |
|
|---|
| 1149 | unnamed-faq-70
|
|---|
| 1150 | ==============
|
|---|
| 1151 |
|
|---|
| 1152 |
|
|---|
| 1153 | To: [email protected]
|
|---|
| 1154 | Subject: Re: Flex question
|
|---|
| 1155 | In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
|
|---|
| 1156 | Date: Tue, 27 Jan 1998 22:41:52 PST
|
|---|
| 1157 | From: Vern Paxson <vern>
|
|---|
| 1158 |
|
|---|
| 1159 | > That requirement involves knowing
|
|---|
| 1160 | > the character position at which a particular token was matched
|
|---|
| 1161 | > in the lexer.
|
|---|
| 1162 |
|
|---|
| 1163 | The way you have to do this is by explicitly keeping track of where
|
|---|
| 1164 | you are in the file, by counting the number of characters scanned
|
|---|
| 1165 | for each token (available in yyleng). It may prove convenient to
|
|---|
| 1166 | do this by redefining YY_USER_ACTION, as described in the manual.
|
|---|
| 1167 |
|
|---|
| 1168 | Vern
|
|---|
| 1169 |
|
|---|
| 1170 |
|
|---|
| 1171 | File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
|
|---|
| 1172 |
|
|---|
| 1173 | unnamed-faq-71
|
|---|
| 1174 | ==============
|
|---|
| 1175 |
|
|---|
| 1176 |
|
|---|
| 1177 | To: Vladimir Alexiev <[email protected]>
|
|---|
| 1178 | Subject: Re: flex: how to control start condition from parser?
|
|---|
| 1179 | In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
|
|---|
| 1180 | Date: Tue, 27 Jan 1998 22:45:37 PST
|
|---|
| 1181 | From: Vern Paxson <vern>
|
|---|
| 1182 |
|
|---|
| 1183 | > It seems useful for the parser to be able to tell the lexer about such
|
|---|
| 1184 | > context dependencies, because then they don't have to be limited to
|
|---|
| 1185 | > local or sequential context.
|
|---|
| 1186 |
|
|---|
| 1187 | One way to do this is to have the parser call a stub routine that's
|
|---|
| 1188 | included in the scanner's .l file, and consequently that has access ot
|
|---|
| 1189 | BEGIN. The only ugliness is that the parser can't pass in the state
|
|---|
| 1190 | it wants, because those aren't visible - but if you don't have many
|
|---|
| 1191 | such states, then using a different set of names doesn't seem like
|
|---|
| 1192 | to much of a burden.
|
|---|
| 1193 |
|
|---|
| 1194 | While generating a .h file like you suggests is certainly cleaner,
|
|---|
| 1195 | flex development has come to a virtual stand-still :-(, so a workaround
|
|---|
| 1196 | like the above is much more pragmatic than waiting for a new feature.
|
|---|
| 1197 |
|
|---|
| 1198 | Vern
|
|---|
| 1199 |
|
|---|
| 1200 |
|
|---|
| 1201 | File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
|
|---|
| 1202 |
|
|---|
| 1203 | unnamed-faq-72
|
|---|
| 1204 | ==============
|
|---|
| 1205 |
|
|---|
| 1206 |
|
|---|
| 1207 | To: Barbara Denny <[email protected]>
|
|---|
| 1208 | Subject: Re: freebsd flex bug?
|
|---|
| 1209 | In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
|
|---|
| 1210 | Date: Fri, 30 Jan 1998 12:42:32 PST
|
|---|
| 1211 | From: Vern Paxson <vern>
|
|---|
| 1212 |
|
|---|
| 1213 | > lex.yy.c:1996: parse error before `='
|
|---|
| 1214 |
|
|---|
| 1215 | This is the key, identifying this error. (It may help to pinpoint
|
|---|
| 1216 | it by using flex -L, so it doesn't generate #line directives in its
|
|---|
| 1217 | output.) I will bet you heavy money that you have a start condition
|
|---|
| 1218 | name that is also a variable name, or something like that; flex spits
|
|---|
| 1219 | out #define's for each start condition name, mapping them to a number,
|
|---|
| 1220 | so you can wind up with:
|
|---|
| 1221 |
|
|---|
| 1222 | %x foo
|
|---|
| 1223 | %%
|
|---|
| 1224 | ...
|
|---|
| 1225 | %%
|
|---|
| 1226 | void bar()
|
|---|
| 1227 | {
|
|---|
| 1228 | int foo = 3;
|
|---|
| 1229 | }
|
|---|
| 1230 |
|
|---|
| 1231 | and the penultimate line will turn into "int 1 = 3" after C preprocessing,
|
|---|
| 1232 | since flex will put "#define foo 1" in the generated scanner.
|
|---|
| 1233 |
|
|---|
| 1234 | Vern
|
|---|
| 1235 |
|
|---|
| 1236 |
|
|---|
| 1237 | File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
|
|---|
| 1238 |
|
|---|
| 1239 | unnamed-faq-73
|
|---|
| 1240 | ==============
|
|---|
| 1241 |
|
|---|
| 1242 |
|
|---|
| 1243 | To: Maurice Petrie <[email protected]>
|
|---|
| 1244 | Subject: Re: Lost flex .l file
|
|---|
| 1245 | In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
|
|---|
| 1246 | Date: Mon, 02 Feb 1998 11:15:12 PST
|
|---|
| 1247 | From: Vern Paxson <vern>
|
|---|
| 1248 |
|
|---|
| 1249 | > I am curious as to
|
|---|
| 1250 | > whether there is a simple way to backtrack from the generated source to
|
|---|
| 1251 | > reproduce the lost list of tokens we are searching on.
|
|---|
| 1252 |
|
|---|
| 1253 | In theory, it's straightforward to go from the DFA representation
|
|---|
| 1254 | back to a regular-expression representation - the two are equivalent.
|
|---|
| 1255 | In practice, a huge headache, because you have to unpack all the tables
|
|---|
| 1256 | back into a single DFA representation, and then write a program to munch
|
|---|
| 1257 | on that and translate it into an RE.
|
|---|
| 1258 |
|
|---|
Sorry for the less-than-happy news ...

Vern


File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ

unnamed-faq-74
==============


To: [email protected] (Jimmey Todd)
Subject: Re: Flex performance question
In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
Date: Thu, 19 Feb 1998 08:48:51 PST
From: Vern Paxson <vern>

> What I have found, is that the smaller the data chunk, the faster the
> program executes. This is the opposite of what I expected. Should this be
> happening this way?

This is exactly what will happen if your input file has embedded NULs.
From the man page:

     A final note: flex is slow when matching NUL's, particularly
     when a token contains multiple NUL's. It's best to write
     rules which match short amounts of text if it's anticipated
     that the text will often include NUL's.

So that's the first thing to look for.

Vern


File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ

unnamed-faq-75
==============


To: [email protected] (Jimmey Todd)
Subject: Re: Flex performance question
In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
Date: Thu, 19 Feb 1998 15:42:25 PST
From: Vern Paxson <vern>

So there are several problems.

First, to go fast, you want to match as much text as possible, which
your scanners don't do when the text they're scanning is *not* an
<RN> tag. So you want a rule like:

     [^<]+

Second, C++ scanners are particularly slow if they're interactive,
which they are by default. Using -B speeds it up by a factor of 3-4
on my workstation.

Third, C++ scanners that use the istream interface are slow, because
of how poorly istreams are implemented. I built two versions of
the following scanner:

     %%
     .*\n
     .*
     %%

and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
The C++ istream version, using -B, takes 3.8 seconds.

Vern
