| 1 | This is flex.info, produced by makeinfo version 4.5 from flex.texi.
|
|---|
| 2 |
|
|---|
| 3 | INFO-DIR-SECTION Programming
|
|---|
| 4 | START-INFO-DIR-ENTRY
|
|---|
| 5 | * flex: (flex). Fast lexical analyzer generator (lex replacement).
|
|---|
| 6 | END-INFO-DIR-ENTRY
|
|---|
| 7 |
|
|---|
| 8 |
|
|---|
| 9 | The flex manual is placed under the same licensing conditions as the
|
|---|
| 10 | rest of flex:
|
|---|
| 11 |
|
|---|
| 12 | Copyright (C) 1990, 1997 The Regents of the University of California.
|
|---|
| 13 | All rights reserved.
|
|---|
| 14 |
|
|---|
| 15 | This code is derived from software contributed to Berkeley by Vern
|
|---|
| 16 | Paxson.
|
|---|
| 17 |
|
|---|
| 18 | The United States Government has rights in this work pursuant to
|
|---|
| 19 | contract no. DE-AC03-76SF00098 between the United States Department of
|
|---|
| 20 | Energy and the University of California.
|
|---|
| 21 |
|
|---|
| 22 | Redistribution and use in source and binary forms, with or without
|
|---|
| 23 | modification, are permitted provided that the following conditions are
|
|---|
| 24 | met:
|
|---|
| 25 |
|
|---|
| 26 | 1. Redistributions of source code must retain the above copyright
|
|---|
| 27 | notice, this list of conditions and the following disclaimer.
|
|---|
| 28 |
|
|---|
| 29 | 2. Redistributions in binary form must reproduce the above copyright
|
|---|
| 30 | notice, this list of conditions and the following disclaimer in the
|
|---|
| 31 | documentation and/or other materials provided with the
|
|---|
| 32 | distribution.
|
|---|
| 33 | Neither the name of the University nor the names of its contributors
|
|---|
| 34 | may be used to endorse or promote products derived from this software
|
|---|
| 35 | without specific prior written permission.
|
|---|
| 36 |
|
|---|
| 37 | THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
|
|---|
| 38 | WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
|---|
| 39 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
|---|
| 40 |
|
|---|
| 41 | File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management
|
|---|
| 42 |
|
|---|
| 43 | Overriding The Default Memory Management
|
|---|
| 44 | ========================================
|
|---|
| 45 |
|
|---|
| 46 | Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when
|
|---|
| 47 | it needs to allocate or free memory. By default, these functions are
|
|---|
| 48 | wrappers around the standard C functions, `malloc', `realloc', and
|
|---|
| 49 | `free', respectively. You can override the default implementations by
|
|---|
| 50 | telling flex that you will provide your own implementations.
|
|---|
| 51 |
|
|---|
| 52 | To override the default implementations, you must do two things:
|
|---|
| 53 |
|
|---|
| 54 | 1. Suppress the default implementations by specifying one or more of
|
|---|
| 55 | the following options:
|
|---|
| 56 |
|
|---|
| 57 | * `%option noyyalloc'
|
|---|
| 58 |
|
|---|
| 59 | * `%option noyyrealloc'
|
|---|
| 60 |
|
|---|
| 61 | * `%option noyyfree'.
|
|---|
| 62 |
|
|---|
| 63 | 2. Provide your own implementation of the following functions: (1)
|
|---|
| 64 |
|
|---|
| 65 |
|
|---|
| 66 | // For a non-reentrant scanner
|
|---|
| 67 | void * yyalloc (size_t bytes);
|
|---|
| 68 | void * yyrealloc (void * ptr, size_t bytes);
|
|---|
| 69 | void yyfree (void * ptr);
|
|---|
| 70 |
|
|---|
| 71 | // For a reentrant scanner
|
|---|
| 72 | void * yyalloc (size_t bytes, void * yyscanner);
|
|---|
| 73 | void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
|
|---|
| 74 | void yyfree (void * ptr, void * yyscanner);
|
|---|
| 75 |
|
|---|
| 76 |
|
|---|
| 77 | In the following example, we will override all three memory
|
|---|
| 78 | routines. We assume that there is a custom allocator with garbage
|
|---|
| 79 | collection. In order to make this example interesting, we will use a
|
|---|
| 80 | reentrant scanner, passing a pointer to the custom allocator through
|
|---|
| 81 | `yyextra'.
|
|---|
| 82 |
|
|---|
| 83 |
|
|---|
| 84 | %{
|
|---|
| 85 | #include "some_allocator.h"
|
|---|
| 86 | %}
|
|---|
| 87 |
|
|---|
| 88 | /* Suppress the default implementations. */
|
|---|
| 89 | %option noyyalloc noyyrealloc noyyfree
|
|---|
| 90 | %option reentrant
|
|---|
| 91 |
|
|---|
| 92 | /* Initialize the allocator. */
|
|---|
| 93 | #define YY_EXTRA_TYPE struct allocator*
|
|---|
| 94 | #define YY_USER_INIT yyextra = allocator_create();
|
|---|
| 95 |
|
|---|
| 96 | %%
|
|---|
| 97 | .|\n ;
|
|---|
| 98 | %%
|
|---|
| 99 |
|
|---|
| 100 | /* Provide our own implementations. */
|
|---|
| 101 | void * yyalloc (size_t bytes, void* yyscanner) {
|
|---|
| 102 | return allocator_alloc (yyextra, bytes);
|
|---|
| 103 | }
|
|---|
| 104 |
|
|---|
| 105 | void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
|
|---|
| 106 | return allocator_realloc (yyextra, bytes);
|
|---|
| 107 | }
|
|---|
| 108 |
|
|---|
| 109 | void yyfree (void * ptr, void * yyscanner) {
|
|---|
| 110 | /* Do nothing -- we leave it to the garbage collector. */
|
|---|
| 111 | }
|
|---|
| 112 |
|
|---|
| 113 | ---------- Footnotes ----------
|
|---|
| 114 |
|
|---|
| 115 | (1) It is not necessary to override all (or any) of the memory
|
|---|
| 116 | management routines. You may, for example, override `yyrealloc', but
|
|---|
| 117 | not `yyfree' or `yyalloc'.
|
|---|
| 118 |
|
|---|
| 119 |
|
|---|
| 120 | File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management
|
|---|
| 121 |
|
|---|
| 122 | A Note About yytext And Memory
|
|---|
| 123 | ==============================
|
|---|
| 124 |
|
|---|
| 125 | When flex finds a match, `yytext' points to the first character of
|
|---|
| 126 | the match in the input buffer. The string itself is part of the input
|
|---|
| 127 | buffer, and is _NOT_ allocated separately. The value of yytext will be
|
|---|
| 128 | overwritten the next time yylex() is called. In short, the value of
|
|---|
| 129 | yytext is only valid from within the matched rule's action.
|
|---|
| 130 |
|
|---|
| 131 | Often, you want the value of yytext to persist for later processing,
|
|---|
| 132 | i.e., by a parser with non-zero lookahead. In order to preserve yytext,
|
|---|
| 133 | you will have to copy it with strdup() or a similar function. But this
|
|---|
| 134 | introduces some headache because your parser is now responsible for
|
|---|
| 135 | freeing the copy of yytext. If you use a yacc or bison parser,
|
|---|
| 136 | (commonly used with flex), you will discover that the error recovery
|
|---|
| 137 | mechanisms can cause memory to be leaked.
|
|---|
| 138 |
|
|---|
| 139 | To prevent memory leaks from strdup'd yytext, you will have to track
|
|---|
| 140 | the memory somehow. Our experience has shown that a garbage collection
|
|---|
| 141 | mechanism or a pooled memory mechanism will save you a lot of grief
|
|---|
| 142 | when writing parsers.
|
|---|
| 143 |
|
|---|
| 144 |
|
|---|
| 145 | File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top
|
|---|
| 146 |
|
|---|
| 147 | Serialized Tables
|
|---|
| 148 | *****************
|
|---|
| 149 |
|
|---|
| 150 | A `flex' scanner has the ability to save the DFA tables to a file,
|
|---|
| 151 | and load them at runtime when needed. The motivation for this feature
|
|---|
| 152 | is to reduce the runtime memory footprint. Traditionally, these tables
|
|---|
| 153 | have been compiled into the scanner as C arrays, and are sometimes
|
|---|
| 154 | quite large. Since the tables are compiled into the scanner, the
|
|---|
| 155 | memory used by the tables can never be freed. This is a waste of
|
|---|
| 156 | memory, especially if an application uses several scanners, but none of
|
|---|
| 157 | them at the same time.
|
|---|
| 158 |
|
|---|
| 159 | The serialization feature allows the tables to be loaded at runtime,
|
|---|
| 160 | before scanning begins. The tables may be discarded when scanning is
|
|---|
| 161 | finished.
|
|---|
| 162 |
|
|---|
| 163 | * Menu:
|
|---|
| 164 |
|
|---|
| 165 | * Creating Serialized Tables::
|
|---|
| 166 | * Loading and Unloading Serialized Tables::
|
|---|
| 167 | * Tables File Format::
|
|---|
| 168 |
|
|---|
| 169 |
|
|---|
| 170 | File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables
|
|---|
| 171 |
|
|---|
| 172 | Creating Serialized Tables
|
|---|
| 173 | ==========================
|
|---|
| 174 |
|
|---|
| 175 | You may create a scanner with serialized tables by specifying:
|
|---|
| 176 |
|
|---|
| 177 |
|
|---|
| 178 | %option tables-file=FILE
|
|---|
| 179 | or
|
|---|
| 180 | --tables-file=FILE
|
|---|
| 181 |
|
|---|
| 182 | These options instruct flex to save the DFA tables to the file FILE.
|
|---|
| 183 | The tables will _not_ be embedded in the generated scanner. The scanner
|
|---|
| 184 | will not function on its own. The scanner will be dependent upon the
|
|---|
| 185 | serialized tables. You must load the tables from this file at runtime
|
|---|
| 186 | before you can scan anything.
|
|---|
| 187 |
|
|---|
| 188 | If you do not specify a filename to `--tables-file', the tables will
|
|---|
| 189 | be saved to `lex.yy.tables', where `yy' is the appropriate prefix.
|
|---|
| 190 |
|
|---|
| 191 | If your project uses several different scanners, you can concatenate
|
|---|
| 192 | the serialized tables into one file, and flex will find the correct set
|
|---|
| 193 | of tables, using the scanner prefix as part of the lookup key. An
|
|---|
| 194 | example follows:
|
|---|
| 195 |
|
|---|
| 196 |
|
|---|
| 197 | $ flex --tables-file --prefix=cpp cpp.l
|
|---|
| 198 | $ flex --tables-file --prefix=c c.l
|
|---|
| 199 | $ cat lex.cpp.tables lex.c.tables > all.tables
|
|---|
| 200 |
|
|---|
| 201 | The above example created two scanners, `cpp', and `c'. Since we did
|
|---|
| 202 | not specify a filename, the tables were serialized to `lex.c.tables' and
|
|---|
| 203 | `lex.cpp.tables', respectively. Then, we concatenated the two files
|
|---|
| 204 | together into `all.tables', which we will distribute with our project.
|
|---|
| 205 | At runtime, we will open the file and tell flex to load the tables from
|
|---|
| 206 | it. Flex will find the correct tables automatically. (See next
|
|---|
| 207 | section).
|
|---|
| 208 |
|
|---|
| 209 |
|
|---|
| 210 | File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables
|
|---|
| 211 |
|
|---|
| 212 | Loading and Unloading Serialized Tables
|
|---|
| 213 | =======================================
|
|---|
| 214 |
|
|---|
| 215 | If you've built your scanner with `%option tables-file', then you
|
|---|
| 216 | must load the scanner tables at runtime. This can be accomplished with
|
|---|
| 217 | the following function:
|
|---|
| 218 |
|
|---|
| 219 | - Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
|
|---|
| 220 | Locates scanner tables in the stream pointed to by FP and loads
|
|---|
| 221 | them. Memory for the tables is allocated via `yyalloc'. You must
|
|---|
| 222 | call this function before the first call to `yylex'. The argument
|
|---|
| 223 | SCANNER only appears in the reentrant scanner. This function
|
|---|
| 224 | returns `0' (zero) on success, or non-zero on error.
|
|---|
| 225 |
|
|---|
| 226 | The loaded tables are *not* automatically destroyed (unloaded) when
|
|---|
| 227 | you call `yylex_destroy'. The reason is that you may create several
|
|---|
| 228 | scanners of the same type (in a reentrant scanner), each of which needs
|
|---|
| 229 | access to these tables. To avoid a nasty memory leak, you must call
|
|---|
| 230 | the following function:
|
|---|
| 231 |
|
|---|
| 232 | - Function: int yytables_destroy ([yyscan_t SCANNER])
|
|---|
| 233 | Unloads the scanner tables. The tables must be loaded again before
|
|---|
| 234 | you can scan any more data. The argument SCANNER only appears in
|
|---|
| 235 | the reentrant scanner. This function returns `0' (zero) on
|
|---|
| 236 | success, or non-zero on error.
|
|---|
| 237 |
|
|---|
| 238 | *The functions `yytables_fload' and `yytables_destroy' are not
|
|---|
| 239 | thread-safe.* You must ensure that these functions are called exactly
|
|---|
| 240 | once (for each scanner type) in a threaded program, before any thread
|
|---|
| 241 | calls `yylex'. After the tables are loaded, they are never written to,
|
|---|
| 242 | and no thread protection is required thereafter - until you destroy
|
|---|
| 243 | them.
|
|---|
| 244 |
|
|---|
| 245 |
|
|---|
| 246 | File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables
|
|---|
| 247 |
|
|---|
| 248 | Tables File Format
|
|---|
| 249 | ==================
|
|---|
| 250 |
|
|---|
| 251 | This section defines the file format of serialized `flex' tables.
|
|---|
| 252 |
|
|---|
| 253 | The tables format allows for one or more sets of tables to be
|
|---|
| 254 | specified, where each set corresponds to a given scanner. Scanners are
|
|---|
| 255 | indexed by name, as described below. The file format is as follows:
|
|---|
| 256 |
|
|---|
| 257 |
|
|---|
| 258 | TABLE SET 1
|
|---|
| 259 | +-------------------------------+
|
|---|
| 260 | Header | uint32 th_magic; |
|
|---|
| 261 | | uint32 th_hsize; |
|
|---|
| 262 | | uint32 th_ssize; |
|
|---|
| 263 | | uint16 th_flags; |
|
|---|
| 264 | | char th_version[]; |
|
|---|
| 265 | | char th_name[]; |
|
|---|
| 266 | | uint8 th_pad64[]; |
|
|---|
| 267 | +-------------------------------+
|
|---|
| 268 | Table 1 | uint16 td_id; |
|
|---|
| 269 | | uint16 td_flags; |
|
|---|
| 270 | | uint32 td_lolen; |
|
|---|
| 271 | | uint32 td_hilen; |
|
|---|
| 272 | | void td_data[]; |
|
|---|
| 273 | | uint8 td_pad64[]; |
|
|---|
| 274 | +-------------------------------+
|
|---|
| 275 | Table 2 | |
|
|---|
| 276 | . . .
|
|---|
| 277 | . . .
|
|---|
| 278 | . . .
|
|---|
| 279 | . . .
|
|---|
| 280 | Table n | |
|
|---|
| 281 | +-------------------------------+
|
|---|
| 282 | TABLE SET 2
|
|---|
| 283 | .
|
|---|
| 284 | .
|
|---|
| 285 | .
|
|---|
| 286 | TABLE SET N
|
|---|
| 287 |
|
|---|
| 288 | The above diagram shows that a complete set of tables consists of a
|
|---|
| 289 | header followed by multiple individual tables. Furthermore, multiple
|
|---|
| 290 | complete sets may be present in the same file, each set with its own
|
|---|
| 291 | header and tables. The sets are contiguous in the file. The only way to
|
|---|
| 292 | know if another set follows is to check the next four bytes for the
|
|---|
| 293 | magic number (or check for EOF). The header and tables sections are
|
|---|
| 294 | padded to 64-bit boundaries. Below we describe each field in detail.
|
|---|
| 295 | This format does not specify how the scanner will expand the given
|
|---|
| 296 | data, i.e., data may be serialized as int8, but expanded to an int32
|
|---|
| 297 | array at runtime. This is to reduce the size of the serialized data
|
|---|
| 298 | where possible. Remember, _all integer values are in network byte
|
|---|
| 299 | order_.
|
|---|
| 300 |
|
|---|
| 301 | Fields of a table header:
|
|---|
| 302 |
|
|---|
| 303 | `th_magic'
|
|---|
| 304 | Magic number, always 0xF13C57B1.
|
|---|
| 305 |
|
|---|
| 306 | `th_hsize'
|
|---|
| 307 | Size of this entire header, in bytes, including all fields plus
|
|---|
| 308 | any padding.
|
|---|
| 309 |
|
|---|
| 310 | `th_ssize'
|
|---|
| 311 | Size of this entire set, in bytes, including the header, all
|
|---|
| 312 | tables, plus any padding.
|
|---|
| 313 |
|
|---|
| 314 | `th_flags'
|
|---|
| 315 | Bit flags for this table set. Currently unused.
|
|---|
| 316 |
|
|---|
| 317 | `th_version[]'
|
|---|
| 318 | Flex version in NULL-termninated string format. e.g., `2.5.13a'.
|
|---|
| 319 | This is the version of flex that was used to create the serialized
|
|---|
| 320 | tables.
|
|---|
| 321 |
|
|---|
| 322 | `th_name[]'
|
|---|
| 323 | Contains the name of this table set. The default is `yytables',
|
|---|
| 324 | and is prefixed accordingly, e.g., `footables'. Must be
|
|---|
| 325 | NULL-terminated.
|
|---|
| 326 |
|
|---|
| 327 | `th_pad64[]'
|
|---|
| 328 | Zero or more NULL bytes, padding the entire header to the next
|
|---|
| 329 | 64-bit boundary as calculated from the beginning of the header.
|
|---|
| 330 |
|
|---|
| 331 | Fields of a table:
|
|---|
| 332 |
|
|---|
| 333 | `td_id'
|
|---|
| 334 | Specifies the table identifier. Possible values are:
|
|---|
| 335 | `YYTD_ID_ACCEPT (0x01)'
|
|---|
| 336 | `yy_accept'
|
|---|
| 337 |
|
|---|
| 338 | `YYTD_ID_BASE (0x02)'
|
|---|
| 339 | `yy_base'
|
|---|
| 340 |
|
|---|
| 341 | `YYTD_ID_CHK (0x03)'
|
|---|
| 342 | `yy_chk'
|
|---|
| 343 |
|
|---|
| 344 | `YYTD_ID_DEF (0x04)'
|
|---|
| 345 | `yy_def'
|
|---|
| 346 |
|
|---|
| 347 | `YYTD_ID_EC (0x05)'
|
|---|
| 348 | `yy_ec '
|
|---|
| 349 |
|
|---|
| 350 | `YYTD_ID_META (0x06)'
|
|---|
| 351 | `yy_meta'
|
|---|
| 352 |
|
|---|
| 353 | `YYTD_ID_NUL_TRANS (0x07)'
|
|---|
| 354 | `yy_NUL_trans'
|
|---|
| 355 |
|
|---|
| 356 | `YYTD_ID_NXT (0x08)'
|
|---|
| 357 | `yy_nxt'. This array may be two dimensional. See the
|
|---|
| 358 | `td_hilen' field below.
|
|---|
| 359 |
|
|---|
| 360 | `YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
|
|---|
| 361 | `yy_rule_can_match_eol'
|
|---|
| 362 |
|
|---|
| 363 | `YYTD_ID_START_STATE_LIST (0x0A)'
|
|---|
| 364 | `yy_start_state_list'. This array is handled specially
|
|---|
| 365 | because it is an array of pointers to structs. See the
|
|---|
| 366 | `td_flags' field below.
|
|---|
| 367 |
|
|---|
| 368 | `YYTD_ID_TRANSITION (0x0B)'
|
|---|
| 369 | `yy_transition'. This array is handled specially because it
|
|---|
| 370 | is an array of structs. See the `td_lolen' field below.
|
|---|
| 371 |
|
|---|
| 372 | `YYTD_ID_ACCLIST (0x0C)'
|
|---|
| 373 | `yy_acclist'
|
|---|
| 374 |
|
|---|
| 375 | `td_flags'
|
|---|
| 376 | Bit flags describing how to interpret the data in `td_data'. The
|
|---|
| 377 | data arrays are one-dimensional by default, but may be two
|
|---|
| 378 | dimensional as specified in the `td_hilen' field.
|
|---|
| 379 |
|
|---|
| 380 | `YYTD_DATA8 (0x01)'
|
|---|
| 381 | The data is serialized as an array of type int8.
|
|---|
| 382 |
|
|---|
| 383 | `YYTD_DATA16 (0x02)'
|
|---|
| 384 | The data is serialized as an array of type int16.
|
|---|
| 385 |
|
|---|
| 386 | `YYTD_DATA32 (0x04)'
|
|---|
| 387 | The data is serialized as an array of type int32.
|
|---|
| 388 |
|
|---|
| 389 | `YYTD_PTRANS (0x08)'
|
|---|
| 390 | The data is a list of indexes of entries in the expanded
|
|---|
| 391 | `yy_transition' array. Each index should be expanded to a
|
|---|
| 392 | pointer to the corresponding entry in the `yy_transition'
|
|---|
| 393 | array. We count on the fact that the `yy_transition' array
|
|---|
| 394 | has already been seen.
|
|---|
| 395 |
|
|---|
| 396 | `YYTD_STRUCT (0x10)'
|
|---|
| 397 | The data is a list of yy_trans_info structs, each of which
|
|---|
| 398 | consists of two integers. There is no padding between struct
|
|---|
| 399 | elements or between structs. The type of each member is
|
|---|
| 400 | determined by the `YYTD_DATA*' bits.
|
|---|
| 401 |
|
|---|
| 402 | `td_lolen'
|
|---|
| 403 | Specifies the number of elements in the lowest dimension array. If
|
|---|
| 404 | this is a one-dimensional array, then it is simply the number of
|
|---|
| 405 | elements in this array. The element size is determined by the
|
|---|
| 406 | `td_flags' field.
|
|---|
| 407 |
|
|---|
| 408 | `td_hilen'
|
|---|
| 409 | If `td_hilen' is non-zero, then the data is a two-dimensional
|
|---|
| 410 | array. Otherwise, the data is a one-dimensional array. `td_hilen'
|
|---|
| 411 | contains the number of elements in the higher dimensional array,
|
|---|
| 412 | and `td_lolen' contains the number of elements in the lowest
|
|---|
| 413 | dimension.
|
|---|
| 414 |
|
|---|
| 415 | Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
|
|---|
| 416 | `sometype td_data[td_hilen][td_lolen]', where `sometype' is
|
|---|
| 417 | specified by the `td_flags' field. It is possible for both
|
|---|
| 418 | `td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
|
|---|
| 419 | zero length array, and no data is loaded, i.e., this table is
|
|---|
| 420 | simply skipped. Flex does not currently generate tables of zero
|
|---|
| 421 | length.
|
|---|
| 422 |
|
|---|
| 423 | `td_data[]'
|
|---|
| 424 | The table data. This array may be a one- or two-dimensional array,
|
|---|
| 425 | of type `int8', `int16', `int32', `struct yy_trans_info', or
|
|---|
| 426 | `struct yy_trans_info*', depending upon the values in the
|
|---|
| 427 | `td_flags', `td_lolen', and `td_hilen' fields.
|
|---|
| 428 |
|
|---|
| 429 | `td_pad64[]'
|
|---|
| 430 | Zero or more NULL bytes, padding the entire table to the next
|
|---|
| 431 | 64-bit boundary as calculated from the beginning of this table.
|
|---|
| 432 |
|
|---|
| 433 |
|
|---|
| 434 | File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top
|
|---|
| 435 |
|
|---|
| 436 | Diagnostics
|
|---|
| 437 | ***********
|
|---|
| 438 |
|
|---|
| 439 | The following is a list of `flex' diagnostic messages:
|
|---|
| 440 |
|
|---|
| 441 | * `warning, rule cannot be matched' indicates that the given rule
|
|---|
| 442 | cannot be matched because it follows other rules that will always
|
|---|
| 443 | match the same text as it. For example, in the following `foo'
|
|---|
| 444 | cannot be matched because it comes after an identifier "catch-all"
|
|---|
| 445 | rule:
|
|---|
| 446 |
|
|---|
| 447 |
|
|---|
| 448 | [a-z]+ got_identifier();
|
|---|
| 449 | foo got_foo();
|
|---|
| 450 |
|
|---|
| 451 | Using `REJECT' in a scanner suppresses this warning.
|
|---|
| 452 |
|
|---|
| 453 | * `warning, -s option given but default rule can be matched' means
|
|---|
| 454 | that it is possible (perhaps only in a particular start condition)
|
|---|
| 455 | that the default rule (match any single character) is the only one
|
|---|
| 456 | that will match a particular input. Since `-s' was given,
|
|---|
| 457 | presumably this is not intended.
|
|---|
| 458 |
|
|---|
| 459 | * `reject_used_but_not_detected undefined' or
|
|---|
| 460 | `yymore_used_but_not_detected undefined'. These errors can occur
|
|---|
| 461 | at compile time. They indicate that the scanner uses `REJECT' or
|
|---|
| 462 | `yymore()' but that `flex' failed to notice the fact, meaning that
|
|---|
| 463 | `flex' scanned the first two sections looking for occurrences of
|
|---|
| 464 | these actions and failed to find any, but somehow you snuck some in
|
|---|
| 465 | (via a #include file, for example). Use `%option reject' or
|
|---|
| 466 | `%option yymore' to indicate to `flex' that you really do use
|
|---|
| 467 | these features.
|
|---|
| 468 |
|
|---|
| 469 | * `flex scanner jammed'. a scanner compiled with `-s' has
|
|---|
| 470 | encountered an input string which wasn't matched by any of its
|
|---|
| 471 | rules. This error can also occur due to internal problems.
|
|---|
| 472 |
|
|---|
| 473 | * `token too large, exceeds YYLMAX'. your scanner uses `%array' and
|
|---|
| 474 | one of its rules matched a string longer than the `YYLMAX'
|
|---|
| 475 | constant (8K bytes by default). You can increase the value by
|
|---|
| 476 | #define'ing `YYLMAX' in the definitions section of your `flex'
|
|---|
| 477 | input.
|
|---|
| 478 |
|
|---|
| 479 | * `scanner requires -8 flag to use the character 'x''. Your scanner
|
|---|
| 480 | specification includes recognizing the 8-bit character `'x'' and
|
|---|
| 481 | you did not specify the -8 flag, and your scanner defaulted to
|
|---|
| 482 | 7-bit because you used the `-Cf' or `-CF' table compression
|
|---|
| 483 | options. See the discussion of the `-7' flag, *Note Scanner
|
|---|
| 484 | Options::, for details.
|
|---|
| 485 |
|
|---|
| 486 | * `flex scanner push-back overflow'. you used `unput()' to push back
|
|---|
| 487 | so much text that the scanner's buffer could not hold both the
|
|---|
| 488 | pushed-back text and the current token in `yytext'. Ideally the
|
|---|
| 489 | scanner should dynamically resize the buffer in this case, but at
|
|---|
| 490 | present it does not.
|
|---|
| 491 |
|
|---|
| 492 | * `input buffer overflow, can't enlarge buffer because scanner uses
|
|---|
| 493 | REJECT'. the scanner was working on matching an extremely large
|
|---|
| 494 | token and needed to expand the input buffer. This doesn't work
|
|---|
| 495 | with scanners that use `REJECT'.
|
|---|
| 496 |
|
|---|
| 497 | * `fatal flex scanner internal error--end of buffer missed'. This can
|
|---|
| 498 | occur in a scanner which is reentered after a long-jump has jumped
|
|---|
| 499 | out (or over) the scanner's activation frame. Before reentering
|
|---|
| 500 | the scanner, use:
|
|---|
| 501 |
|
|---|
| 502 | yyrestart( yyin );
|
|---|
| 503 | or, as noted above, switch to using the C++ scanner class.
|
|---|
| 504 |
|
|---|
| 505 | * `too many start conditions in <> construct!' you listed more start
|
|---|
| 506 | conditions in a <> construct than exist (so you must have listed at
|
|---|
| 507 | least one of them twice).
|
|---|
| 508 |
|
|---|
| 509 |
|
|---|
| 510 | File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top
|
|---|
| 511 |
|
|---|
| 512 | Limitations
|
|---|
| 513 | ***********
|
|---|
| 514 |
|
|---|
| 515 | Some trailing context patterns cannot be properly matched and
|
|---|
| 516 | generate warning messages (`dangerous trailing context'). These are
|
|---|
| 517 | patterns where the ending of the first part of the rule matches the
|
|---|
| 518 | beginning of the second part, such as `zx*/xy*', where the 'x*' matches
|
|---|
| 519 | the 'x' at the beginning of the trailing context. (Note that the POSIX
|
|---|
| 520 | draft states that the text matched by such patterns is undefined.) For
|
|---|
| 521 | some trailing context rules, parts which are actually fixed-length are
|
|---|
| 522 | not recognized as such, leading to the abovementioned performance loss.
|
|---|
| 523 | In particular, parts using `|' or `{n}' (such as `foo{3}') are always
|
|---|
| 524 | considered variable-length. Combining trailing context with the
|
|---|
| 525 | special `|' action can result in _fixed_ trailing context being turned
|
|---|
| 526 | into the more expensive _variable_ trailing context. For example, in
|
|---|
| 527 | the following:
|
|---|
| 528 |
|
|---|
| 529 |
|
|---|
| 530 | %%
|
|---|
| 531 | abc |
|
|---|
| 532 | xyz/def
|
|---|
| 533 |
|
|---|
| 534 | Use of `unput()' invalidates yytext and yyleng, unless the `%array'
|
|---|
| 535 | directive or the `-l' option has been used. Pattern-matching of `NUL's
|
|---|
| 536 | is substantially slower than matching other characters. Dynamic
|
|---|
| 537 | resizing of the input buffer is slow, as it entails rescanning all the
|
|---|
| 538 | text matched so far by the current (generally huge) token. Due to both
|
|---|
| 539 | buffering of input and read-ahead, you cannot intermix calls to
|
|---|
| 540 | `<stdio.h>' routines, such as, getchar(), with `flex' rules and expect
|
|---|
| 541 | it to work. Call `input()' instead. The total table entries listed by
|
|---|
| 542 | the `-v' flag excludes the number of table entries needed to determine
|
|---|
| 543 | what rule has been matched. The number of entries is equal to the
|
|---|
| 544 | number of DFA states if the scanner does not use `REJECT', and somewhat
|
|---|
| 545 | greater than the number of states if it does. `REJECT' cannot be used
|
|---|
| 546 | with the `-f' or `-F' options.
|
|---|
| 547 |
|
|---|
| 548 | The `flex' internal algorithms need documentation.
|
|---|
| 549 |
|
|---|
| 550 |
|
|---|
| 551 | File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top
|
|---|
| 552 |
|
|---|
| 553 | Additional Reading
|
|---|
| 554 | ******************
|
|---|
| 555 |
|
|---|
| 556 | You may wish to read more about the following programs:
|
|---|
| 557 | * lex
|
|---|
| 558 |
|
|---|
| 559 | * yacc
|
|---|
| 560 |
|
|---|
| 561 | * sed
|
|---|
| 562 |
|
|---|
| 563 | * awk
|
|---|
| 564 |
|
|---|
| 565 | The following books may contain material of interest:
|
|---|
| 566 |
|
|---|
| 567 | John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
|
|---|
| 568 | Associates. Be sure to get the 2nd edition.
|
|---|
| 569 |
|
|---|
| 570 | M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
|
|---|
| 571 |
|
|---|
| 572 | Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
|
|---|
| 573 | Techniques and Tools_, Addison-Wesley (1986). Describes the
|
|---|
| 574 | pattern-matching techniques used by `flex' (deterministic finite
|
|---|
| 575 | automata).
|
|---|
| 576 |
|
|---|
| 577 |
|
|---|
| 578 | File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top
|
|---|
| 579 |
|
|---|
| 580 | FAQ
|
|---|
| 581 | ***
|
|---|
| 582 |
|
|---|
| 583 | From time to time, the `flex' maintainer receives certain questions.
|
|---|
| 584 | Rather than repeat answers to well-understood problems, we publish them
|
|---|
| 585 | here.
|
|---|
| 586 |
|
|---|
| 587 | * Menu:
|
|---|
| 588 |
|
|---|
| 589 | * When was flex born?::
|
|---|
| 590 | * How do I expand \ escape sequences in C-style quoted strings?::
|
|---|
| 591 | * Why do flex scanners call fileno if it is not ANSI compatible?::
|
|---|
| 592 | * Does flex support recursive pattern definitions?::
|
|---|
| 593 | * How do I skip huge chunks of input (tens of megabytes) while using flex?::
|
|---|
| 594 | * Flex is not matching my patterns in the same order that I defined them.::
|
|---|
| 595 | * My actions are executing out of order or sometimes not at all.::
|
|---|
| 596 | * How can I have multiple input sources feed into the same scanner at the same time?::
|
|---|
| 597 | * Can I build nested parsers that work with the same input file?::
|
|---|
| 598 | * How can I match text only at the end of a file?::
|
|---|
| 599 | * How can I make REJECT cascade across start condition boundaries?::
|
|---|
| 600 | * Why cant I use fast or full tables with interactive mode?::
|
|---|
| 601 | * How much faster is -F or -f than -C?::
|
|---|
| 602 | * If I have a simple grammar cant I just parse it with flex?::
|
|---|
| 603 | * Why doesnt yyrestart() set the start state back to INITIAL?::
|
|---|
| 604 | * How can I match C-style comments?::
|
|---|
| 605 | * The period isnt working the way I expected.::
|
|---|
| 606 | * Can I get the flex manual in another format?::
|
|---|
| 607 | * Does there exist a "faster" NDFA->DFA algorithm?::
|
|---|
| 608 | * How does flex compile the DFA so quickly?::
|
|---|
| 609 | * How can I use more than 8192 rules?::
|
|---|
| 610 | * How do I abandon a file in the middle of a scan and switch to a new file?::
|
|---|
| 611 | * How do I execute code only during initialization (only before the first scan)?::
|
|---|
| 612 | * How do I execute code at termination?::
|
|---|
| 613 | * Where else can I find help?::
|
|---|
| 614 | * Can I include comments in the "rules" section of the file?::
|
|---|
| 615 | * I get an error about undefined yywrap().::
|
|---|
| 616 | * How can I change the matching pattern at run time?::
|
|---|
| 617 | * How can I expand macros in the input?::
|
|---|
| 618 | * How can I build a two-pass scanner?::
|
|---|
| 619 | * How do I match any string not matched in the preceding rules?::
|
|---|
| 620 | * I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
|
|---|
| 621 | * Is there a way to make flex treat NULL like a regular character?::
|
|---|
| 622 | * Whenever flex can not match the input it says "flex scanner jammed".::
|
|---|
| 623 | * Why doesnt flex have non-greedy operators like perl does?::
|
|---|
| 624 | * Memory leak - 16386 bytes allocated by malloc.::
|
|---|
| 625 | * How do I track the byte offset for lseek()?::
|
|---|
| 626 | * How do I use my own I/O classes in a C++ scanner?::
|
|---|
| 627 | * How do I skip as many chars as possible?::
|
|---|
| 628 | * deleteme00::
|
|---|
| 629 | * Are certain equivalent patterns faster than others?::
|
|---|
| 630 | * Is backing up a big deal?::
|
|---|
| 631 | * Can I fake multi-byte character support?::
|
|---|
| 632 | * deleteme01::
|
|---|
| 633 | * Can you discuss some flex internals?::
|
|---|
| 634 | * unput() messes up yy_at_bol::
|
|---|
| 635 | * The | operator is not doing what I want::
|
|---|
| 636 | * Why can't flex understand this variable trailing context pattern?::
|
|---|
| 637 | * The ^ operator isn't working::
|
|---|
| 638 | * Trailing context is getting confused with trailing optional patterns::
|
|---|
| 639 | * Is flex GNU or not?::
|
|---|
| 640 | * ERASEME53::
|
|---|
| 641 | * I need to scan if-then-else blocks and while loops::
|
|---|
| 642 | * ERASEME55::
|
|---|
| 643 | * ERASEME56::
|
|---|
| 644 | * ERASEME57::
|
|---|
| 645 | * Is there a repository for flex scanners?::
|
|---|
| 646 | * How can I conditionally compile or preprocess my flex input file?::
|
|---|
| 647 | * Where can I find grammars for lex and yacc?::
|
|---|
| 648 | * I get an end-of-buffer message for each character scanned.::
|
|---|
| 649 | * unnamed-faq-62::
|
|---|
| 650 | * unnamed-faq-63::
|
|---|
| 651 | * unnamed-faq-64::
|
|---|
| 652 | * unnamed-faq-65::
|
|---|
| 653 | * unnamed-faq-66::
|
|---|
| 654 | * unnamed-faq-67::
|
|---|
| 655 | * unnamed-faq-68::
|
|---|
| 656 | * unnamed-faq-69::
|
|---|
| 657 | * unnamed-faq-70::
|
|---|
| 658 | * unnamed-faq-71::
|
|---|
| 659 | * unnamed-faq-72::
|
|---|
| 660 | * unnamed-faq-73::
|
|---|
| 661 | * unnamed-faq-74::
|
|---|
| 662 | * unnamed-faq-75::
|
|---|
| 663 | * unnamed-faq-76::
|
|---|
| 664 | * unnamed-faq-77::
|
|---|
| 665 | * unnamed-faq-78::
|
|---|
| 666 | * unnamed-faq-79::
|
|---|
| 667 | * unnamed-faq-80::
|
|---|
| 668 | * unnamed-faq-81::
|
|---|
| 669 | * unnamed-faq-82::
|
|---|
| 670 | * unnamed-faq-83::
|
|---|
| 671 | * unnamed-faq-84::
|
|---|
| 672 | * unnamed-faq-85::
|
|---|
| 673 | * unnamed-faq-86::
|
|---|
| 674 | * unnamed-faq-87::
|
|---|
| 675 | * unnamed-faq-88::
|
|---|
| 676 | * unnamed-faq-90::
|
|---|
| 677 | * unnamed-faq-91::
|
|---|
| 678 | * unnamed-faq-92::
|
|---|
| 679 | * unnamed-faq-93::
|
|---|
| 680 | * unnamed-faq-94::
|
|---|
| 681 | * unnamed-faq-95::
|
|---|
| 682 | * unnamed-faq-96::
|
|---|
| 683 | * unnamed-faq-97::
|
|---|
| 684 | * unnamed-faq-98::
|
|---|
| 685 | * unnamed-faq-99::
|
|---|
| 686 | * unnamed-faq-100::
|
|---|
| 687 | * unnamed-faq-101::
|
|---|
| 688 | * What is the difference between YYLEX_PARAM and YY_DECL?::
|
|---|
| 689 | * Why do I get "conflicting types for yylex" error?::
|
|---|
| 690 | * How do I access the values set in a Flex action from within a Bison action?::
|
|---|
| 691 |
|
|---|
| 692 |
|
|---|
| 693 | File: flex.info, Node: When was flex born?, Next: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
|
|---|
| 694 |
|
|---|
| 695 | When was flex born?
|
|---|
| 696 | ===================
|
|---|
| 697 |
|
|---|
| 698 | Vern Paxson took over the `Software Tools' lex project from Jef
|
|---|
| 699 | Poskanzer in 1982. At that point it was written in Ratfor. Around
|
|---|
| 700 | 1987 or so, Paxson translated it into C, and a legend was born :-).
|
|---|
| 701 |
|
|---|
| 702 |
|
|---|
| 703 | File: flex.info, Node: How do I expand \ escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ
|
|---|
| 704 |
|
|---|
| 705 | How do I expand \ escape sequences in C-style quoted strings?
|
|---|
| 706 | =============================================================
|
|---|
| 707 |
|
|---|
| 708 | A key point when scanning quoted strings is that you cannot (easily)
|
|---|
| 709 | write a single rule that will precisely match the string if you allow
|
|---|
| 710 | things like embedded escape sequences and newlines. If you try to
|
|---|
| 711 | match strings with a single rule then you'll wind up having to rescan
|
|---|
| 712 | the string anyway to find any escape sequences.
|
|---|
| 713 |
|
|---|
| 714 | Instead you can use exclusive start conditions and a set of rules,
|
|---|
| 715 | one for matching non-escaped text, one for matching a single escape,
|
|---|
| 716 | one for matching an embedded newline, and one for recognizing the end
|
|---|
| 717 | of the string. Each of these rules is then faced with the question of
|
|---|
| 718 | where to put its intermediary results. The best solution is for the
|
|---|
| 719 | rules to append their local value of `yytext' to the end of a "string
|
|---|
| 720 | literal" buffer. A rule like the escape-matcher will append to the
|
|---|
| 721 | buffer the meaning of the escape sequence rather than the literal text
|
|---|
| 722 | in `yytext'. In this way, `yytext' does not need to be modified at all.
|
|---|
| 723 |
|
|---|
| 724 |
|
|---|
| 725 | File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
|
|---|
| 726 |
|
|---|
| 727 | Why do flex scanners call fileno if it is not ANSI compatible?
|
|---|
| 728 | ==============================================================
|
|---|
| 729 |
|
|---|
| 730 | Flex scanners call `fileno()' in order to get the file descriptor
|
|---|
| 731 | corresponding to `yyin'. The file descriptor may be passed to
|
|---|
| 732 | `isatty()' or `read()', depending upon which `%options' you specified.
|
|---|
| 733 | If your system does not have `fileno()' support, to get rid of the
|
|---|
| 734 | `read()' call, do not specify `%option read'. To get rid of the
|
|---|
| 735 | `isatty()' call, you must specify one of `%option always-interactive' or
|
|---|
| 736 | `%option never-interactive'.
|
|---|
| 737 |
|
|---|
| 738 |
|
|---|
| 739 | File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ
|
|---|
| 740 |
|
|---|
| 741 | Does flex support recursive pattern definitions?
|
|---|
| 742 | ================================================
|
|---|
| 743 |
|
|---|
| 744 | e.g.,
|
|---|
| 745 |
|
|---|
| 746 |
|
|---|
| 747 | %%
|
|---|
| 748 | block "{"({block}|{statement})*"}"
|
|---|
| 749 |
|
|---|
| 750 | No. You cannot have recursive definitions. The pattern-matching
|
|---|
| 751 | power of regular expressions in general (and therefore flex scanners,
|
|---|
| 752 | too) is limited. In particular, regular expressions cannot "balance"
|
|---|
| 753 | parentheses to an arbitrary degree. For example, it's impossible to
|
|---|
| 754 | write a regular expression that matches all strings containing the same
|
|---|
| 755 | number of '{'s as '}'s. For more powerful pattern matching, you need a
|
|---|
| 756 | parser, such as `GNU bison'.
|
|---|
| 757 |
|
|---|
| 758 |
|
|---|
| 759 | File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ
|
|---|
| 760 |
|
|---|
| 761 | How do I skip huge chunks of input (tens of megabytes) while using flex?
|
|---|
| 762 | ========================================================================
|
|---|
| 763 |
|
|---|
| 764 | Use `fseek()' (or `lseek()') to position yyin, then call
|
|---|
| 765 | `yyrestart()'.
|
|---|
| 766 |
|
|---|
| 767 |
|
|---|
| 768 | File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ
|
|---|
| 769 |
|
|---|
| 770 | Flex is not matching my patterns in the same order that I defined them.
|
|---|
| 771 | =======================================================================
|
|---|
| 772 |
|
|---|
| 773 | `flex' picks the rule that matches the most text (i.e., the longest
|
|---|
| 774 | possible input string). This is because `flex' uses an entirely
|
|---|
| 775 | different matching technique ("deterministic finite automata") that
|
|---|
| 776 | actually does all of the matching simultaneously, in parallel. (Seems
|
|---|
| 777 | impossible, but it's actually a fairly simple technique once you
|
|---|
| 778 | understand the principles.)
|
|---|
| 779 |
|
|---|
| 780 | A side-effect of this parallel matching is that when the input
|
|---|
| 781 | matches more than one rule, `flex' scanners pick the rule that matched
|
|---|
| 782 | the _most_ text. This is explained further in the manual, in the
|
|---|
| 783 | section *Note Matching::.
|
|---|
| 784 |
|
|---|
| 785 | If you want `flex' to choose a shorter match, then you can work
|
|---|
| 786 | around this behavior by expanding your short rule to match more text,
|
|---|
| 787 | then put back the extra:
|
|---|
| 788 |
|
|---|
| 789 |
|
|---|
| 790 | data_.* yyless( 5 ); BEGIN BLOCKIDSTATE;
|
|---|
| 791 |
|
|---|
| 792 | Another fix would be to make the second rule active only during the
|
|---|
| 793 | `<BLOCKIDSTATE>' start condition, and make that start condition
|
|---|
| 794 | exclusive by declaring it with `%x' instead of `%s'.
|
|---|
| 795 |
|
|---|
| 796 | A final fix is to change the input language so that the ambiguity for
|
|---|
| 797 | `data_' is removed, by adding characters to it that don't match the
|
|---|
| 798 | identifier rule, or by removing characters (such as `_') from the
|
|---|
| 799 | identifier rule so it no longer matches `data_'. (Of course, you might
|
|---|
| 800 | also not have the option of changing the input language.)
|
|---|
| 801 |
|
|---|
| 802 |
|
|---|
| 803 | File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ
|
|---|
| 804 |
|
|---|
| 805 | My actions are executing out of order or sometimes not at all.
|
|---|
| 806 | ==============================================================
|
|---|
| 807 |
|
|---|
| 808 | Most likely, you have (in error) placed the opening `{' of the action
|
|---|
| 809 | block on a different line than the rule, e.g.,
|
|---|
| 810 |
|
|---|
| 811 |
|
|---|
| 812 | ^(foo|bar)
|
|---|
| 813 | { <<<--- WRONG!
|
|---|
| 814 |
|
|---|
| 815 | }
|
|---|
| 816 |
|
|---|
| 817 | `flex' requires that the opening `{' of an action associated with a
|
|---|
| 818 | rule begin on the same line as does the rule. You need instead to
|
|---|
| 819 | write your rules as follows:
|
|---|
| 820 |
|
|---|
| 821 |
|
|---|
| 822 | ^(foo|bar) { // CORRECT!
|
|---|
| 823 |
|
|---|
| 824 | }
|
|---|
| 825 |
|
|---|
| 826 |
|
|---|
| 827 | File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ
|
|---|
| 828 |
|
|---|
| 829 | How can I have multiple input sources feed into the same scanner at the same time?
|
|---|
| 830 | ==================================================================================
|
|---|
| 831 |
|
|---|
| 832 | If ...
|
|---|
| 833 | * your scanner is free of backtracking (verified using `flex''s `-b'
|
|---|
| 834 | flag),
|
|---|
| 835 |
|
|---|
| 836 | * AND you run your scanner interactively (`-I' option; default
|
|---|
| 837 | unless using special table compression options),
|
|---|
| 838 |
|
|---|
| 839 | * AND you feed it one character at a time by redefining `YY_INPUT'
|
|---|
| 840 | to do so,
|
|---|
| 841 |
|
|---|
| 842 | then every time it matches a token, it will have exhausted its input
|
|---|
| 843 | buffer (because the scanner is free of backtracking). This means you
|
|---|
| 844 | can safely use `select()' at the point and only call `yylex()' for
|
|---|
| 845 | another token if `select()' indicates there's data available.
|
|---|
| 846 |
|
|---|
| 847 | That is, move the `select()' out from the input function to a point
|
|---|
| 848 | where it determines whether `yylex()' gets called for the next token.
|
|---|
| 849 |
|
|---|
| 850 | With this approach, you will still have problems if your input can
|
|---|
| 851 | arrive piecemeal; `select()' could inform you that the beginning of a
|
|---|
| 852 | token is available, you call `yylex()' to get it, but it winds up
|
|---|
| 853 | blocking waiting for the later characters in the token.
|
|---|
| 854 |
|
|---|
| 855 | Here's another way: Move your input multiplexing inside of
|
|---|
| 856 | `YY_INPUT'. That is, whenever `YY_INPUT' is called, it `select()''s to
|
|---|
| 857 | see where input is available. If input is available for the scanner,
|
|---|
| 858 | it reads and returns the next byte. If input is available from another
|
|---|
| 859 | source, it calls whatever function is responsible for reading from that
|
|---|
| 860 | source. (If no input is available, it blocks until some input is
|
|---|
| 861 | available.) I've used this technique in an interpreter I wrote that
|
|---|
| 862 | both reads keyboard input using a `flex' scanner and IPC traffic from
|
|---|
| 863 | sockets, and it works fine.
|
|---|
| 864 |
|
|---|
| 865 |
|
|---|
| 866 | File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ
|
|---|
| 867 |
|
|---|
| 868 | Can I build nested parsers that work with the same input file?
|
|---|
| 869 | ==============================================================
|
|---|
| 870 |
|
|---|
| 871 | This is not going to work without some additional effort. The
|
|---|
| 872 | reason is that `flex' block-buffers the input it reads from `yyin'.
|
|---|
| 873 | This means that the "outermost" `yylex()', when called, will
|
|---|
| 874 | automatically slurp up the first 8K of input available on yyin, and
|
|---|
| 875 | subsequent calls to other `yylex()''s won't see that input. You might
|
|---|
| 876 | be tempted to work around this problem by redefining `YY_INPUT' to only
|
|---|
| 877 | return a small amount of text, but it turns out that that approach is
|
|---|
| 878 | quite difficult. Instead, the best solution is to combine all of your
|
|---|
| 879 | scanners into one large scanner, using a different exclusive start
|
|---|
| 880 | condition for each.
|
|---|
| 881 |
|
|---|
| 882 |
|
|---|
| 883 | File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ
|
|---|
| 884 |
|
|---|
| 885 | How can I match text only at the end of a file?
|
|---|
| 886 | ===============================================
|
|---|
| 887 |
|
|---|
| 888 | There is no way to write a rule which is "match this text, but only
|
|---|
| 889 | if it comes at the end of the file". You can fake it, though, if you
|
|---|
| 890 | happen to have a character lying around that you don't allow in your
|
|---|
| 891 | input. Then you redefine `YY_INPUT' to call your own routine which, if
|
|---|
| 892 | it sees an `EOF', returns the magic character first (and remembers to
|
|---|
| 893 | return a real `EOF' next time it's called). Then you could write:
|
|---|
| 894 |
|
|---|
| 895 |
|
|---|
| 896 | <COMMENT>(.|\n)*{EOF_CHAR} /* saw comment at EOF */
|
|---|
| 897 |
|
|---|
| 898 |
|
|---|
| 899 | File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ
|
|---|
| 900 |
|
|---|
| 901 | How can I make REJECT cascade across start condition boundaries?
|
|---|
| 902 | ================================================================
|
|---|
| 903 |
|
|---|
| 904 | You can do this as follows. Suppose you have a start condition `A',
|
|---|
| 905 | and after exhausting all of the possible matches in `<A>', you want to
|
|---|
| 906 | try matches in `<INITIAL>'. Then you could use the following:
|
|---|
| 907 |
|
|---|
| 908 |
|
|---|
| 909 | %x A
|
|---|
| 910 | %%
|
|---|
| 911 | <A>rule_that_is_long ...; REJECT;
|
|---|
| 912 | <A>rule ...; REJECT; /* shorter rule */
|
|---|
| 913 | <A>etc.
|
|---|
| 914 | ...
|
|---|
| 915 | <A>.|\n {
|
|---|
| 916 | /* Shortest and last rule in <A>, so
|
|---|
| 917 | * cascaded REJECT's will eventually
|
|---|
| 918 | * wind up matching this rule. We want
|
|---|
| 919 | * to now switch to the initial state
|
|---|
| 920 | * and try matching from there instead.
|
|---|
| 921 | */
|
|---|
| 922 | yyless(0); /* put back matched text */
|
|---|
| 923 | BEGIN(INITIAL);
|
|---|
| 924 | }
|
|---|
| 925 |
|
|---|
| 926 |
|
|---|
| 927 | File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ
|
|---|
| 928 |
|
|---|
| 929 | Why can't I use fast or full tables with interactive mode?
|
|---|
| 930 | ==========================================================
|
|---|
| 931 |
|
|---|
| 932 | One of the assumptions flex makes is that interactive applications
|
|---|
| 933 | are inherently slow (they're waiting on a human after all). It has to
|
|---|
| 934 | do with how the scanner detects that it must be finished scanning a
|
|---|
| 935 | token. For interactive scanners, after scanning each character the
|
|---|
| 936 | current state is looked up in a table (essentially) to see whether
|
|---|
| 937 | there's a chance of another input character possibly extending the
|
|---|
| 938 | length of the match. If not, the scanner halts. For non-interactive
|
|---|
| 939 | scanners, the end-of-token test is much simpler, basically a compare
|
|---|
| 940 | with 0, so no memory bus cycles. Since the test occurs in the
|
|---|
| 941 | innermost scanning loop, one would like to make it go as fast as
|
|---|
| 942 | possible.
|
|---|
| 943 |
|
|---|
| 944 | Still, it seems reasonable to allow the user to choose to trade off
|
|---|
| 945 | a bit of performance in this area to gain the corresponding
|
|---|
| 946 | flexibility. There might be another reason, though, why fast scanners
|
|---|
| 947 | don't support the interactive option.
|
|---|
| 948 |
|
|---|
| 949 |
|
|---|
| 950 | File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ
|
|---|
| 951 |
|
|---|
| 952 | How much faster is -F or -f than -C?
|
|---|
| 953 | ====================================
|
|---|
| 954 |
|
|---|
| 955 | Much faster (factor of 2-3).
|
|---|
| 956 |
|
|---|
| 957 |
|
|---|
| 958 | File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesnt yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ
|
|---|
| 959 |
|
|---|
| 960 | If I have a simple grammar can't I just parse it with flex?
|
|---|
| 961 | ===========================================================
|
|---|
| 962 |
|
|---|
| 963 | Is your grammar recursive? That's almost always a sign that you're
|
|---|
| 964 | better off using a parser/scanner rather than just trying to use a
|
|---|
| 965 | scanner alone.
|
|---|
| 966 |
|
|---|
| 967 |
|
|---|
| 968 | File: flex.info, Node: Why doesnt yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ
|
|---|
| 969 |
|
|---|
| 970 | Why doesn't yyrestart() set the start state back to INITIAL?
|
|---|
| 971 | ============================================================
|
|---|
| 972 |
|
|---|
| 973 | There are two reasons. The first is that there might be programs
|
|---|
| 974 | that rely on the start state not changing across file changes. The
|
|---|
| 975 | second is that beginning with `flex' version 2.4, use of `yyrestart()'
|
|---|
| 976 | is no longer required, so fixing the problem there doesn't solve the
|
|---|
| 977 | more general problem.
|
|---|
| 978 |
|
|---|
| 979 |
|
|---|
| 980 | File: flex.info, Node: How can I match C-style comments?, Next: The period isnt working the way I expected., Prev: Why doesnt yyrestart() set the start state back to INITIAL?, Up: FAQ
|
|---|
| 981 |
|
|---|
| 982 | How can I match C-style comments?
|
|---|
| 983 | =================================
|
|---|
| 984 |
|
|---|
| 985 | You might be tempted to try something like this:
|
|---|
| 986 |
|
|---|
| 987 |
|
|---|
| 988 | "/*".*"*/" // WRONG!
|
|---|
| 989 |
|
|---|
| 990 | or, worse, this:
|
|---|
| 991 |
|
|---|
| 992 |
|
|---|
| 993 | "/*"(.|\n)"*/" // WRONG!
|
|---|
| 994 |
|
|---|
| 995 | The above rules will eat too much input, and blow up on things like:
|
|---|
| 996 |
|
|---|
| 997 |
|
|---|
| 998 | /* a comment */ do_my_thing( "oops */" );
|
|---|
| 999 |
|
|---|
| 1000 | Here is one way which allows you to track line information:
|
|---|
| 1001 |
|
|---|
| 1002 |
|
|---|
| 1003 | <INITIAL>{
|
|---|
| 1004 | "/*" BEGIN(IN_COMMENT);
|
|---|
| 1005 | }
|
|---|
| 1006 | <IN_COMMENT>{
|
|---|
| 1007 | "*/" BEGIN(INITIAL);
|
|---|
| 1008 | [^*\n]+ // eat comment in chunks
|
|---|
| 1009 | "*" // eat the lone star
|
|---|
| 1010 | \n yylineno++;
|
|---|
| 1011 | }
|
|---|
| 1012 |
|
|---|
| 1013 |
|
|---|
| 1014 | File: flex.info, Node: The period isnt working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ
|
|---|
| 1015 |
|
|---|
| 1016 | The '.' isn't working the way I expected.
|
|---|
| 1017 | =========================================
|
|---|
| 1018 |
|
|---|
| 1019 | Here are some tips for using `.':
|
|---|
| 1020 |
|
|---|
| 1021 | * A common mistake is to place the grouping parenthesis AFTER an
|
|---|
| 1022 | operator, when you really meant to place the parenthesis BEFORE
|
|---|
| 1023 | the operator, e.g., you probably want this `(foo|bar)+' and NOT
|
|---|
| 1024 | this `(foo|bar+)'.
|
|---|
| 1025 |
|
|---|
| 1026 | The first pattern matches the words `foo' or `bar' any number of
|
|---|
| 1027 | times, e.g., it matches the text `barfoofoobarfoo'. The second
|
|---|
| 1028 | pattern matches a single instance of `foo' or a single instance of
|
|---|
| 1029 | `bar' followed by one or more `r's, e.g., it matches the text
|
|---|
| 1030 | `barrrr' .
|
|---|
| 1031 |
|
|---|
| 1032 | * A `.' inside `[]''s just means a literal`.' (period), and NOT "any
|
|---|
| 1033 | character except newline".
|
|---|
| 1034 |
|
|---|
| 1035 | * Remember that `.' matches any character EXCEPT `\n' (and `EOF').
|
|---|
| 1036 | If you really want to match ANY character, including newlines,
|
|---|
| 1037 | then use `(.|\n)' Beware that the regex `(.|\n)+' will match your
|
|---|
| 1038 | entire input!
|
|---|
| 1039 |
|
|---|
| 1040 | * Finally, if you want to match a literal `.' (a period), then use
|
|---|
| 1041 | `[.]' or `"."'
|
|---|
| 1042 |
|
|---|
| 1043 |
|
|---|
| 1044 | File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isnt working the way I expected., Up: FAQ
|
|---|
| 1045 |
|
|---|
| 1046 | Can I get the flex manual in another format?
|
|---|
| 1047 | ============================================
|
|---|
| 1048 |
|
|---|
| 1049 | The `flex' source distribution includes a texinfo manual. You are
|
|---|
| 1050 | free to convert that texinfo into whatever format you desire. The
|
|---|
| 1051 | `texinfo' package includes tools for conversion to a number of formats.
|
|---|
| 1052 |
|
|---|
| 1053 |
|
|---|
| 1054 | File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ
|
|---|
| 1055 |
|
|---|
| 1056 | Does there exist a "faster" NDFA->DFA algorithm?
|
|---|
| 1057 | ================================================
|
|---|
| 1058 |
|
|---|
| 1059 | There's no way around the potential exponential running time - it
|
|---|
| 1060 | can take you exponential time just to enumerate all of the DFA states.
|
|---|
| 1061 | In practice, though, the running time is closer to linear, or sometimes
|
|---|
| 1062 | quadratic.
|
|---|
| 1063 |
|
|---|
| 1064 |
|
|---|
| 1065 | File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ
|
|---|
| 1066 |
|
|---|
| 1067 | How does flex compile the DFA so quickly?
|
|---|
| 1068 | =========================================
|
|---|
| 1069 |
|
|---|
| 1070 | There are two big speed wins that `flex' uses:
|
|---|
| 1071 |
|
|---|
| 1072 | 1. It analyzes the input rules to construct equivalence classes for
|
|---|
| 1073 | those characters that always make the same transitions. It then
|
|---|
| 1074 | rewrites the NFA using equivalence classes for transitions instead
|
|---|
| 1075 | of characters. This cuts down the NFA->DFA computation time
|
|---|
| 1076 | dramatically, to the point where, for uncompressed DFA tables, the
|
|---|
| 1077 | DFA generation is often I/O bound in writing out the tables.
|
|---|
| 1078 |
|
|---|
| 1079 | 2. It maintains hash values for previously computed DFA states, so
|
|---|
| 1080 | testing whether a newly constructed DFA state is equivalent to a
|
|---|
| 1081 | previously constructed state can be done very quickly, by first
|
|---|
| 1082 | comparing hash values.
|
|---|
| 1083 |
|
|---|
| 1084 |
|
|---|
| 1085 | File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ
|
|---|
| 1086 |
|
|---|
| 1087 | How can I use more than 8192 rules?
|
|---|
| 1088 | ===================================
|
|---|
| 1089 |
|
|---|
| 1090 | `Flex' is compiled with an upper limit of 8192 rules per scanner.
|
|---|
| 1091 | If you need more than 8192 rules in your scanner, you'll have to
|
|---|
| 1092 | recompile `flex' with the following changes in `flexdef.h':
|
|---|
| 1093 |
|
|---|
| 1094 |
|
|---|
| 1095 | < #define YY_TRAILING_MASK 0x2000
|
|---|
| 1096 | < #define YY_TRAILING_HEAD_MASK 0x4000
|
|---|
| 1097 | --
|
|---|
| 1098 | > #define YY_TRAILING_MASK 0x20000000
|
|---|
| 1099 | > #define YY_TRAILING_HEAD_MASK 0x40000000
|
|---|
| 1100 |
|
|---|
| 1101 | This should work okay as long as your C compiler uses 32 bit
|
|---|
| 1102 | integers. But you might want to think about whether using such a huge
|
|---|
| 1103 | number of rules is the best way to solve your problem.
|
|---|
| 1104 |
|
|---|
| 1105 | The following may also be relevant:
|
|---|
| 1106 |
|
|---|
| 1107 | With luck, you should be able to increase the definitions in
|
|---|
| 1108 | flexdef.h for:
|
|---|
| 1109 |
|
|---|
| 1110 |
|
|---|
| 1111 | #define JAMSTATE -32766 /* marks a reference to the state that always jams */
|
|---|
| 1112 | #define MAXIMUM_MNS 31999
|
|---|
| 1113 | #define BAD_SUBSCRIPT -32767
|
|---|
| 1114 |
|
|---|
| 1115 | recompile everything, and it'll all work. Flex only has these
|
|---|
| 1116 | 16-bit-like values built into it because a long time ago it was
|
|---|
| 1117 | developed on a machine with 16-bit ints. I've given this advice to
|
|---|
| 1118 | others in the past but haven't heard back from them whether it worked
|
|---|
| 1119 | okay or not...
|
|---|
| 1120 |
|
|---|
| 1121 |
|
|---|
| 1122 | File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ
|
|---|
| 1123 |
|
|---|
| 1124 | How do I abandon a file in the middle of a scan and switch to a new file?
|
|---|
| 1125 | =========================================================================
|
|---|
| 1126 |
|
|---|
| 1127 | Just call `yyrestart(newfile)'. Be sure to reset the start state if
|
|---|
| 1128 | you want a "fresh start, since `yyrestart' does NOT reset the start
|
|---|
| 1129 | state back to `INITIAL'.
|
|---|
| 1130 |
|
|---|
| 1131 |
|
|---|
| 1132 | File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ
|
|---|
| 1133 |
|
|---|
| 1134 | How do I execute code only during initialization (only before the first scan)?
|
|---|
| 1135 | ==============================================================================
|
|---|
| 1136 |
|
|---|
| 1137 | You can specify an initial action by defining the macro
|
|---|
| 1138 | `YY_USER_INIT' (though note that `yyout' may not be available at the
|
|---|
| 1139 | time this macro is executed). Or you can add to the beginning of your
|
|---|
| 1140 | rules section:
|
|---|
| 1141 |
|
|---|
| 1142 |
|
|---|
| 1143 | %%
|
|---|
| 1144 | /* Must be indented! */
|
|---|
| 1145 | static int did_init = 0;
|
|---|
| 1146 |
|
|---|
| 1147 | if ( ! did_init ){
|
|---|
| 1148 | do_my_init();
|
|---|
| 1149 | did_init = 1;
|
|---|
| 1150 | }
|
|---|
| 1151 |
|
|---|
| 1152 |
|
|---|
| 1153 | File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ
|
|---|
| 1154 |
|
|---|
| 1155 | How do I execute code at termination?
|
|---|
| 1156 | =====================================
|
|---|
| 1157 |
|
|---|
| 1158 | You can specify an action for the `<<EOF>>' rule.
|
|---|
| 1159 |
|
|---|
| 1160 |
|
|---|
| 1161 | File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ
|
|---|
| 1162 |
|
|---|
| 1163 | Where else can I find help?
|
|---|
| 1164 | ===========================
|
|---|
| 1165 |
|
|---|
| 1166 | You can find the flex homepage on the web at
|
|---|
| 1167 | `http://flex.sourceforge.net/'. See that page for details about flex
|
|---|
| 1168 | mailing lists as well.
|
|---|
| 1169 |
|
|---|
| 1170 |
|
|---|
| 1171 | File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ
|
|---|
| 1172 |
|
|---|
| 1173 | Can I include comments in the "rules" section of the file?
|
|---|
| 1174 | ==========================================================
|
|---|
| 1175 |
|
|---|
| 1176 | Yes, just about anywhere you want to. See the manual for the
|
|---|
| 1177 | specific syntax.
|
|---|
| 1178 |
|
|---|
| 1179 |
|
|---|
| 1180 | File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ
|
|---|
| 1181 |
|
|---|
| 1182 | I get an error about undefined yywrap().
|
|---|
| 1183 | ========================================
|
|---|
| 1184 |
|
|---|
| 1185 | You must supply a `yywrap()' function of your own, or link to
|
|---|
| 1186 | `libfl.a' (which provides one), or use
|
|---|
| 1187 |
|
|---|
| 1188 |
|
|---|
| 1189 | %option noyywrap
|
|---|
| 1190 |
|
|---|
| 1191 | in your source to say you don't want a `yywrap()' function.
|
|---|
| 1192 |
|
|---|
| 1193 |
|
|---|
| 1194 | File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ
|
|---|
| 1195 |
|
|---|
| 1196 | How can I change the matching pattern at run time?
|
|---|
| 1197 | ==================================================
|
|---|
| 1198 |
|
|---|
| 1199 | You can't, it's compiled into a static table when flex builds the
|
|---|
| 1200 | scanner.
|
|---|
| 1201 |
|
|---|
| 1202 |
|
|---|
| 1203 | File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ
|
|---|
| 1204 |
|
|---|
| 1205 | How can I expand macros in the input?
|
|---|
| 1206 | =====================================
|
|---|
| 1207 |
|
|---|
| 1208 | The best way to approach this problem is at a higher level, e.g., in
|
|---|
| 1209 | the parser.
|
|---|
| 1210 |
|
|---|
| 1211 | However, you can do this using multiple input buffers.
|
|---|
| 1212 |
|
|---|
| 1213 |
|
|---|
| 1214 | %%
|
|---|
| 1215 | macro/[a-z]+ {
|
|---|
| 1216 | /* Saw the macro "macro" followed by extra stuff. */
|
|---|
| 1217 | main_buffer = YY_CURRENT_BUFFER;
|
|---|
| 1218 | expansion_buffer = yy_scan_string(expand(yytext));
|
|---|
| 1219 | yy_switch_to_buffer(expansion_buffer);
|
|---|
| 1220 | }
|
|---|
| 1221 |
|
|---|
| 1222 | <<EOF>> {
|
|---|
| 1223 | if ( expansion_buffer )
|
|---|
| 1224 | {
|
|---|
| 1225 | // We were doing an expansion, return to where
|
|---|
| 1226 | // we were.
|
|---|
| 1227 | yy_switch_to_buffer(main_buffer);
|
|---|
| 1228 | yy_delete_buffer(expansion_buffer);
|
|---|
| 1229 | expansion_buffer = 0;
|
|---|
| 1230 | }
|
|---|
| 1231 | else
|
|---|
| 1232 | yyterminate();
|
|---|
| 1233 | }
|
|---|
| 1234 |
|
|---|
| 1235 | You probably will want a stack of expansion buffers to allow nested
|
|---|
| 1236 | macros. From the above though hopefully the idea is clear.
|
|---|
| 1237 |
|
|---|
| 1238 |
|
|---|
| 1239 | File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ
|
|---|
| 1240 |
|
|---|
| 1241 | How can I build a two-pass scanner?
|
|---|
| 1242 | ===================================
|
|---|
| 1243 |
|
|---|
| 1244 | One way to do it is to filter the first pass to a temporary file,
|
|---|
| 1245 | then process the temporary file on the second pass. You will probably
|
|---|
| 1246 | see a performance hit, do to all the disk I/O.
|
|---|
| 1247 |
|
|---|
| 1248 | When you need to look ahead far forward like this, it almost always
|
|---|
| 1249 | means that the right solution is to build a parse tree of the entire
|
|---|
| 1250 | input, then walk it after the parse in order to generate the output.
|
|---|
| 1251 | In a sense, this is a two-pass approach, once through the text and once
|
|---|
| 1252 | through the parse tree, but the performance hit for the latter is
|
|---|
| 1253 | usually an order of magnitude smaller, since everything is already
|
|---|
| 1254 | classified, in binary format, and residing in memory.
|
|---|
| 1255 |
|
|---|