Context Navigation

flex.info-4

Visit:

Last change on this file was 3031, checked in by bird, 19 years ago
flex 2.5.33.
File size: 50.2 KB

Line
1	This is flex.info, produced by makeinfo version 4.5 from flex.texi.
2
3	INFO-DIR-SECTION Programming
4	START-INFO-DIR-ENTRY
5	* flex: (flex). Fast lexical analyzer generator (lex replacement).
6	END-INFO-DIR-ENTRY
7
8
9	The flex manual is placed under the same licensing conditions as the
10	rest of flex:
11
12	Copyright (C) 1990, 1997 The Regents of the University of California.
13	All rights reserved.
14
15	This code is derived from software contributed to Berkeley by Vern
16	Paxson.
17
18	The United States Government has rights in this work pursuant to
19	contract no. DE-AC03-76SF00098 between the United States Department of
20	Energy and the University of California.
21
22	Redistribution and use in source and binary forms, with or without
23	modification, are permitted provided that the following conditions are
24	met:
25
26	1. Redistributions of source code must retain the above copyright
27	notice, this list of conditions and the following disclaimer.
28
29	2. Redistributions in binary form must reproduce the above copyright
30	notice, this list of conditions and the following disclaimer in the
31	documentation and/or other materials provided with the
32	distribution.
33	Neither the name of the University nor the names of its contributors
34	may be used to endorse or promote products derived from this software
35	without specific prior written permission.
36
37	THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
38	WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
39	MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
40
41	File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management
42
43	Overriding The Default Memory Management
44	========================================
45
46	Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when
47	it needs to allocate or free memory. By default, these functions are
48	wrappers around the standard C functions, `malloc', `realloc', and
49	`free', respectively. You can override the default implementations by
50	telling flex that you will provide your own implementations.
51
52	To override the default implementations, you must do two things:
53
54	1. Suppress the default implementations by specifying one or more of
55	the following options:
56
57	* `%option noyyalloc'
58
59	* `%option noyyrealloc'
60
61	* `%option noyyfree'.
62
63	2. Provide your own implementation of the following functions: (1)
64
65
66	// For a non-reentrant scanner
67	void * yyalloc (size_t bytes);
68	void * yyrealloc (void * ptr, size_t bytes);
69	void yyfree (void * ptr);
70
71	// For a reentrant scanner
72	void * yyalloc (size_t bytes, void * yyscanner);
73	void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
74	void yyfree (void * ptr, void * yyscanner);
75
76
77	In the following example, we will override all three memory
78	routines. We assume that there is a custom allocator with garbage
79	collection. In order to make this example interesting, we will use a
80	reentrant scanner, passing a pointer to the custom allocator through
81	`yyextra'.
82
83
84	%{
85	#include "some_allocator.h"
86	%}
87
88	/* Suppress the default implementations. */
89	%option noyyalloc noyyrealloc noyyfree
90	%option reentrant
91
92	/* Initialize the allocator. */
93	#define YY_EXTRA_TYPE struct allocator*
94	#define YY_USER_INIT yyextra = allocator_create();
95
96	%%
97	.\|\n ;
98	%%
99
100	/* Provide our own implementations. */
101	void * yyalloc (size_t bytes, void* yyscanner) {
102	return allocator_alloc (yyextra, bytes);
103	}
104
105	void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
106	return allocator_realloc (yyextra, bytes);
107	}
108
109	void yyfree (void * ptr, void * yyscanner) {
110	/* Do nothing -- we leave it to the garbage collector. */
111	}
112
113	---------- Footnotes ----------
114
115	(1) It is not necessary to override all (or any) of the memory
116	management routines. You may, for example, override `yyrealloc', but
117	not `yyfree' or `yyalloc'.
118
119
120	File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management
121
122	A Note About yytext And Memory
123	==============================
124
125	When flex finds a match, `yytext' points to the first character of
126	the match in the input buffer. The string itself is part of the input
127	buffer, and is _NOT_ allocated separately. The value of yytext will be
128	overwritten the next time yylex() is called. In short, the value of
129	yytext is only valid from within the matched rule's action.
130
131	Often, you want the value of yytext to persist for later processing,
132	i.e., by a parser with non-zero lookahead. In order to preserve yytext,
133	you will have to copy it with strdup() or a similar function. But this
134	introduces some headache because your parser is now responsible for
135	freeing the copy of yytext. If you use a yacc or bison parser,
136	(commonly used with flex), you will discover that the error recovery
137	mechanisms can cause memory to be leaked.
138
139	To prevent memory leaks from strdup'd yytext, you will have to track
140	the memory somehow. Our experience has shown that a garbage collection
141	mechanism or a pooled memory mechanism will save you a lot of grief
142	when writing parsers.
143
144
145	File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top
146
147	Serialized Tables
148	*****************
149
150	A `flex' scanner has the ability to save the DFA tables to a file,
151	and load them at runtime when needed. The motivation for this feature
152	is to reduce the runtime memory footprint. Traditionally, these tables
153	have been compiled into the scanner as C arrays, and are sometimes
154	quite large. Since the tables are compiled into the scanner, the
155	memory used by the tables can never be freed. This is a waste of
156	memory, especially if an application uses several scanners, but none of
157	them at the same time.
158
159	The serialization feature allows the tables to be loaded at runtime,
160	before scanning begins. The tables may be discarded when scanning is
161	finished.
162
163	* Menu:
164
165	* Creating Serialized Tables::
166	* Loading and Unloading Serialized Tables::
167	* Tables File Format::
168
169
170	File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables
171
172	Creating Serialized Tables
173	==========================
174
175	You may create a scanner with serialized tables by specifying:
176
177
178	%option tables-file=FILE
179	or
180	--tables-file=FILE
181
182	These options instruct flex to save the DFA tables to the file FILE.
183	The tables will _not_ be embedded in the generated scanner. The scanner
184	will not function on its own. The scanner will be dependent upon the
185	serialized tables. You must load the tables from this file at runtime
186	before you can scan anything.
187
188	If you do not specify a filename to `--tables-file', the tables will
189	be saved to `lex.yy.tables', where `yy' is the appropriate prefix.
190
191	If your project uses several different scanners, you can concatenate
192	the serialized tables into one file, and flex will find the correct set
193	of tables, using the scanner prefix as part of the lookup key. An
194	example follows:
195
196
197	$ flex --tables-file --prefix=cpp cpp.l
198	$ flex --tables-file --prefix=c c.l
199	$ cat lex.cpp.tables lex.c.tables > all.tables
200
201	The above example created two scanners, `cpp', and `c'. Since we did
202	not specify a filename, the tables were serialized to `lex.c.tables' and
203	`lex.cpp.tables', respectively. Then, we concatenated the two files
204	together into `all.tables', which we will distribute with our project.
205	At runtime, we will open the file and tell flex to load the tables from
206	it. Flex will find the correct tables automatically. (See next
207	section).
208
209
210	File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables
211
212	Loading and Unloading Serialized Tables
213	=======================================
214
215	If you've built your scanner with `%option tables-file', then you
216	must load the scanner tables at runtime. This can be accomplished with
217	the following function:
218
219	- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
220	Locates scanner tables in the stream pointed to by FP and loads
221	them. Memory for the tables is allocated via `yyalloc'. You must
222	call this function before the first call to `yylex'. The argument
223	SCANNER only appears in the reentrant scanner. This function
224	returns `0' (zero) on success, or non-zero on error.
225
226	The loaded tables are not automatically destroyed (unloaded) when
227	you call `yylex_destroy'. The reason is that you may create several
228	scanners of the same type (in a reentrant scanner), each of which needs
229	access to these tables. To avoid a nasty memory leak, you must call
230	the following function:
231
232	- Function: int yytables_destroy ([yyscan_t SCANNER])
233	Unloads the scanner tables. The tables must be loaded again before
234	you can scan any more data. The argument SCANNER only appears in
235	the reentrant scanner. This function returns `0' (zero) on
236	success, or non-zero on error.
237
238	*The functions `yytables_fload' and `yytables_destroy' are not
239	thread-safe.* You must ensure that these functions are called exactly
240	once (for each scanner type) in a threaded program, before any thread
241	calls `yylex'. After the tables are loaded, they are never written to,
242	and no thread protection is required thereafter - until you destroy
243	them.
244
245
246	File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables
247
248	Tables File Format
249	==================
250
251	This section defines the file format of serialized `flex' tables.
252
253	The tables format allows for one or more sets of tables to be
254	specified, where each set corresponds to a given scanner. Scanners are
255	indexed by name, as described below. The file format is as follows:
256
257
258	TABLE SET 1
259	+-------------------------------+
260	Header \| uint32 th_magic; \|
261	\| uint32 th_hsize; \|
262	\| uint32 th_ssize; \|
263	\| uint16 th_flags; \|
264	\| char th_version[]; \|
265	\| char th_name[]; \|
266	\| uint8 th_pad64[]; \|
267	+-------------------------------+
268	Table 1 \| uint16 td_id; \|
269	\| uint16 td_flags; \|
270	\| uint32 td_lolen; \|
271	\| uint32 td_hilen; \|
272	\| void td_data[]; \|
273	\| uint8 td_pad64[]; \|
274	+-------------------------------+
275	Table 2 \| \|
276	. . .
277	. . .
278	. . .
279	. . .
280	Table n \| \|
281	+-------------------------------+
282	TABLE SET 2
283	.
284	.
285	.
286	TABLE SET N
287
288	The above diagram shows that a complete set of tables consists of a
289	header followed by multiple individual tables. Furthermore, multiple
290	complete sets may be present in the same file, each set with its own
291	header and tables. The sets are contiguous in the file. The only way to
292	know if another set follows is to check the next four bytes for the
293	magic number (or check for EOF). The header and tables sections are
294	padded to 64-bit boundaries. Below we describe each field in detail.
295	This format does not specify how the scanner will expand the given
296	data, i.e., data may be serialized as int8, but expanded to an int32
297	array at runtime. This is to reduce the size of the serialized data
298	where possible. Remember, _all integer values are in network byte
299	order_.
300
301	Fields of a table header:
302
303	`th_magic'
304	Magic number, always 0xF13C57B1.
305
306	`th_hsize'
307	Size of this entire header, in bytes, including all fields plus
308	any padding.
309
310	`th_ssize'
311	Size of this entire set, in bytes, including the header, all
312	tables, plus any padding.
313
314	`th_flags'
315	Bit flags for this table set. Currently unused.
316
317	`th_version[]'
318	Flex version in NULL-termninated string format. e.g., `2.5.13a'.
319	This is the version of flex that was used to create the serialized
320	tables.
321
322	`th_name[]'
323	Contains the name of this table set. The default is `yytables',
324	and is prefixed accordingly, e.g., `footables'. Must be
325	NULL-terminated.
326
327	`th_pad64[]'
328	Zero or more NULL bytes, padding the entire header to the next
329	64-bit boundary as calculated from the beginning of the header.
330
331	Fields of a table:
332
333	`td_id'
334	Specifies the table identifier. Possible values are:
335	`YYTD_ID_ACCEPT (0x01)'
336	`yy_accept'
337
338	`YYTD_ID_BASE (0x02)'
339	`yy_base'
340
341	`YYTD_ID_CHK (0x03)'
342	`yy_chk'
343
344	`YYTD_ID_DEF (0x04)'
345	`yy_def'
346
347	`YYTD_ID_EC (0x05)'
348	`yy_ec '
349
350	`YYTD_ID_META (0x06)'
351	`yy_meta'
352
353	`YYTD_ID_NUL_TRANS (0x07)'
354	`yy_NUL_trans'
355
356	`YYTD_ID_NXT (0x08)'
357	`yy_nxt'. This array may be two dimensional. See the
358	`td_hilen' field below.
359
360	`YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
361	`yy_rule_can_match_eol'
362
363	`YYTD_ID_START_STATE_LIST (0x0A)'
364	`yy_start_state_list'. This array is handled specially
365	because it is an array of pointers to structs. See the
366	`td_flags' field below.
367
368	`YYTD_ID_TRANSITION (0x0B)'
369	`yy_transition'. This array is handled specially because it
370	is an array of structs. See the `td_lolen' field below.
371
372	`YYTD_ID_ACCLIST (0x0C)'
373	`yy_acclist'
374
375	`td_flags'
376	Bit flags describing how to interpret the data in `td_data'. The
377	data arrays are one-dimensional by default, but may be two
378	dimensional as specified in the `td_hilen' field.
379
380	`YYTD_DATA8 (0x01)'
381	The data is serialized as an array of type int8.
382
383	`YYTD_DATA16 (0x02)'
384	The data is serialized as an array of type int16.
385
386	`YYTD_DATA32 (0x04)'
387	The data is serialized as an array of type int32.
388
389	`YYTD_PTRANS (0x08)'
390	The data is a list of indexes of entries in the expanded
391	`yy_transition' array. Each index should be expanded to a
392	pointer to the corresponding entry in the `yy_transition'
393	array. We count on the fact that the `yy_transition' array
394	has already been seen.
395
396	`YYTD_STRUCT (0x10)'
397	The data is a list of yy_trans_info structs, each of which
398	consists of two integers. There is no padding between struct
399	elements or between structs. The type of each member is
400	determined by the `YYTD_DATA*' bits.
401
402	`td_lolen'
403	Specifies the number of elements in the lowest dimension array. If
404	this is a one-dimensional array, then it is simply the number of
405	elements in this array. The element size is determined by the
406	`td_flags' field.
407
408	`td_hilen'
409	If `td_hilen' is non-zero, then the data is a two-dimensional
410	array. Otherwise, the data is a one-dimensional array. `td_hilen'
411	contains the number of elements in the higher dimensional array,
412	and `td_lolen' contains the number of elements in the lowest
413	dimension.
414
415	Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
416	`sometype td_data[td_hilen][td_lolen]', where `sometype' is
417	specified by the `td_flags' field. It is possible for both
418	`td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
419	zero length array, and no data is loaded, i.e., this table is
420	simply skipped. Flex does not currently generate tables of zero
421	length.
422
423	`td_data[]'
424	The table data. This array may be a one- or two-dimensional array,
425	of type `int8', `int16', `int32', `struct yy_trans_info', or
426	`struct yy_trans_info*', depending upon the values in the
427	`td_flags', `td_lolen', and `td_hilen' fields.
428
429	`td_pad64[]'
430	Zero or more NULL bytes, padding the entire table to the next
431	64-bit boundary as calculated from the beginning of this table.
432
433
434	File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top
435
436	Diagnostics
437	***********
438
439	The following is a list of `flex' diagnostic messages:
440
441	* `warning, rule cannot be matched' indicates that the given rule
442	cannot be matched because it follows other rules that will always
443	match the same text as it. For example, in the following `foo'
444	cannot be matched because it comes after an identifier "catch-all"
445	rule:
446
447
448	[a-z]+ got_identifier();
449	foo got_foo();
450
451	Using `REJECT' in a scanner suppresses this warning.
452
453	* `warning, -s option given but default rule can be matched' means
454	that it is possible (perhaps only in a particular start condition)
455	that the default rule (match any single character) is the only one
456	that will match a particular input. Since `-s' was given,
457	presumably this is not intended.
458
459	* `reject_used_but_not_detected undefined' or
460	`yymore_used_but_not_detected undefined'. These errors can occur
461	at compile time. They indicate that the scanner uses `REJECT' or
462	`yymore()' but that `flex' failed to notice the fact, meaning that
463	`flex' scanned the first two sections looking for occurrences of
464	these actions and failed to find any, but somehow you snuck some in
465	(via a #include file, for example). Use `%option reject' or
466	`%option yymore' to indicate to `flex' that you really do use
467	these features.
468
469	* `flex scanner jammed'. a scanner compiled with `-s' has
470	encountered an input string which wasn't matched by any of its
471	rules. This error can also occur due to internal problems.
472
473	* `token too large, exceeds YYLMAX'. your scanner uses `%array' and
474	one of its rules matched a string longer than the `YYLMAX'
475	constant (8K bytes by default). You can increase the value by
476	#define'ing `YYLMAX' in the definitions section of your `flex'
477	input.
478
479	* `scanner requires -8 flag to use the character 'x''. Your scanner
480	specification includes recognizing the 8-bit character `'x'' and
481	you did not specify the -8 flag, and your scanner defaulted to
482	7-bit because you used the `-Cf' or `-CF' table compression
483	options. See the discussion of the `-7' flag, *Note Scanner
484	Options::, for details.
485
486	* `flex scanner push-back overflow'. you used `unput()' to push back
487	so much text that the scanner's buffer could not hold both the
488	pushed-back text and the current token in `yytext'. Ideally the
489	scanner should dynamically resize the buffer in this case, but at
490	present it does not.
491
492	* `input buffer overflow, can't enlarge buffer because scanner uses
493	REJECT'. the scanner was working on matching an extremely large
494	token and needed to expand the input buffer. This doesn't work
495	with scanners that use `REJECT'.
496
497	* `fatal flex scanner internal error--end of buffer missed'. This can
498	occur in a scanner which is reentered after a long-jump has jumped
499	out (or over) the scanner's activation frame. Before reentering
500	the scanner, use:
501
502	yyrestart( yyin );
503	or, as noted above, switch to using the C++ scanner class.
504
505	* `too many start conditions in <> construct!' you listed more start
506	conditions in a <> construct than exist (so you must have listed at
507	least one of them twice).
508
509
510	File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top
511
512	Limitations
513	***********
514
515	Some trailing context patterns cannot be properly matched and
516	generate warning messages (`dangerous trailing context'). These are
517	patterns where the ending of the first part of the rule matches the
518	beginning of the second part, such as `zx/xy', where the 'x*' matches
519	the 'x' at the beginning of the trailing context. (Note that the POSIX
520	draft states that the text matched by such patterns is undefined.) For
521	some trailing context rules, parts which are actually fixed-length are
522	not recognized as such, leading to the abovementioned performance loss.
523	In particular, parts using `\|' or `{n}' (such as `foo{3}') are always
524	considered variable-length. Combining trailing context with the
525	special `\|' action can result in _fixed_ trailing context being turned
526	into the more expensive _variable_ trailing context. For example, in
527	the following:
528
529
530	%%
531	abc \|
532	xyz/def
533
534	Use of `unput()' invalidates yytext and yyleng, unless the `%array'
535	directive or the `-l' option has been used. Pattern-matching of `NUL's
536	is substantially slower than matching other characters. Dynamic
537	resizing of the input buffer is slow, as it entails rescanning all the
538	text matched so far by the current (generally huge) token. Due to both
539	buffering of input and read-ahead, you cannot intermix calls to
540	`<stdio.h>' routines, such as, getchar(), with `flex' rules and expect
541	it to work. Call `input()' instead. The total table entries listed by
542	the `-v' flag excludes the number of table entries needed to determine
543	what rule has been matched. The number of entries is equal to the
544	number of DFA states if the scanner does not use `REJECT', and somewhat
545	greater than the number of states if it does. `REJECT' cannot be used
546	with the `-f' or `-F' options.
547
548	The `flex' internal algorithms need documentation.
549
550
551	File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top
552
553	Additional Reading
554	******************
555
556	You may wish to read more about the following programs:
557	* lex
558
559	* yacc
560
561	* sed
562
563	* awk
564
565	The following books may contain material of interest:
566
567	John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
568	Associates. Be sure to get the 2nd edition.
569
570	M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
571
572	Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
573	Techniques and Tools_, Addison-Wesley (1986). Describes the
574	pattern-matching techniques used by `flex' (deterministic finite
575	automata).
576
577
578	File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top
579
580	FAQ
581	***
582
583	From time to time, the `flex' maintainer receives certain questions.
584	Rather than repeat answers to well-understood problems, we publish them
585	here.
586
587	* Menu:
588
589	* When was flex born?::
590	* How do I expand \ escape sequences in C-style quoted strings?::
591	* Why do flex scanners call fileno if it is not ANSI compatible?::
592	* Does flex support recursive pattern definitions?::
593	* How do I skip huge chunks of input (tens of megabytes) while using flex?::
594	* Flex is not matching my patterns in the same order that I defined them.::
595	* My actions are executing out of order or sometimes not at all.::
596	* How can I have multiple input sources feed into the same scanner at the same time?::
597	* Can I build nested parsers that work with the same input file?::
598	* How can I match text only at the end of a file?::
599	* How can I make REJECT cascade across start condition boundaries?::
600	* Why cant I use fast or full tables with interactive mode?::
601	* How much faster is -F or -f than -C?::
602	* If I have a simple grammar cant I just parse it with flex?::
603	* Why doesnt yyrestart() set the start state back to INITIAL?::
604	* How can I match C-style comments?::
605	* The period isnt working the way I expected.::
606	* Can I get the flex manual in another format?::
607	* Does there exist a "faster" NDFA->DFA algorithm?::
608	* How does flex compile the DFA so quickly?::
609	* How can I use more than 8192 rules?::
610	* How do I abandon a file in the middle of a scan and switch to a new file?::
611	* How do I execute code only during initialization (only before the first scan)?::
612	* How do I execute code at termination?::
613	* Where else can I find help?::
614	* Can I include comments in the "rules" section of the file?::
615	* I get an error about undefined yywrap().::
616	* How can I change the matching pattern at run time?::
617	* How can I expand macros in the input?::
618	* How can I build a two-pass scanner?::
619	* How do I match any string not matched in the preceding rules?::
620	* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
621	* Is there a way to make flex treat NULL like a regular character?::
622	* Whenever flex can not match the input it says "flex scanner jammed".::
623	* Why doesnt flex have non-greedy operators like perl does?::
624	* Memory leak - 16386 bytes allocated by malloc.::
625	* How do I track the byte offset for lseek()?::
626	* How do I use my own I/O classes in a C++ scanner?::
627	* How do I skip as many chars as possible?::
628	* deleteme00::
629	* Are certain equivalent patterns faster than others?::
630	* Is backing up a big deal?::
631	* Can I fake multi-byte character support?::
632	* deleteme01::
633	* Can you discuss some flex internals?::
634	* unput() messes up yy_at_bol::
635	* The \| operator is not doing what I want::
636	* Why can't flex understand this variable trailing context pattern?::
637	* The ^ operator isn't working::
638	* Trailing context is getting confused with trailing optional patterns::
639	* Is flex GNU or not?::
640	* ERASEME53::
641	* I need to scan if-then-else blocks and while loops::
642	* ERASEME55::
643	* ERASEME56::
644	* ERASEME57::
645	* Is there a repository for flex scanners?::
646	* How can I conditionally compile or preprocess my flex input file?::
647	* Where can I find grammars for lex and yacc?::
648	* I get an end-of-buffer message for each character scanned.::
649	* unnamed-faq-62::
650	* unnamed-faq-63::
651	* unnamed-faq-64::
652	* unnamed-faq-65::
653	* unnamed-faq-66::
654	* unnamed-faq-67::
655	* unnamed-faq-68::
656	* unnamed-faq-69::
657	* unnamed-faq-70::
658	* unnamed-faq-71::
659	* unnamed-faq-72::
660	* unnamed-faq-73::
661	* unnamed-faq-74::
662	* unnamed-faq-75::
663	* unnamed-faq-76::
664	* unnamed-faq-77::
665	* unnamed-faq-78::
666	* unnamed-faq-79::
667	* unnamed-faq-80::
668	* unnamed-faq-81::
669	* unnamed-faq-82::
670	* unnamed-faq-83::
671	* unnamed-faq-84::
672	* unnamed-faq-85::
673	* unnamed-faq-86::
674	* unnamed-faq-87::
675	* unnamed-faq-88::
676	* unnamed-faq-90::
677	* unnamed-faq-91::
678	* unnamed-faq-92::
679	* unnamed-faq-93::
680	* unnamed-faq-94::
681	* unnamed-faq-95::
682	* unnamed-faq-96::
683	* unnamed-faq-97::
684	* unnamed-faq-98::
685	* unnamed-faq-99::
686	* unnamed-faq-100::
687	* unnamed-faq-101::
688	* What is the difference between YYLEX_PARAM and YY_DECL?::
689	* Why do I get "conflicting types for yylex" error?::
690	* How do I access the values set in a Flex action from within a Bison action?::
691
692
693	File: flex.info, Node: When was flex born?, Next: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
694
695	When was flex born?
696	===================
697
698	Vern Paxson took over the `Software Tools' lex project from Jef
699	Poskanzer in 1982. At that point it was written in Ratfor. Around
700	1987 or so, Paxson translated it into C, and a legend was born :-).
701
702
703	File: flex.info, Node: How do I expand \ escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ
704
705	How do I expand \ escape sequences in C-style quoted strings?
706	=============================================================
707
708	A key point when scanning quoted strings is that you cannot (easily)
709	write a single rule that will precisely match the string if you allow
710	things like embedded escape sequences and newlines. If you try to
711	match strings with a single rule then you'll wind up having to rescan
712	the string anyway to find any escape sequences.
713
714	Instead you can use exclusive start conditions and a set of rules,
715	one for matching non-escaped text, one for matching a single escape,
716	one for matching an embedded newline, and one for recognizing the end
717	of the string. Each of these rules is then faced with the question of
718	where to put its intermediary results. The best solution is for the
719	rules to append their local value of `yytext' to the end of a "string
720	literal" buffer. A rule like the escape-matcher will append to the
721	buffer the meaning of the escape sequence rather than the literal text
722	in `yytext'. In this way, `yytext' does not need to be modified at all.
723
724
725	File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand \ escape sequences in C-style quoted strings?, Up: FAQ
726
727	Why do flex scanners call fileno if it is not ANSI compatible?
728	==============================================================
729
730	Flex scanners call `fileno()' in order to get the file descriptor
731	corresponding to `yyin'. The file descriptor may be passed to
732	`isatty()' or `read()', depending upon which `%options' you specified.
733	If your system does not have `fileno()' support, to get rid of the
734	`read()' call, do not specify `%option read'. To get rid of the
735	`isatty()' call, you must specify one of `%option always-interactive' or
736	`%option never-interactive'.
737
738
739	File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ
740
741	Does flex support recursive pattern definitions?
742	================================================
743
744	e.g.,
745
746
747	%%
748	block "{"({block}\|{statement})*"}"
749
750	No. You cannot have recursive definitions. The pattern-matching
751	power of regular expressions in general (and therefore flex scanners,
752	too) is limited. In particular, regular expressions cannot "balance"
753	parentheses to an arbitrary degree. For example, it's impossible to
754	write a regular expression that matches all strings containing the same
755	number of '{'s as '}'s. For more powerful pattern matching, you need a
756	parser, such as `GNU bison'.
757
758
759	File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ
760
761	How do I skip huge chunks of input (tens of megabytes) while using flex?
762	========================================================================
763
764	Use `fseek()' (or `lseek()') to position yyin, then call
765	`yyrestart()'.
766
767
768	File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ
769
770	Flex is not matching my patterns in the same order that I defined them.
771	=======================================================================
772
773	`flex' picks the rule that matches the most text (i.e., the longest
774	possible input string). This is because `flex' uses an entirely
775	different matching technique ("deterministic finite automata") that
776	actually does all of the matching simultaneously, in parallel. (Seems
777	impossible, but it's actually a fairly simple technique once you
778	understand the principles.)
779
780	A side-effect of this parallel matching is that when the input
781	matches more than one rule, `flex' scanners pick the rule that matched
782	the _most_ text. This is explained further in the manual, in the
783	section *Note Matching::.
784
785	If you want `flex' to choose a shorter match, then you can work
786	around this behavior by expanding your short rule to match more text,
787	then put back the extra:
788
789
790	data_.* yyless( 5 ); BEGIN BLOCKIDSTATE;
791
792	Another fix would be to make the second rule active only during the
793	`<BLOCKIDSTATE>' start condition, and make that start condition
794	exclusive by declaring it with `%x' instead of `%s'.
795
796	A final fix is to change the input language so that the ambiguity for
797	`data_' is removed, by adding characters to it that don't match the
798	identifier rule, or by removing characters (such as `_') from the
799	identifier rule so it no longer matches `data_'. (Of course, you might
800	also not have the option of changing the input language.)
801
802
803	File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ
804
805	My actions are executing out of order or sometimes not at all.
806	==============================================================
807
808	Most likely, you have (in error) placed the opening `{' of the action
809	block on a different line than the rule, e.g.,
810
811
812	^(foo\|bar)
813	{ <<<--- WRONG!
814
815	}
816
817	`flex' requires that the opening `{' of an action associated with a
818	rule begin on the same line as does the rule. You need instead to
819	write your rules as follows:
820
821
822	^(foo\|bar) { // CORRECT!
823
824	}
825
826
827	File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ
828
829	How can I have multiple input sources feed into the same scanner at the same time?
830	==================================================================================
831
832	If ...
833	* your scanner is free of backtracking (verified using `flex''s `-b'
834	flag),
835
836	* AND you run your scanner interactively (`-I' option; default
837	unless using special table compression options),
838
839	* AND you feed it one character at a time by redefining `YY_INPUT'
840	to do so,
841
842	then every time it matches a token, it will have exhausted its input
843	buffer (because the scanner is free of backtracking). This means you
844	can safely use `select()' at the point and only call `yylex()' for
845	another token if `select()' indicates there's data available.
846
847	That is, move the `select()' out from the input function to a point
848	where it determines whether `yylex()' gets called for the next token.
849
850	With this approach, you will still have problems if your input can
851	arrive piecemeal; `select()' could inform you that the beginning of a
852	token is available, you call `yylex()' to get it, but it winds up
853	blocking waiting for the later characters in the token.
854
855	Here's another way: Move your input multiplexing inside of
856	`YY_INPUT'. That is, whenever `YY_INPUT' is called, it `select()''s to
857	see where input is available. If input is available for the scanner,
858	it reads and returns the next byte. If input is available from another
859	source, it calls whatever function is responsible for reading from that
860	source. (If no input is available, it blocks until some input is
861	available.) I've used this technique in an interpreter I wrote that
862	both reads keyboard input using a `flex' scanner and IPC traffic from
863	sockets, and it works fine.
864
865
866	File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ
867
868	Can I build nested parsers that work with the same input file?
869	==============================================================
870
871	This is not going to work without some additional effort. The
872	reason is that `flex' block-buffers the input it reads from `yyin'.
873	This means that the "outermost" `yylex()', when called, will
874	automatically slurp up the first 8K of input available on yyin, and
875	subsequent calls to other `yylex()''s won't see that input. You might
876	be tempted to work around this problem by redefining `YY_INPUT' to only
877	return a small amount of text, but it turns out that that approach is
878	quite difficult. Instead, the best solution is to combine all of your
879	scanners into one large scanner, using a different exclusive start
880	condition for each.
881
882
883	File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ
884
885	How can I match text only at the end of a file?
886	===============================================
887
888	There is no way to write a rule which is "match this text, but only
889	if it comes at the end of the file". You can fake it, though, if you
890	happen to have a character lying around that you don't allow in your
891	input. Then you redefine `YY_INPUT' to call your own routine which, if
892	it sees an `EOF', returns the magic character first (and remembers to
893	return a real `EOF' next time it's called). Then you could write:
894
895
896	<COMMENT>(.\|\n){EOF_CHAR} / saw comment at EOF */
897
898
899	File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ
900
901	How can I make REJECT cascade across start condition boundaries?
902	================================================================
903
904	You can do this as follows. Suppose you have a start condition `A',
905	and after exhausting all of the possible matches in `<A>', you want to
906	try matches in `<INITIAL>'. Then you could use the following:
907
908
909	%x A
910	%%
911	<A>rule_that_is_long ...; REJECT;
912	<A>rule ...; REJECT; /* shorter rule */
913	<A>etc.
914	...
915	<A>.\|\n {
916	/* Shortest and last rule in <A>, so
917	* cascaded REJECT's will eventually
918	* wind up matching this rule. We want
919	* to now switch to the initial state
920	* and try matching from there instead.
921	*/
922	yyless(0); /* put back matched text */
923	BEGIN(INITIAL);
924	}
925
926
927	File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ
928
929	Why can't I use fast or full tables with interactive mode?
930	==========================================================
931
932	One of the assumptions flex makes is that interactive applications
933	are inherently slow (they're waiting on a human after all). It has to
934	do with how the scanner detects that it must be finished scanning a
935	token. For interactive scanners, after scanning each character the
936	current state is looked up in a table (essentially) to see whether
937	there's a chance of another input character possibly extending the
938	length of the match. If not, the scanner halts. For non-interactive
939	scanners, the end-of-token test is much simpler, basically a compare
940	with 0, so no memory bus cycles. Since the test occurs in the
941	innermost scanning loop, one would like to make it go as fast as
942	possible.
943
944	Still, it seems reasonable to allow the user to choose to trade off
945	a bit of performance in this area to gain the corresponding
946	flexibility. There might be another reason, though, why fast scanners
947	don't support the interactive option.
948
949
950	File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ
951
952	How much faster is -F or -f than -C?
953	====================================
954
955	Much faster (factor of 2-3).
956
957
958	File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesnt yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ
959
960	If I have a simple grammar can't I just parse it with flex?
961	===========================================================
962
963	Is your grammar recursive? That's almost always a sign that you're
964	better off using a parser/scanner rather than just trying to use a
965	scanner alone.
966
967
968	File: flex.info, Node: Why doesnt yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ
969
970	Why doesn't yyrestart() set the start state back to INITIAL?
971	============================================================
972
973	There are two reasons. The first is that there might be programs
974	that rely on the start state not changing across file changes. The
975	second is that beginning with `flex' version 2.4, use of `yyrestart()'
976	is no longer required, so fixing the problem there doesn't solve the
977	more general problem.
978
979
980	File: flex.info, Node: How can I match C-style comments?, Next: The period isnt working the way I expected., Prev: Why doesnt yyrestart() set the start state back to INITIAL?, Up: FAQ
981
982	How can I match C-style comments?
983	=================================
984
985	You might be tempted to try something like this:
986
987
988	"/"."*/" // WRONG!
989
990	or, worse, this:
991
992
993	"/"(.\|\n)"/" // WRONG!
994
995	The above rules will eat too much input, and blow up on things like:
996
997
998	/* a comment / do_my_thing( "oops /" );
999
1000	Here is one way which allows you to track line information:
1001
1002
1003	<INITIAL>{
1004	"/*" BEGIN(IN_COMMENT);
1005	}
1006	<IN_COMMENT>{
1007	"*/" BEGIN(INITIAL);
1008	[^*\n]+ // eat comment in chunks
1009	"*" // eat the lone star
1010	\n yylineno++;
1011	}
1012
1013
1014	File: flex.info, Node: The period isnt working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ
1015
1016	The '.' isn't working the way I expected.
1017	=========================================
1018
1019	Here are some tips for using `.':
1020
1021	* A common mistake is to place the grouping parenthesis AFTER an
1022	operator, when you really meant to place the parenthesis BEFORE
1023	the operator, e.g., you probably want this `(foo\|bar)+' and NOT
1024	this `(foo\|bar+)'.
1025
1026	The first pattern matches the words `foo' or `bar' any number of
1027	times, e.g., it matches the text `barfoofoobarfoo'. The second
1028	pattern matches a single instance of `foo' or a single instance of
1029	`bar' followed by one or more `r's, e.g., it matches the text
1030	`barrrr' .
1031
1032	* A `.' inside `[]''s just means a literal`.' (period), and NOT "any
1033	character except newline".
1034
1035	* Remember that `.' matches any character EXCEPT `\n' (and `EOF').
1036	If you really want to match ANY character, including newlines,
1037	then use `(.\|\n)' Beware that the regex `(.\|\n)+' will match your
1038	entire input!
1039
1040	* Finally, if you want to match a literal `.' (a period), then use
1041	`[.]' or `"."'
1042
1043
1044	File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isnt working the way I expected., Up: FAQ
1045
1046	Can I get the flex manual in another format?
1047	============================================
1048
1049	The `flex' source distribution includes a texinfo manual. You are
1050	free to convert that texinfo into whatever format you desire. The
1051	`texinfo' package includes tools for conversion to a number of formats.
1052
1053
1054	File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ
1055
1056	Does there exist a "faster" NDFA->DFA algorithm?
1057	================================================
1058
1059	There's no way around the potential exponential running time - it
1060	can take you exponential time just to enumerate all of the DFA states.
1061	In practice, though, the running time is closer to linear, or sometimes
1062	quadratic.
1063
1064
1065	File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ
1066
1067	How does flex compile the DFA so quickly?
1068	=========================================
1069
1070	There are two big speed wins that `flex' uses:
1071
1072	1. It analyzes the input rules to construct equivalence classes for
1073	those characters that always make the same transitions. It then
1074	rewrites the NFA using equivalence classes for transitions instead
1075	of characters. This cuts down the NFA->DFA computation time
1076	dramatically, to the point where, for uncompressed DFA tables, the
1077	DFA generation is often I/O bound in writing out the tables.
1078
1079	2. It maintains hash values for previously computed DFA states, so
1080	testing whether a newly constructed DFA state is equivalent to a
1081	previously constructed state can be done very quickly, by first
1082	comparing hash values.
1083
1084
1085	File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ
1086
1087	How can I use more than 8192 rules?
1088	===================================
1089
1090	`Flex' is compiled with an upper limit of 8192 rules per scanner.
1091	If you need more than 8192 rules in your scanner, you'll have to
1092	recompile `flex' with the following changes in `flexdef.h':
1093
1094
1095	< #define YY_TRAILING_MASK 0x2000
1096	< #define YY_TRAILING_HEAD_MASK 0x4000
1097	--
1098	> #define YY_TRAILING_MASK 0x20000000
1099	> #define YY_TRAILING_HEAD_MASK 0x40000000
1100
1101	This should work okay as long as your C compiler uses 32 bit
1102	integers. But you might want to think about whether using such a huge
1103	number of rules is the best way to solve your problem.
1104
1105	The following may also be relevant:
1106
1107	With luck, you should be able to increase the definitions in
1108	flexdef.h for:
1109
1110
1111	#define JAMSTATE -32766 /* marks a reference to the state that always jams */
1112	#define MAXIMUM_MNS 31999
1113	#define BAD_SUBSCRIPT -32767
1114
1115	recompile everything, and it'll all work. Flex only has these
1116	16-bit-like values built into it because a long time ago it was
1117	developed on a machine with 16-bit ints. I've given this advice to
1118	others in the past but haven't heard back from them whether it worked
1119	okay or not...
1120
1121
1122	File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ
1123
1124	How do I abandon a file in the middle of a scan and switch to a new file?
1125	=========================================================================
1126
1127	Just call `yyrestart(newfile)'. Be sure to reset the start state if
1128	you want a "fresh start, since `yyrestart' does NOT reset the start
1129	state back to `INITIAL'.
1130
1131
1132	File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ
1133
1134	How do I execute code only during initialization (only before the first scan)?
1135	==============================================================================
1136
1137	You can specify an initial action by defining the macro
1138	`YY_USER_INIT' (though note that `yyout' may not be available at the
1139	time this macro is executed). Or you can add to the beginning of your
1140	rules section:
1141
1142
1143	%%
1144	/* Must be indented! */
1145	static int did_init = 0;
1146
1147	if ( ! did_init ){
1148	do_my_init();
1149	did_init = 1;
1150	}
1151
1152
1153	File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ
1154
1155	How do I execute code at termination?
1156	=====================================
1157
1158	You can specify an action for the `<<EOF>>' rule.
1159
1160
1161	File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ
1162
1163	Where else can I find help?
1164	===========================
1165
1166	You can find the flex homepage on the web at
1167	`http://flex.sourceforge.net/'. See that page for details about flex
1168	mailing lists as well.
1169
1170
1171	File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ
1172
1173	Can I include comments in the "rules" section of the file?
1174	==========================================================
1175
1176	Yes, just about anywhere you want to. See the manual for the
1177	specific syntax.
1178
1179
1180	File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ
1181
1182	I get an error about undefined yywrap().
1183	========================================
1184
1185	You must supply a `yywrap()' function of your own, or link to
1186	`libfl.a' (which provides one), or use
1187
1188
1189	%option noyywrap
1190
1191	in your source to say you don't want a `yywrap()' function.
1192
1193
1194	File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ
1195
1196	How can I change the matching pattern at run time?
1197	==================================================
1198
1199	You can't, it's compiled into a static table when flex builds the
1200	scanner.
1201
1202
1203	File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ
1204
1205	How can I expand macros in the input?
1206	=====================================
1207
1208	The best way to approach this problem is at a higher level, e.g., in
1209	the parser.
1210
1211	However, you can do this using multiple input buffers.
1212
1213
1214	%%
1215	macro/[a-z]+ {
1216	/* Saw the macro "macro" followed by extra stuff. */
1217	main_buffer = YY_CURRENT_BUFFER;
1218	expansion_buffer = yy_scan_string(expand(yytext));
1219	yy_switch_to_buffer(expansion_buffer);
1220	}
1221
1222	<<EOF>> {
1223	if ( expansion_buffer )
1224	{
1225	// We were doing an expansion, return to where
1226	// we were.
1227	yy_switch_to_buffer(main_buffer);
1228	yy_delete_buffer(expansion_buffer);
1229	expansion_buffer = 0;
1230	}
1231	else
1232	yyterminate();
1233	}
1234
1235	You probably will want a stack of expansion buffers to allow nested
1236	macros. From the above though hopefully the idea is clear.
1237
1238
1239	File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ
1240
1241	How can I build a two-pass scanner?
1242	===================================
1243
1244	One way to do it is to filter the first pass to a temporary file,
1245	then process the temporary file on the second pass. You will probably
1246	see a performance hit, do to all the disk I/O.
1247
1248	When you need to look ahead far forward like this, it almost always
1249	means that the right solution is to build a parse tree of the entire
1250	input, then walk it after the parse in order to generate the output.
1251	In a sense, this is a two-pass approach, once through the text and once
1252	through the parse tree, but the performance hit for the latter is
1253	usually an order of magnitude smaller, since everything is already
1254	classified, in binary format, and residing in memory.
1255

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format