lex -- Command

Lexical analyzer generator
lex [-t][-v][file]
cc lex.yy.c -ll

Many programs,  e.g., compilers, process highly  structured input according
to rules.  Two  of the most complicated parts of  such programs are lexical
analysis  and parsing  (also called syntax  analysis). The  COHERENT system
includes two powerful tools called lex and yacc to help you construct these
parts of  a program.  lex  converts a set  of lexical rules  into a lexical
analyzer, and yacc converts a set of parsing rules into a parser.

The  output of  lex  may be  used  directly, or  may  be used  by a  parser
generated by yacc.

lex reads a  specification from the given file (or  from the standard input
if  none),  and generates  a  C  function called  yylex().  lex writes  the
generated function in  the file lex.yy.c, or on standard  output if you use
the -t  option.  The -v  option prints some statistics  about the generated
tables.

The tutorial on lex that appears in this manual describes lex in detail.  In
brief, the generated function yylex()  matches portions of its input to one
pattern (sometimes  called a  regular expression) from  a set of  rules, or
context,  and executes  associated C commands.   Unmatched portions  of the
input are copied to the output  stream.  yylex() returns EOF when input has
been exhausted.

lex uses the following macros, which you may replace with your own versions
after removing them with the preprocessor directive #undef: input() (read
the standard input stream) and output(c) (write the character c to the
standard output stream).  You may also replace the following functions:
main() (main function), error(...) (print error messages; takes the same
arguments as printf), and yywrap() (handle events at the end of a file).
If you want an action performed at end of file, such as arranging for more
input, yywrap() should perform it and return zero to keep going.
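
As a rough illustration of the end-of-file hook described above, a
replacement yywrap() might reopen the standard input on the next file in a
list.  This is only a sketch in ANSI C: the names no_files and file_list
are hypothetical, and the return value 1 for "stop" is an assumption, since
the text above specifies only that zero means keep going.

```c
#include <stdio.h>

/* Hypothetical list of remaining input files, terminated by NULL.
 * A real program would typically point file_list at its arguments. */
static const char *no_files[] = { NULL };
static const char **file_list = no_files;

/*
 * Sketch of a user-supplied yywrap(), called at end of input.
 * Returns 0 if more input was arranged, so scanning continues;
 * returns 1 (assumed "stop" value) when nothing is left.
 */
int yywrap(void)
{
    while (*file_list != NULL) {
        const char *name = *file_list++;

        /* input() reads the standard input, so redirect stdin. */
        if (freopen(name, "r", stdin) != NULL)
            return 0;               /* keep going with the new file */
        fprintf(stderr, "cannot open %s\n", name);
    }
    return 1;                       /* no more input */
}
```

Files that cannot be opened are skipped with a message, so one bad name
does not end the scan early.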

A full lex specification has the following format:

-> Macro definitions, of the form:

       name    pattern

-> Start condition declarations:

       %S  NAME ...

-> Context declarations:

       %C  NAME ...

-> Code to be included in the header section:

       %{
       anything
       %}
       <tab or space> anything

-> Rules section delimiter (must always be present):

       %%

-> Code to appear at the start of yylex():

       <tab or space> anything

-> Rules for initial context, in any of the forms:

       rule        action;
       rule        | (means use next action)
       rule        {
       <tab or space>    action;
       <tab or space>    }

-> For each additional context:

       %C  NAME
       ...rules for this context...

-> End of rules section delimiter:

       %%

-> Code to  be copied  verbatim, such  as user provided  input(), output(),
   yywrap(), or other.

lex matches the longest string possible; if two rules match the same length
string, the  rule specified first  takes precedence.  lex  puts the matched
string, or token, in the char  array yytext[], and sets the variable yyleng
to its length.
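
The selection rule above can be sketched in C.  This is a toy model, not
the code that lex generates: the two rules, the function pick_rule(), and
the variable model_yyleng (standing in for yyleng) are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading chars of s it matches. */
typedef size_t (*rule_fn)(const char *s);

/* Rule 0: the literal keyword "if". */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: one or more lower-case letters, like the pattern [a-z]+ */
static size_t match_ident(const char *s)
{
    size_t n = 0;

    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

static const rule_fn rules[] = { match_if, match_ident };

/* Stand-in for yyleng: length of the token chosen by pick_rule(). */
static size_t model_yyleng;

/*
 * Choose the rule to fire at s.  The longest match wins; on a tie the
 * rule listed first wins.  Returns the rule index, or -1 if no rule
 * matches at least one character.
 */
int pick_rule(const char *s)
{
    int best = -1;
    size_t i;

    model_yyleng = 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);

        if (n > model_yyleng) {     /* strictly longer: ties keep earlier rule */
            model_yyleng = n;
            best = (int)i;
        }
    }
    return best;
}
```

With these two rules, the input "if" matches both at length 2, so rule 0
is chosen because it comes first; the input "iffy" selects rule 1 because
its match is longer.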

Actions may use the following:

ECHO...........Output the token
REJECT.........Perform action for lower precedence match
BEGIN NAME.....Set start condition to NAME
BEGIN 0........Clear start condition
yyswitch(NAME).Switch to context NAME, return current
yyswitch(0)....Switch to initial context
yynext().......Steal next character from input
yyback(c)......Put character c back into input
yyless(n)......Reduce token length to n, put rest back
yymore().......Append next token to this one
yylook().......Returns number of chars in input buffer

lex rules are contiguous strings of the form

    [ <NAME,...> ][ ^ ] token [ /lookahead ][ $ ]

where brackets `[]' indicate optional items.

<NAME,...>.....Match only under given start conditions
^..............Match the beginning of a line
$..............Match the end of a line
token..........Pattern that a given token is to match
/lookahead.....Pattern that given trailing text is to match

Pattern elements:

a..............The character a
\a.............The character a, even if special
. (period).....Any character except newline
[abx-z]........Any of a, b, or x through z
[^abx-z].......Any except a, b, or x through z
"abc"..........The string abc, even if any are special
{name}.........The macro definition name
(exp)..........The pattern exp (grouping operator)
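
To make the character-class notation concrete, the following C sketch
tests whether a character belongs to a class such as [abx-z] or [^abx-z].
The function in_class() is hypothetical and is not part of lex; it takes
the class body without the enclosing brackets and, as a simplification,
does not handle backslash escapes inside the class.

```c
#include <stddef.h>

/*
 * Return 1 if character c belongs to the class body cls, written
 * without the enclosing brackets, e.g. "abx-z" or "^abx-z".
 * Return 0 otherwise.
 */
int in_class(const char *cls, int c)
{
    int negate = 0, found = 0;
    size_t i = 0;

    if (cls[0] == '^') {            /* a leading ^ inverts the class */
        negate = 1;
        i = 1;
    }
    while (cls[i] != '\0') {
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            /* a range such as x-z */
            if (c >= (unsigned char)cls[i] && c <= (unsigned char)cls[i + 2])
                found = 1;
            i += 3;
        } else {
            /* a single character (a trailing - is taken literally) */
            if (c == (unsigned char)cls[i])
                found = 1;
            i++;
        }
    }
    return negate ? !found : found;
}
```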

Optional operators on elements:

e?.............Zero or one occurrence of e
e*.............Zero or more consecutive es
e+.............One or more consecutive es
e{n}...........n (a decimal number) consecutive es
e{m,n}.........m through n consecutive es
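
The repetition operators above can all be viewed as bounds on a count:
e? is zero through one, e* is zero or more, e+ is one or more.  The C
sketch below models this for a single-character element.  The function
match_repeat() and the macro UNBOUNDED are hypothetical, and the greedy
matching shown is a simplification of what a generated analyzer does.

```c
#include <stddef.h>

#define UNBOUNDED ((size_t)-1)      /* hypothetical marker: no upper limit */

/*
 * Greedily match between min and max consecutive copies of the
 * character e at the start of s.  Returns the number of characters
 * matched, or -1 if fewer than min copies are present.
 */
long match_repeat(const char *s, char e, size_t min, size_t max)
{
    size_t n = 0;

    while (n < max && s[n] != '\0' && s[n] == e)
        n++;
    return n >= min ? (long)n : -1L;
}
```

For example, match_repeat(s, 'a', 0, 1) models a?, a call with bounds
1 and UNBOUNDED models a+, and a call with bounds 2 and 2 models a{2}.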

Patterns may be of the form:

e1e2...........Matches the sequence e1 e2
e1|e2..........Matches either e1 or e2

lex recognizes the standard C escapes:  \n, \t, \r, \b, \f, and \ooo (octal
representation).  The special characters

     \ ( ) < > { } % * + ? [ - ] ^ / $ . |

must be prefixed with \ or enclosed within quotation marks to be treated as
ordinary characters (except for " and \, which quotation marks do not
protect).  Within classes, only the characters . ^ - \ and ] are special.
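
The escape rules can be illustrated with a small C decoder.  This is a
sketch only: the function decode_escape() is hypothetical, it assumes the
ASCII character set, and it assumes that \ooo may carry one to three octal
digits, which the text above does not state.

```c
#include <stddef.h>

/*
 * Decode one escape sequence that starts at s[0] == '\\' and is
 * followed by at least one character.  Returns the character value.
 * If used is not NULL, stores the number of source characters
 * consumed, including the backslash.
 */
int decode_escape(const char *s, size_t *used)
{
    size_t n = 2;                   /* backslash plus one character */
    int value;

    switch (s[1]) {
    case 'n': value = '\n'; break;
    case 't': value = '\t'; break;
    case 'r': value = '\r'; break;
    case 'b': value = '\b'; break;
    case 'f': value = '\f'; break;
    default:
        if (s[1] >= '0' && s[1] <= '7') {
            /* \ooo: up to three octal digits (assumed) */
            value = 0;
            for (n = 1; n < 4 && s[n] >= '0' && s[n] <= '7'; n++)
                value = value * 8 + (s[n] - '0');
        } else {
            /* any other character stands for itself, e.g. \* is * */
            value = (unsigned char)s[1];
        }
        break;
    }
    if (used != NULL)
        *used = n;
    return value;
}
```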

Files

/usr/lib/libl.a
/usr/src/libl/* -- library source code

See Also

commands, yacc
Introduction to lex, the Lexical Analyzer