COHERENT manpages

This page displays the COHERENT manpage for gawk [Pattern-scanning and -processing language].

gawk -- Command

Pattern-scanning and -processing language
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

gawk is  the GNU Project's implementation of  the AWK programming language.
It conforms to the definition of  the language in the POSIX Standard 1003.2
Command Language and Utilities Standard.   This version in turn is based on
the description  in The  AWK Programming  Language, by Aho,  Kernighan, and
Weinberger, with the additional features  defined in the System V Release 4
version of  awk. gawk also provides some GNU-specific extensions.

The command line  consists of options to gawk itself,  the AWK program text
(if  not supplied  via the  options -f  or --file), and  values to  be made
available in the predefined AWK variables ARGC and ARGV.

Command-line Options

gawk options may be either the traditional POSIX one-letter options, or the
GNU style  long options.  POSIX Standard-style options  begin with a single
`-', whereas  GNU long options  begin with ``--''.   GNU-style long options
are  provided  for  both  GNU-specific  features  and  for  POSIX  mandated
features.  Other  implementations of  the AWK  language are likely  to only
accept the traditional one-letter options.

Following  the  POSIX  Standard,  gawk-specific  options are  supplied  via
arguments  to the  -W  option.  Multiple  -W  options may  be supplied,  or
multiple  arguments  may be  supplied  together if  they  are separated  by
commas, or enclosed in quotation  marks and separated by white space.  Case
is  ignored  in  arguments  to  the  -W  option.   Each  -W  option  has  a
corresponding GNU style long option, as detailed below.

gawk recognizes the following command-line options:

-F fs
--field-separator=fs
     Use  fs for  the input  field separator (the  value of  the predefined
     variable FS).

-v variable=value
--assign=variable=value
     Assign  value to  variable  before executing  the  program.  value  is
     available to the BEGIN block of an AWK program.

-f program-file
--file=program-file
     Read the AWK program's source from file program-file, instead of from
     the first command-line argument.  The gawk command line can contain
     more than one -f or --file option.

-W compat
--compat
     Run in compatibility mode.  In compatibility mode, gawk behaves
     identically to UNIX awk; it recognizes none of the GNU-specific
     extensions.  These extensions are described below.

-W copyleft
-W copyright
--copyleft
--copyright
     Print the  short version of  the GNU copyright  information message on
     the standard error.

-W help
-W usage
--help
--usage
     Print  a relatively  short  summary of  the available  options on  the
     standard error.

-W lint
--lint
     Provide warnings about  constructs that are dubious or non-portable to
     other implementations of AWK.

-W posix
--posix
     This  turns  on  compatibility  mode,  with the  following  additional
     restrictions:

     -> The `\x' escape sequences are not recognized.

     -> The synonym func for the keyword function is not recognized.

     -> The operators ``**'' and ``**='' cannot be used in place of `^' and
        ``^=''.

-W source=program-text
--source=program-text
     Use program-text as the AWK program's source code.  This option allows
     the easy intermixing of library functions (used via the options -f and
     --file) with source code entered  on the command line.  It is intended
     primarily for medium to large AWK programs used in shell scripts.  The
     -W  source= form  of this  option uses  the rest  of the  command line
     argument for  program-text; no other options to  -W will be recognized
     in the same argument.

-W version
--version
     Print  version information  for this  particular copy  of gawk  on the
     standard error.   This is  useful mainly for  knowing if your  copy of
     gawk  is  up  to  date  with  what the  Free  Software  Foundation  is
     distributing.

--   Signal the end of options.   This is useful to allow further arguments
     to the  AWK program itself  to start with  a `-'.  This  is mainly for
     consistency with  the argument parsing  convention used by  most other
     POSIX Standard programs.

All other options are flagged as illegal and ignored.
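
As a small illustration of the options -F and -v described above, the
following sketch splits colon-separated input and prefixes each first field
with a value assigned before the program starts.  It assumes the
interpreter is installed as awk (substitute gawk if that is its name on
your system); the sample input and the variable name prefix are made up for
the example.

```shell
# Split each input line on ":" and print the first field, preceded by
# the value that -v assigned to the (hypothetical) variable "prefix".
out=$(printf 'root:x:0\ndaemon:x:1\n' |
    awk -F: -v prefix='user=' '{ print prefix $1 }')
printf '%s\n' "$out"
```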

AWK Program Execution

An AWK  program consists of  a sequence of  pattern/action statements, plus
optional function definitions:

    pattern { action statements }
    function name(parameter list) { statements }

gawk first  reads the program  source from the  program file (or  files) if
specified, or from the first  non-option argument on the command line.  The
option -f may  be used multiple times on the  command line.  gawk reads the
program text  as if all  the program-files had been  concatenated.  This is
useful for  building libraries of AWK functions,  without having to include
them in each new AWK program  that uses them.  To use a library function in
a file from a program typed in on the command line, specify /dev/tty as one
of the program files, type your program, and end it with a <ctrl-D>.
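
The effect of several -f options can be sketched as follows.  The file
names lib.awk and main.awk and the function twice() are hypothetical, and
the interpreter is assumed to be installed as awk (substitute gawk if
needed).

```shell
# Keep a hypothetical library function and a main program in two files,
# then let awk read them as if they had been concatenated.
dir=$(mktemp -d)
printf 'function twice(x) { return 2 * x }\n' > "$dir/lib.awk"
printf 'BEGIN { print twice(21) }\n' > "$dir/main.awk"
out=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")
rm -rf "$dir"
printf '%s\n' "$out"
```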

The  environment  variable AWKPATH  specifies  a search  path  to use  when
finding source  files named with the  option -f. If this  variable does not
exist, the default path is:

    .:/usr/lib/awk:/usr/local/lib/awk

If a file  name given to the -f option  contains a `/' character, gawk does
not perform a path search.

gawk executes AWK programs in the following order:

1. gawk compiles the program into an internal form.

2. All variable assignments specified via the -v option are performed.

3. gawk executes the  code in the BEGIN block (or  blocks), should there be
   any.

4. gawk then  proceeds to read  each file named  in the ARGV  array.  If no
   files are named on the command line, gawk reads the standard input.

If a file name on the command line has the form variable=value, gawk treats
it as  a variable assignment, and assigns value  to variable. (This happens
after every BEGIN block has been run.) Command-line assignment of variables
is most useful when you wish  to assign values dynamically to the variables
AWK uses  to control how  input is broken  into fields and  records.  It is
also  useful for  controlling the  state of  program execution  if multiple
passes are needed over a single data file.
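
The difference in timing between the option -v and a command-line
assignment can be seen in a sketch like the one below.  The variable names
a and b and the temporary data file are assumptions made for the example;
the interpreter is assumed to be installed as awk.

```shell
# "a" is assigned with -v, so the BEGIN block sees it; "b" is assigned by
# an operand, so it is set only when awk reaches it in ARGV, after BEGIN.
data=$(mktemp)
printf 'one\n' > "$data"
out=$(awk -v a=1 '
    BEGIN { print "BEGIN: a=[" a "] b=[" b "]" }
          { print "main: a=[" a "] b=[" b "]" }
' b=2 "$data")
rm -f "$data"
printf '%s\n' "$out"
```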

If the value of a particular element of ARGV is empty (""), gawk skips it.

For each line in the input,  gawk tests to see if it matches any pattern in
the AWK  program.  It  tests the  patterns in the  order they occur  in the
program.  For each pattern that the line matches, gawk executes the action
associated with that pattern.

Finally, after all the input is  exhausted, gawk executes the code in every
END block.

Variables and Fields

AWK variables  are dynamic:  they come into  existence when they  are first
used.  Their values are floating-point numbers, strings, or both, depending
upon  how they  are used.   AWK also has  one dimensional  arrays: multiply
dimensioned arrays can be simulated.  Several pre-defined variables are set
as a program runs; these are described as needed and summarized below.

Fields

As  it reads  a line  of  input, gawk  splits that  line into  fields.  The
variable FS defines how fields are separated:

-> If FS is a single character, fields are separated by that character.

-> If FS is longer than one character, it is treated as a regular
   expression.  In this case, the value of variable IGNORECASE (described
   below) also affects how fields are split.

-> In the special case that FS is a single space character, fields are
   separated by runs of space characters or tab characters.

If variable  FIELDWIDTHS is set to a space-separated  list of numbers, each
field is  expected to have a  fixed width: gawk splits  up the record using
the specified widths, and ignores the value of FS. Assigning a new value to
FS overrides the use of FIELDWIDTHS, and restores the default behavior.

Each field in the input line can be referenced by its position: $1, $2, and
so on.  $0 is the whole line.

The  value of  a field  may be  assigned to  as well.   Fields need  not be
referenced by constants.  For example, the AWK expression

    n = 5
    print $n

prints the fifth field in the  input line.  The variable NF holds the total
number of fields in the input line.

References to non-existent fields (i.e., fields after $NF) produce the null
string.   However, assigning  to a  nonexistent field  (e.g., $(NF+2)  = 5)
increases the  value of NF;  creates any intervening fields,  with the null
string as the  value of each; and causes the  value of $0 to be recomputed,
with the fields being separated by the value of OFS.
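
A minimal sketch of assigning to a nonexistent field, assuming the
interpreter is installed as awk and using a made-up input line, shows NF
growing and $0 being rebuilt with OFS:

```shell
# The record "a b c" has three fields.  Assigning to field 5 creates an
# empty field 4, raises NF to 5, and rebuilds $0 using OFS ("-" here).
out=$(printf 'a b c\n' |
    awk -v OFS=- '{ $(NF + 2) = "e"; print NF; print $0 }')
printf '%s\n' "$out"
```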

Built-in Variables

The following variables are built into AWK:

ARGC The number of command-line arguments.  Note that this does not include
     options to gawk, or the program source.

ARGIND
     The index in ARGV of the file now being processed.

ARGV Array of command-line arguments.  The array is indexed from zero
     through ARGC minus one.  Dynamically changing the contents of ARGV can
     control the files used for data.

CONVFMT
     The conversion format for numbers -- by default, ``%.6g''.

ENVIRON
     An array containing the  values of the current environment.  The array
     is indexed by the  environment variables, each element being the value
     of that variable  (e.g., ENVIRON["HOME"] might be /u/arnold). Changing
     this array does not affect the environment seen by programs which gawk
     spawns via redirection or the function system(). (This may change in a
     future version of gawk.)

ERRNO
     If a system error occurs while performing redirection for getline,
     during a read for getline, or during a close(), ERRNO contains a
     string describing the error.

FIELDWIDTHS
     A white-space  separated list of  fieldwidths.  When set,  gawk parses
     the input  into fields of fixed  width, instead of using  the value of
     the  variable  FS  as  the  field  separator.  The  fixed  field-width
     facility is still experimental; expect the semantics to change as gawk
     evolves over time.

FILENAME
     The name of the current input  file.  If no files are specified on the
     command  line, the  value of  FILENAME is  `-'.  However,  FILENAME is
     undefined within the BEGIN block.

FNR  The number  of the record  within the current  input file that  is now
     being processed.

FS   The input field separator.  By default, this is a blank.

IGNORECASE
     Tell gawk's pattern-matching features to ignore case when they
     compare text with a pattern.  When IGNORECASE is set to a nonzero
     value, the following features of gawk are affected:

     -> Pattern-matching within rules

     -> Fieldsplitting with FS.

     -> Regular expression matching with `~' and ``!~''.

     -> The operation  of the  pre-defined gawk functions  gsub(), index(),
        match(), split(), and sub().

     Thus, if IGNORECASE is not equal to zero, pattern

         /aB/

     matches all of the following:

         ab
         aB
         Ab
         AB

     As with all AWK variables, the initial value of IGNORECASE is zero, so
     all regular expression operations are normally case-sensitive.

NF   The number of fields in the current input record.

NR   The total number of input records seen so far.

OFMT The output format for numbers -- by default ``%.6g''.

OFS  The output-field separator -- by default a space character.

ORS  The output-record separator -- by default a newline.

RS   The input record separator -- by default a newline.  RS is exceptional
     in  that only  the first  character  of its  string value  is used  to
     separate records.   (This will probably change in  a future release of
     gawk.) If RS is set to  the null string, then records are separated by
     blank lines.   When RS  is set  to the null  string, then  the newline
     character always  acts as a  field separator, in  addition to whatever
     value FS may have.

RSTART
     The index of the first character matched by the gawk function match():
     zero if no match.

RLENGTH
     The length of the string matched by match(): -1 if no match.

SUBSEP
     The character used to separate multiple subscripts in array elements
     -- by default ``\034''.
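
Several of the record-related variables above (RS, NR, and NF) can be seen
working together in the sketch below, which sets RS to the null string so
that blank lines separate records.  The input is made up, and the
interpreter is assumed to be installed as awk.

```shell
# With RS set to the null string, blank lines separate records and a
# newline inside a record also separates fields.
out=$(printf 'a b\nc\n\nd\n' |
    awk 'BEGIN { RS = "" } { print NR ": NF=" NF ", last=" $NF }')
printf '%s\n' "$out"
```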

Arrays

Arrays are subscripted with an expression between square brackets (`[' and
`]').  If the expression is an expression list

    (expr, expr ...)

then the array subscript is a string consisting of the concatenation of the
(string) value of each expression, separated by the value of the variable
SUBSEP.  This facility simulates multi-dimensional arrays.  For example,

    i = "A" ; j = "B" ; k = "C"
    x[i, j, k] = "hello, world\n"

assigns the string

    "hello, world\n"

to the element of the array x which is indexed by the string

    "A\034B\034C".

All arrays in AWK are associative, i.e., indexed by string values.

The special operator  in may be used in an  if or while statement to see if
an array has an index that consists of a particular value:

    if (val in array)
        print array[val]

If the array has multiple subscripts, use (i, j) in array.

You can also use the construct  in within a for loop to iterate through all
the elements of an array.

An element can be deleted from an array using the statement delete.
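
The array facilities described in this section (multiple subscripts, the
operator in, the for loop, and delete) might be combined as in the
following sketch.  The array name x and the helper array part are
illustrative only, and the interpreter is assumed to be installed as awk.

```shell
out=$(awk 'BEGIN {
    x["A", "B"] = "hello"
    # Membership test with multiple subscripts.
    if (("A", "B") in x)
        print "found:", x["A", "B"]
    # Each key is a single string; split it on SUBSEP to recover the parts.
    for (key in x) {
        n = split(key, part, SUBSEP)
        print n, part[1], part[2]
    }
    delete x["A", "B"]
    if (!(("A", "B") in x))
        print "deleted"
}')
printf '%s\n' "$out"
```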

Variable Typing And Conversion

Variables and fields can  be floating-point numbers, strings, or both.  How
the value  of a  variable is  interpreted depends upon  its context.   If a
variable or  field is  used in  a numeric expression,  gawk treats it  as a
number;  if used  as a  string, gawk  treats it  as a  string.  To  force a
variable to  be treated  as a  number, add zero  to it;  to force it  to be
treated as a string, concatenate it with the null string.

When a string must be converted to a number, the conversion is accomplished
by the library function atof(). A  number is converted to a string by using
the value  of CONVFMT as  a format string  for sprintf(), with  the numeric
value of the variable as the argument.  However, even though all numbers in
AWK are  floating point, integral values are  always converted as integers.
Thus, given

    CONVFMT = "%2.2f"
    a = 12
    b = a ""

the variable b has a value of 12, not 12.00.
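
The rule can be checked with a short sketch that converts one integral and
one non-integral value; the variables c and d are additions for the
example, and the interpreter is assumed to be installed as awk.

```shell
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""    # integral value: converted as an integer
    c = 12.5; d = c ""    # non-integral value: converted with CONVFMT
    print b
    print d
}')
printf '%s\n' "$out"
```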

gawk performs comparisons as follows:

-> If two variables are numeric, they are compared numerically.

-> If  one value is  numeric and  the other  has a string  value that  is a
   ``numeric string,'' then comparisons are also done numerically.

-> Otherwise,  the numeric  value is  converted  to a  string and  a string
   comparison is performed.

Two strings  are compared, of  course, as strings.  According  to the POSIX
Standard, even if two strings  are numeric strings, a numeric comparison is
performed; however, this is clearly incorrect, and gawk does not do this.

Uninitialized variables have the numeric value zero and the string value ""
(the null, or empty, string).
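
As a hedged illustration of the comparison rules, the sketch below compares
the number 9 first with a field that holds a numeric string and then with a
string constant.  The result names r1 and r2 are made up, and the
interpreter is assumed to be installed as awk.

```shell
out=$(printf '10\n' | awk '{
    # $1 is a numeric string read from input, so it is compared as a number.
    r1 = ($1 > 9) ? "numeric" : "string"
    # "10" is a string constant, so 9 is converted to "9" and the
    # comparison is done on strings, where "10" sorts before "9".
    r2 = ("10" > 9) ? "numeric" : "string"
    print r1, r2
}')
printf '%s\n' "$out"
```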

Patterns and Actions

AWK  is a  line-oriented language:  the pattern comes  first, and  then the
action.  Action statements are enclosed in `{' and `}'.  Either the pattern
may be missing, or the action may be missing, but (of course) not both.  If
the pattern is missing, AWK executes the action for every line of input.  A
missing action is equivalent to

    { print }

which prints the entire line.

Comments begin with the character `#', and continue to the end of the line.
Blank lines can be used to separate statements.  Normally, a statement ends
with a  newline; however, this is  not the case for lines  ending in any of
the following characters:

    ,   {   ?   :   &&  ||

Lines  that  end in  one  of  the above  characters  have their  statements
automatically continued on the following  line.  In other cases, a line can
be continued  by ending it  with a `\',  in which case the  newline will be
ignored.

Multiple statements may  be put on one line by  separating them with a `;'.
This  applies  to  both  the   statements  within  the  action  part  of  a
pattern/action pair (the  usual case), and to the pattern/action statements
themselves.

Patterns

AWK patterns may be one of the following:

    BEGIN
    END
    /regular expression/
    relational expression
    pattern && pattern
    pattern || pattern
    pattern ? pattern : pattern
    (pattern)
    ! pattern
    pattern1, pattern2

BEGIN and  END are  two special  patterns that are  not tested  against the
input.  The  action parts of  all BEGIN patterns  are merged as  if all the
statements had  been written  in a single  BEGIN block.  They  are executed
before  any of  the  input is  read.   Likewise, gawk  merges  all the  END
patterns and executes them when all the input is exhausted (or when an exit
statement is  executed).  BEGIN  and END  patterns cannot be  combined with
other patterns  in pattern expressions.   BEGIN and END  patterns must have
action parts.

For

    /regular expression/

patterns, the  associated statement  is executed  for each input  line that
matches the regular expression.   Regular expressions are the same as those
described in the Lexicon entry for the shell sh, and are summarized below.

A relational expression  may use any of the operators  defined below in the
section  on actions.   These generally  test  whether certain  fields match
certain regular expressions.

The operators &&, ||, and ! are logical AND, logical OR, and logical NOT,
respectively, as in C.  They do short-circuit evaluation, also as in C, and
are used for combining more primitive pattern expressions.  As in most
languages, parentheses may be used to change the order of evaluation.

The operator  ?: is like the  same operator in C.  If  the first pattern is
true then the pattern used for  testing is the second pattern, otherwise it
is the third.  Only one of the second and third patterns is evaluated.

The

    pattern1, pattern2

form of an expression is called  a ``range pattern''.  It matches all input
records starting with a line  that matches pattern1, and continues until it
reads a record that matches  pattern2, inclusive.  It does not combine with
any other sort of pattern expression.
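
A range pattern with no action prints every record in the range.  The
following sketch uses made-up input and assumes the interpreter is
installed as awk:

```shell
# Print everything from the line "start" through the line "end", inclusive.
out=$(printf 'before\nstart\nmiddle\nend\nafter\n' |
    awk '/^start$/, /^end$/')
printf '%s\n' "$out"
```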

Regular Expressions

Regular expressions are  the extended kind found in the  shell sh. They are
composed of characters, as follows:

c    Match the non-meta-character c.

\c   Match the literal character c.

.    Match any character except newline.

^    Match the beginning of a line or a string.

$    Match the end of a line or a string.

[abc...]
     Character class: Match any of the characters abc....

[^abc...]
     Negated  character  class:  Match  any  character  except  abc...  and
     newline.

r1|r2
     Alternation: match either r1 or r2.

r1r2 Concatenation: Match r1, then r2.

r+   Match one or more r's.

r*   Match zero or more r's.

r?   Match zero or one r's.

(r)  Grouping: match r.

The escape  sequences that  are valid in  string constants (see  below) are
also legal in regular expressions.

Actions

Action statements  are enclosed in braces, `{'  and `}'.  Action statements
consist of the  usual assignment, conditional, and looping statements found
in  most languages.   The operators,  control statements,  and input/output
statements available are patterned after those in C.

Operators

The following gives AWK's operators, in order of increasing precedence:

= += -=
*= /= %= ^=
     Assignment.  Both absolute assignment (var = value) and
     operator-assignment (the other forms) are supported.

?:   The C conditional expression.  This has the form

         expr1 ? expr2 : expr3

     If expr1 is true, the value of the expression is expr2; otherwise it
     is expr3.  Only one of expr2 and expr3 is evaluated.

||   Logical OR.

&&   Logical AND.

~
!~   Regular expression match and negated match.
     Do not use a constant regular expression (/foo/) on the left-hand side
     of  a  `~'  or `!~'.   Only  use  one  on  the right-hand  side.   The
     expression

         /foo/ ~ exp

     has the same meaning as:

         (($0 ~ /foo/) ~ exp)

     This is usually not what was intended.

< >
<= >=
!=
==   The regular relational operators.

<blank>
     String concatenation.

+
-    Addition and subtraction.

*
/
%    Multiplication, division, and modulus.

+
-
!    Unary plus, unary minus, and logical negation.

^    Exponentiation.  The operator `**' may also be used, and `**=' for the
     assignment operator.

++
--   Increment and decrement, both prefix and suffix.

$    Field reference.
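
A few consequences of the precedence order above can be sketched as
follows, assuming the interpreter is installed as awk:

```shell
out=$(awk 'BEGIN {
    print -2 ^ 2          # ^ binds tighter than unary minus: -(2 ^ 2)
    print 1 " " 2 + 3     # + binds tighter than concatenation: 1 " " (2 + 3)
    print 2 + 3 * 4       # * binds tighter than +: 2 + (3 * 4)
}')
printf '%s\n' "$out"
```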

Control Statements

The control statements are as follows:

    if (condition) statement [ else statement ]
    while (condition) statement
    do statement while (condition)
    for (expr1; expr2; expr3) statement
    for (var in array) statement
    break
    continue
    delete array[index]
    exit [ expression ]
    { statements }

I/O Statements

AWK recognizes the following input/output statements:

close(filename)
     Close file or pipe.

getline
     Set $0 from next input  record.  This statement also sets the built-in
     variables NF, NR, and FNR.

getline <file
     Set $0 from next record of  file.  This statement also sets the built-
     in variable NF.

getline var
     Set var from next input record.  This statement also sets the built-in
     variables NR and FNR.

getline var <file
     Set var from next record of file.

next Stop processing  the current input  record.  The next  input record is
     read  and processing  starts over  with the first  pattern in  the AWK
     program.  If the  end of the input data is  reached, each END block is
     executed.

next file
     Stop processing  the current input  file.  The next  input record read
     comes from the next input file.   FILENAME is updated, FNR is reset to
     one,  and processing  starts over  with the first  pattern in  the AWK
     program.  If the end of the input data is reached, every END block is
     executed.

print
     Print the current record.

print expr-list
     Print each expression in expr-list.

print expr-list >file
     Print expressions on file.

printf fmt, expr-list
     Format and print.

printf fmt, expr-list >file
     Format and print into file.

system(cmd-line)
     Execute the command cmd-line, and return its exit status.

Other input/output  redirections are also  allowed.  For print  and printf,
>>file appends output onto file,  whereas a `|' command writes onto a
pipe.  Likewise, command  |getline pipes into getline. getline returns zero
when it reads EOF, and -1 if an error occurs.
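
Reading from a pipe with getline might look like the sketch below.  The
command echo hello and the variable names cmd, line, and n are chosen only
for illustration, and the interpreter is assumed to be installed as awk.

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello"
    # getline returns 1 for each record read, 0 at end of file, -1 on error.
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    print n, line
}')
printf '%s\n' "$out"
```

Testing the result with `> 0' keeps the loop from running forever if an
error makes getline return -1.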

The printf Statement

The AWK statement printf and  the function sprintf() (see below) accept the
following conversion specification formats:

%c   An ASCII character.  If the argument used for %c is numeric, it is
     treated as a character and printed.  Otherwise, the argument is
     assumed to be a string, and only the first character of that string is
     printed.

%d   A decimal number (the integer part).

%i   Just like %d.

%e   A floating-point number of the form [-]d.ddddddE[+-]dd.

%f   A floating-point number of the form [-]ddd.dddddd.

%g   Use `e'  or `f' conversion, whichever  is shorter, with nonsignificant
     zeros suppressed.

%o   An unsigned octal number (again, an integer).

%s   A character string.

%x   An unsigned hexadecimal number (an integer).

%X   Like %x, but using ``ABCDEF'' instead of ``abcdef''.

%%   A single `%' character; no argument is converted.

There are optional, additional parameters  that may lie between the `%' and
the control letter:

-    The expression should be left-justified within its field.

width
     The field should be padded to this width.  If width has a leading
     zero, then the field is padded with zeroes; otherwise, it is padded
     with blanks.

.prec
     A number that indicates the maximum width of a string, or the number
     of digits to print to the right of the decimal point.

The  dynamic  width  and precision  capabilities  of  the  ANSI C  printf()
routines are  supported.  A `*' in  place of either the  width or precision
specification causes AWK to take its value from the argument list to printf
or sprintf().
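
The flag, width, and precision parameters can be tried with a sketch such
as this one, in which the values are arbitrary and the interpreter is
assumed to be installed as awk:

```shell
# %-5s  left-justify a string in five columns
# %05d  pad an integer with zeroes to five columns
# %.2f  print two digits after the decimal point
# %x    print an integer in hexadecimal
out=$(awk 'BEGIN { printf "%-5s|%05d|%.2f|%x\n", "ab", 42, 3.14159, 255 }')
printf '%s\n' "$out"
```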

Special File Names

When doing I/O redirection from either  print or printf into a file, or via
getline from a file, gawk recognizes certain special file names internally.
These  file names  allow  access to  open file  descriptors inherited  from
gawk's parent process (usually the shell).  Other special file names
provide access to information about the running gawk process.  The file
names are as follows:

/dev/pid
     Reading this file returns the identifier of the current process, in
     decimal, terminated with a newline.

/dev/ppid
     Reading this file returns the identifier of the current process's
     parent, in decimal, terminated with a newline.

/dev/pgrpid
     Reading this file returns the process-group identifier of the current
     process, in decimal, terminated with a newline.

/dev/user
     Reading this file returns a single record terminated with a newline.
     The fields are separated with blanks.  $1 is the value of the system
     call getuid(); $2 is the value of the system call geteuid(); $3 is
     the value of the system call getgid(); and $4 is the value of the
     system call getegid().  If there are any additional fields, they are
     the group identifiers returned by getgroups().

/dev/stdin
     The standard input.

/dev/stdout
     The standard output.

/dev/stderr
     The standard error output.

/dev/fd/n
     The file associated with the open-file descriptor n.

These are particularly useful for error messages.  For example, these files
let you use the statement

    print "You blew it!" > "/dev/stderr"

where otherwise you would have had to say:

    print "You blew it!" | "cat 1>&2"

These file names may also be used on the command line to name data files.
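
To confirm that such a message really goes to the standard error, a shell
sketch can discard the standard output and capture only the standard
error.  The variable name err is arbitrary, and the interpreter is assumed
to be installed as awk.

```shell
# Send stderr to the captured stream, then discard stdout.
err=$(awk 'BEGIN { print "You blew it!" > "/dev/stderr" }' 2>&1 >/dev/null)
printf '%s\n' "$err"
```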

Numeric Functions

AWK contains the following pre-defined arithmetic functions:

atan2(y, x)
     Return the arctangent of y/x, in radians.

cos(expr)
     Return the cosine of expr, where expr is in radians.

exp(expr)
     The exponential function.

int(expr)
     Truncate to integer.

log(expr)
     The natural-logarithm function.

rand()
     Returns a random number between zero and one.

sin(expr)
     Return the sine of expr, where expr is in radians.

sqrt(expr)
     The square-root function.

srand(expr)
     Use expr as a new seed for the random number generator.  If no expr is
     provided,  the time  of day  will be  used.  The  return value  is the
     previous seed for the random number generator.
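
A brief sketch of some of these functions follows, assuming the interpreter
is installed as awk.  The second part reseeds the generator with the same
value twice to show that a given seed reproduces the same sequence; the
variable names a, b, and msg are illustrative.

```shell
out=$(awk 'BEGIN {
    # int() truncates toward zero.
    print int(3.9), int(-3.9), sqrt(16), exp(0)
    srand(1); a = rand()
    srand(1); b = rand()
    msg = (a == b) ? "same sequence" : "different sequence"
    print msg
}')
printf '%s\n' "$out"
```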

String Functions

AWK has the following pre-defined string functions:

gsub(r, s, t)
     For each substring matching the  regular expression r in the string t,
     substitute the string s and  return the number of substitutions.  If t
     is not supplied, use $0.

index(s, t)
     Return the index of the string  t in the string s, or zero if t is not
     present.

length(s)
     Return the  length of the  string s, or the  length of $0 if  s is not
     supplied.

match(s, r)
     Return the  position in  s where the  regular expression r  occurs, or
     zero if r is not present, and set the values of RSTART and RLENGTH.

split(s, a, r)
     Split the string  s into the array a on  the regular expression r, and
     return the number of fields.  If r is omitted, use FS instead.

sprintf(fmt, expr-list)
     Print expr-list according to fmt, and return the resulting string.

sub(r, s, t)
     Just like gsub(), but only the first matching substring is replaced.

substr(s, i, n)
     Return the n-character substring of s  starting at i. If n is omitted,
     the rest of s is used.

tolower(str)
     Return a copy of the string str, with all the upper-case characters in
     str translated  to their corresponding  lower-case counterparts.  Non-
     alphabetic characters are left unchanged.

toupper(str)
     Return a copy of the string str, with all the lower-case characters in
     str translated  to their corresponding  upper-case counterparts.  Non-
     alphabetic characters are left unchanged.
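
Several of the string functions can be exercised together, as in the sketch
below.  The sample strings and variable names are made up, and the
interpreter is assumed to be installed as awk.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)            # replace every "o"; n counts them
    print n, s
    if (match(s, /w[0-9a-z]+/))      # sets RSTART and RLENGTH
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    k = split("a:b:c", parts, ":")   # k is the number of pieces
    print k, parts[3], index("banana", "nan"), toupper("abc")
}')
printf '%s\n' "$out"
```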

Time Functions

Because one  of the primary  uses of AWK  programs is processing  log files
that  contain  time  stamp information,  gawk  provides  the following  two
functions for obtaining time stamps and formatting them.

systime()
     Return the current time of day as the number of seconds since 00:00:00
     hours on January 1, 1970 GMT.

strftime(format, timestamp)
     Format  timestamp  according   to  the  specification  within  format.
     timestamp  should be  of the  same form as  returned by  systime(). If
     timestamp  is missing,  the  current time  of  day is  used.  See  the
     Lexicon  entry for  strftime()  for the  format  conversions that  are
     guaranteed to be available.

String Constants

String  constants  in  AWK are  sequences  of  characters enclosed  between
quotation marks  `"'.  Within a string, the  following escape sequences are
recognized:

    \\  Literal backslash
    \a  The BEL character
    \b  Backspace
    \f  Form-feed
    \n  New line
    \r  Carriage return
    \t  Horizontal tab
    \v  Vertical tab
    \xXX    Character with hexadecimal value XX
    \OOO    Character represented by octal digits OOO
    \c  The literal character c

The escape  sequences may also be used  within constant regular expressions
(e.g., /[\t\f\n\r\v]/ matches whitespace characters).

Functions

AWK defines a function as follows:

    function name(parameter list) { statements }

AWK executes a function when it  is called from within the action part of a
regular pattern/action statement.   The parameters supplied in the function
call  are used  to instantiate  the formal  parameters declared  within the
function.  Arrays  are passed by  reference, other variables  are passed by
value.

Because  functions  were  not originally  part  of  the  AWK language,  the
provision for local variables is  rather clumsy: they are declared as extra
parameters  in the  parameter list.   The convention  is to  separate local
variables from real parameters by  extra spaces in the parameter list.  For
example:

    function f(p, q,     a, b) {    # a & b are local
        .....
    }

    /abc/ { ... ; f(1, 2) ; ... }

The left  parenthesis in a function call is  required to immediately follow
the function name, without any intervening white space.  This is to avoid a
syntactic ambiguity with the concatenation operator.  This restriction does
not apply to the built-in functions listed above.

Functions may  call each other  and may be  recursive.  Function parameters
used as local  variables are initialized to the null  string and the number
zero upon function invocation.

The word func may be used in place of function.
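
The parameter-passing rules and the local-variable convention can be seen
in a sketch such as the following.  The function fill() and the names sq
and count are hypothetical, and the interpreter is assumed to be installed
as awk.

```shell
out=$(awk '
function fill(arr, n,    i) {    # i is a local variable
    for (i = 1; i <= n; i++)
        arr[i] = i * i           # arrays are passed by reference
    n = 0                        # changes only the local copy of n
}
BEGIN {
    i = 99
    count = 3
    fill(sq, count)
    print sq[3], count, i        # the global i and count are untouched
}')
printf '%s\n' "$out"
```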

Examples

Print and sort the login names of every user on your system:

    BEGIN { FS = ":" }
    { print $1 | "sort" }

Count lines in a file:

    { nlines++ }
    END { print nlines }

Precede each line by its number in the file:

    { print FNR, $0 }

Concatenate and line number (a variation on a theme):

    { print NR, $0 }

Compatibility

A primary goal  for gawk is compatibility with the  POSIX Standard, as well
as with the latest version of  UNIX awk. To this end, gawk incorporates the
following user-visible features that are not described in the AWK book, but
are part of awk in System V Release 4, and are in the POSIX Standard:

-> The option -v for assigning variables before program execution starts is
   new.  The  book indicates that command  line variable assignment happens
   when awk would otherwise open the argument as a file, which is after the
   BEGIN block is executed.  However, in earlier implementations, when such
   an  assignment appeared  before  any file  names,  the assignment  would
   happen before  the BEGIN block was run.  Applications  came to depend on
   this ``feature.'' When awk  was changed to match its documentation, this
   option was  added to accommodate applications that depended upon the old
   behavior.  (This  feature was agreed  upon by both the  AT&T and GNU
   developers.)

-> The option  -W for  implementation specific  features is from  the POSIX
   Standard.

-> When processing arguments, gawk uses the special option ``--'' to signal
   the end of arguments,  and warns about, but otherwise ignores, undefined
   options.

-> The AWK book  does not define the return value  of srand(). The System V
   Release 4 version of UNIX awk (and the POSIX Standard) has it
   return the  seed it was using,  to allow keeping track  of random number
   sequences.  Therefore, srand() in gawk also returns its current seed.

-> Other new  features include  the following:  use of multiple  -f options
   (from MKS awk); the ENVIRON array;  the escape sequences \a and \v (done
   originally in gawk and fed back into AT&T's); the built-in functions
   tolower()  and  toupper() (from  AT&T);  and  the ANSI-C  conversion
   specifications in printf (done first in AT&T's version).
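
The timing difference described in the first item of this list can be
observed from the shell.  This is a sketch only: the variable name greeting
is hypothetical, and awk is assumed to name a POSIX-compatible awk such as
gawk.

```shell
# With -v, the assignment is made before the BEGIN block runs.
with_v=$(awk -v greeting=hello 'BEGIN { print "[" greeting "]" }')
printf '%s\n' "$with_v"

# As an operand, the assignment is made only when awk reaches it in the
# argument list, which is after BEGIN; "-" then names the standard input.
without_v=$(printf 'line\n' |
    awk 'BEGIN { print "[" greeting "]" } { print "[" greeting "]" }' greeting=hello -)
printf '%s\n' "$without_v"
```

In the second command the BEGIN block sees an empty variable, and only the
main rule sees the assigned value.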

GNU Extensions

gawk has some extensions to POSIX  Standard awk. They are described in this
section.  All  the extensions  described here  can be disabled  by invoking
gawk with the command-line option -W compat. The following features of gawk
are not available in POSIX Standard awk:

-> The escape sequence \x.

-> The functions systime() and strftime().

-> The special-file names available for I/O redirection.

-> The special variables ARGIND and ERRNO.

-> The variable IGNORECASE and its side-effects.

-> The variable FIELDWIDTHS and fixed-width field splitting.

-> The path search for files named via the option -f, and hence the
   special meaning of the environment variable AWKPATH.

-> The use of next file to abandon processing of the current input file.

The AWK book does not define the return value of the function close().
gawk's close() returns the value from fclose() or pclose() when closing a
file or pipe, respectively.

When gawk is invoked with the option -W compat, if the fs argument to
option -F is `t', then FS will be set to the tab character.  Because this
is a rather ugly special case, it is not the default behavior.  This
behavior also does not occur if -Wposix has been specified.

Historical Features

There  are  two  features  of  historical  AWK  implementations  that  gawk
supports.  First, it is possible to call the length() built-in function not
only with no argument, but even without parentheses!  Thus

    a = length

is the same as either of

    a = length()
    a = length($0)

This feature is marked as ``deprecated'' in the POSIX Standard,
and gawk will  issue a warning about its use  if option -Wlint is specified
on the command line.
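
The three spellings can be compared directly on one input line.  A small
sketch, assuming awk names a POSIX-compatible awk such as gawk:

```shell
# All three forms report the length of the current record, $0.
out=$(printf 'hello\n' | awk '{ print length, length(), length($0) }')
printf '%s\n' "$out"
```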

The other feature is the use  of the continue statement outside the body of
a while,  for, or  do loop.   Traditional AWK implementations  have treated
such usage  as equivalent to the next statement.   gawk supports this usage
if -Wposix has not been specified.

See Also

awk,
commands,
Programming COHERENT
Introduction to the awk Language, tutorial.
Aho,  Alfred  V.;  Kernighan,  Brian  W.;  Weinberger, Peter  J.:  The  AWK
Programming  Language.  Englewood Cliffs,  NJ,  Addison-Wesley, Inc.,  1988
(ISBN 0-201-07981-X).
The GAWK Manual, ed 0.15.  Boston, The Free Software Foundation, 1993.

Notes

The option -F is not necessary given the command-line variable
assignment feature; it remains only for backwards compatibility.
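
As a sketch of that equivalence, the three commands below set the field
separator in three different ways and select the same field.  The sample
line is made up for the demonstration, and awk is assumed to name a
POSIX-compatible awk such as gawk:

```shell
line='root:x:0:0'

# The traditional option -F.
a=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')

# Assignment to FS with the option -v, made before the program starts.
b=$(printf '%s\n' "$line" | awk -v FS=: '{ print $1 }')

# Assignment to FS as an operand, made before the standard input ("-")
# is read.
c=$(printf '%s\n' "$line" | awk '{ print $1 }' FS=: -)

printf '%s %s %s\n' "$a" "$b" "$c"
```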

If  your  system  actually  has  support  for /dev/fd  and  the  associated
/dev/stdin,  /dev/stdout,  and /dev/stderr  files,  you  may get  different
output from gawk than you would  get on a system without those files.  When
gawk  interprets these  files  internally, it  synchronizes  output to  the
standard output  with output to  /dev/stdout, while on a  system with those
files, the output is actually to different open files.  Caveat utilitor.

This man  page documents  gawk, version 2.15.   Please note that  with this
version, gawk no longer recognizes the command-line options -c, -V, -C, -a,
and -e that had been recognized by version 2.11.

The original  version of  UNIX awk was  designed and implemented  by Alfred
Aho, Peter  Weinberger, and Brian Kernighan  of AT&T Bell Laboratories.
Brian Kernighan continues to maintain and enhance it.

Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote gawk to
be compatible with the original  version of awk distributed in UNIX version
7.   John Woods  contributed a  number of bug  fixes.  David  Trueman, with
contributions  from  Arnold  Robbins, made  gawk  compatible  with the  new
version of UNIX awk.

Brian Kernighan of  AT&T Bell Laboratories provided valuable assistance
during testing and debugging.  The authors thank him.

Finally, please note  that gawk and its associated documentation (including
this manual page) are protected by the Free Software Foundation's
``copyleft''.  For  details on  your rights  and obligations, see  the file
COPYING in  the source code for  gawk, which is available  through the Mark
Williams BBS and other public-domain systems.