COHERENT manpages
This page displays the COHERENT manpage for gawk [Pattern-scanning and -processing language].
gawk -- Command
Pattern-scanning and -processing language
gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
gawk [ POSIX or GNU style options ] [ -- ] program-text file ...
gawk is the GNU Project's implementation of the AWK programming language.
It conforms to the definition of the language in the POSIX Standard 1003.2
Command Language and Utilities Standard. This version in turn is based on
the description in The AWK Programming Language, by Aho, Kernighan, and
Weinberger, with the additional features defined in the System V Release 4
version of awk. gawk also provides some GNU-specific extensions.
The command line consists of options to gawk itself, the AWK program text
(if not supplied via the options -f or --file), and values to be made
available in the predefined AWK variables ARGC and ARGV.
Command-line Options
gawk options may be either the traditional POSIX one-letter options, or the
GNU style long options. POSIX Standard-style options begin with a single
`-', whereas GNU long options begin with ``--''. GNU-style long options
are provided for both GNU-specific features and for POSIX mandated
features. Other implementations of the AWK language are likely to only
accept the traditional one-letter options.
Following the POSIX Standard, gawk-specific options are supplied via
arguments to the -W option. Multiple -W options may be supplied, or
multiple arguments may be supplied together if they are separated by
commas, or enclosed in quotation marks and separated by white space. Case
is ignored in arguments to the -W option. Each -W option has a
corresponding GNU style long option, as detailed below.
gawk recognizes the following command-line options:
-F fs
--field-separator=fs
Use fs for the input field separator (the value of the predefined
variable FS).
-v variable=value
--assign=variable=value
Assign value to variable before executing the program. value is
available to the BEGIN block of an AWK program.
-f program-file
--file=program-file
Read the AWK program's source from file program-file, instead of from
the first command-line argument. The gawk command line can contain
more than one -f or --file option.
-W compat
--compat
Run in compatibility mode. In compatibility mode, gawk behaves
identically to UNIX awk; it recognizes none of the GNU-specific
extensions. These extensions are described below.
-W copyleft
-W copyright
--copyleft
--copyright
Print the short version of the GNU copyright information message on
the standard error.
-W help
-W usage
--help
--usage
Print a relatively short summary of the available options on the
standard error.
-W lint
--lint
Provide warnings about constructs that are dubious or non-portable to
other implementations of AWK.
-W posix
--posix
This turns on compatibility mode, with the following additional
restrictions:
-> The `\x' escape sequences are not recognized.
-> The synonym func for the keyword function is not recognized.
-> The operators ``**'' and ``**='' cannot be used in place of `^' and
``^=''.
-W source=program-text
--source=program-text
Use program-text as the AWK program's source code. This option allows
the easy intermixing of library functions (used via the options -f and
--file) with source code entered on the command line. It is intended
primarily for medium to large AWK programs used in shell scripts. The
-W source= form of this option uses the rest of the command line
argument for program-text; no other options to -W will be recognized
in the same argument.
-W version
--version
Print version information for this particular copy of gawk on the
standard error. This is useful mainly for knowing if your copy of
gawk is up to date with what the Free Software Foundation is
distributing.
-- Signal the end of options. This is useful to allow further arguments
to the AWK program itself to start with a `-'. This is mainly for
consistency with the argument parsing convention used by most other
POSIX Standard programs.
All other options are flagged as illegal and ignored.
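As a brief illustration of the options -F and -v described above, the following sketch splits a colon-separated record and prints a label that is assigned before the BEGIN block runs. The sample record and the variable name label are hypothetical, and the command is written as awk on the assumption that it resolves to gawk or another POSIX-conforming awk.

```shell
# Hypothetical sample record; fields are separated by colons.
record='root:x:0:0:Superuser:/root:/bin/sh'

# -F sets FS; -v assigns "label" before the BEGIN block runs.
out=$(printf '%s\n' "$record" |
  awk -F: -v label=login 'BEGIN { printf "%s=", label } { print $1 }')
echo "$out"
```

Because label is assigned with -v, its value is already visible inside BEGIN.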
AWK Program Execution
An AWK program consists of a sequence of pattern/action statements, plus
optional function definitions:
pattern { action statements }
function name(parameter list) { statements }
gawk first reads the program source from the program file (or files) if
specified, or from the first non-option argument on the command line. The
option -f may be used multiple times on the command line. gawk reads the
program text as if all the program-files had been concatenated. This is
useful for building libraries of AWK functions, without having to include
them in each new AWK program that uses them. To use a library function in
a file from a program typed in on the command line, specify /dev/tty as one
of the program files, type your program, and end it with a <ctrl-D>.
The environment variable AWKPATH specifies a search path to use when
finding source files named with the option -f. If this variable does not
exist, the default path is:
.:/usr/lib/awk:/usr/local/lib/awk
If a file name given to the -f option contains a `/' character, gawk does
not perform a path search.
gawk executes AWK programs in the following order:
1. gawk compiles the program into an internal form.
2. All variable assignments specified via the -v option are performed.
3. gawk executes the code in the BEGIN block (or blocks), should there be
any.
4. gawk then proceeds to read each file named in the ARGV array. If no
files are named on the command line, gawk reads the standard input.
If a file name on the command line has the form variable=value, gawk treats
it as a variable assignment, and assigns value to variable. (This happens
after every BEGIN block has been run.) Command-line assignment of variables
is most useful when you wish to assign values dynamically to the variables
AWK uses to control how input is broken into fields and records. It is
also useful for controlling the state of program execution if multiple
passes are needed over a single data file.
If the value of a particular element of ARGV is empty (""), gawk skips it.
For each line in the input, gawk tests to see if it matches any pattern in
the AWK program. It tests the patterns in the order they occur in the
program. For each pattern that the line matches, gawk executes the action
associated with that pattern.
Finally, after all the input is exhausted, gawk executes the code in every
END block.
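The order of execution described above can be observed with a small sketch. The variable names early and late are hypothetical; early is set with -v and late is set by a command-line assignment, with `-' naming the standard input. It assumes awk resolves to gawk or another POSIX-conforming awk.

```shell
out=$(printf 'one\ntwo\n' |
  awk -v early=set '
    BEGIN { print "BEGIN: early=" early ", late=" late }
          { print "record " NR ": late=" late }
    END   { print "END: " NR " records" }
  ' late=set -)
echo "$out"
```

The BEGIN line shows an empty value for late, because command-line assignments take effect only after every BEGIN block has run.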
Variables and Fields
AWK variables are dynamic: they come into existence when they are first
used. Their values are floating-point numbers, strings, or both, depending
upon how they are used. AWK also has one-dimensional arrays;
multi-dimensional arrays can be simulated. Several pre-defined variables are set
as a program runs; these are described as needed and summarized below.
Fields
As it reads a line of input, gawk splits that line into fields. The
variable FS defines how fields are separated:
-> If FS is a single character, fields are separated by that character.
-> If FS is longer than one character, it is treated as a regular
expression. In this case, the value of variable IGNORECASE (described
below) also affects how fields are split.
-> In the special case that FS is a single space character, fields are
separated by runs of space characters or tab characters.
If variable FIELDWIDTHS is set to a space-separated list of numbers, each
field is expected to have a fixed width: gawk splits up the record using
the specified widths, and ignores the value of FS. Assigning a new value to
FS overrides the use of FIELDWIDTHS, and restores the default behavior.
Each field in the input line can be referenced by its position: $1, $2, and
so on. $0 is the whole line.
The value of a field may be assigned to as well. Fields need not be
referenced by constants. For example, the AWK expression
n = 5
print $n
prints the fifth field in the input line. The variable NF holds the total
number of fields in the input line.
References to non-existent fields (i.e., fields after $NF) produce the null
string. However, assigning to a nonexistent field (e.g., $(NF+2) = 5)
increases the value of NF; creates any intervening fields, with the null
string as the value of each; and causes the value of $0 to be recomputed,
with the fields being separated by the value of OFS.
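A minimal sketch of the field rules above: it references a field through a variable, then assigns to a field beyond $NF and prints the new NF and the recomputed $0. The input line and the OFS value are illustrative only; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(echo 'a b c' |
  awk 'BEGIN { OFS = "-" }
       { n = 2; print $n          # field chosen by a variable
         $(NF + 2) = "e"          # creates $4 (empty) and $5
         print NF
         print $0 }')             # rebuilt with OFS between fields
echo "$out"
```

The last line of output contains two adjacent separators, which mark the empty intervening field.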
Built-in Variables
The following variables are built into AWK:
ARGC The number of command-line arguments. Note that this does not include
options to gawk, or the program source.
ARGIND
The index in ARGV of the file now being processed.
ARGV Array of command-line arguments. The array is indexed from zero through
ARGC minus one. Dynamically changing the contents of ARGV can control
the files used for data.
CONVFMT
The conversion format for numbers -- by default, ``%.6g''.
ENVIRON
An array containing the values of the current environment. The array
is indexed by the environment variables, each element being the value
of that variable (e.g., ENVIRON["HOME"] might be /u/arnold). Changing
this array does not affect the environment seen by programs which gawk
spawns via redirection or the function system(). (This may change in a
future version of gawk.)
ERRNO
If a system error occurs while performing redirection for getline(),
during a read for getline(), or during a close, ERRNO contains a
string describing the error.
FIELDWIDTHS
A white-space separated list of field widths. When set, gawk parses
the input into fields of fixed width, instead of using the value of
the variable FS as the field separator. The fixed field-width
facility is still experimental; expect the semantics to change as gawk
evolves over time.
FILENAME
The name of the current input file. If no files are specified on the
command line, the value of FILENAME is `-'. However, FILENAME is
undefined within the BEGIN block.
FNR The number of the record within the current input file that is now
being processed.
FS The input field separator. By default, this is a blank.
IGNORECASE
Tell gawk's pattern-matching features to ignore the case when they
compare text with a pattern. When IGNORECASE is set to a nonzero
value, the following features of gawk are affected:
-> Pattern-matching within rules.
-> Field splitting with FS.
-> Regular expression matching with `~' and ``!~''.
-> The operation of the pre-defined gawk functions gsub(), index(),
match(), split(), and sub().
Thus, if IGNORECASE is not equal to zero, pattern
/aB/
matches all of the following:
ab
aB
Ab
AB
As with all AWK variables, the initial value of IGNORECASE is zero, so
all regular expression operations are normally case-sensitive.
NF The number of fields in the current input record.
NR The total number of input records seen so far.
OFMT The output format for numbers -- by default ``%.6g''.
OFS The output-field separator -- by default a space character.
ORS The output-record separator -- by default a newline.
RS The input record separator -- by default a newline. RS is exceptional
in that only the first character of its string value is used to
separate records. (This will probably change in a future release of
gawk.) If RS is set to the null string, then records are separated by
blank lines. When RS is set to the null string, then the newline
character always acts as a field separator, in addition to whatever
value FS may have.
RSTART
The index of the first character matched by the gawk function match():
zero if no match.
RLENGTH
The length of the string matched by match(): -1 if no match.
SUBSEP
The character used to separate multiple subscripts in array elements
-- by default ``\034''.
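The effect of setting RS to the null string, as described above, can be sketched as follows. The input text is hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
# Two records separated by a blank line; the first spans two lines.
out=$(printf 'alpha beta\ngamma\n\ndelta\n' |
  awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first=" $1 }')
echo "$out"
```

The first record reports three fields because the embedded newline acts as a field separator in addition to FS.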
Arrays
Arrays are subscripted with an expression between square brackets (`[' and
`]'). If the expression is an expression list
(expr, expr ...)
then the array subscript is a string consisting of the concatenation of the
(string) value of each expression, separated by the value of the variable
SUBSEP. This facility simulates multi-dimensional arrays. For example,
i = "A" ; j = "B" ; k = "C"
x[i, j, k] = "hello, world\n"
assigns the string
"hello, world\n"
to the element of the array x which is indexed by the string
"A\034B\034C".
All arrays in AWK are associative, i.e., indexed by string values.
The special operator in may be used in an if or while statement to see if
an array has an index that consists of a particular value:
if (val in array)
print array[val]
If the array has multiple subscripts, use (i, j) in array.
You can also use the construct in within a for loop to iterate through all
the elements of an array.
An element can be deleted from an array using the statement delete.
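The array facilities above can be tried with a short sketch. The array name x and the variables key, msg, and n are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(awk 'BEGIN {
  x["A", "B"] = "hello"
  if (("A", "B") in x) print "found " x["A", "B"]
  key = "A" SUBSEP "B"             # the same index, built by hand
  msg = (key in x) ? "same key" : "different key"
  print msg
  delete x["A", "B"]
  n = 0
  for (k in x) n++                 # iterate over remaining elements
  print "elements left: " n
}')
echo "$out"
```

Building the key by hand with SUBSEP shows that a multiple subscript is only a single string index.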
Variable Typing And Conversion
Variables and fields can be floating-point numbers, strings, or both. How
the value of a variable is interpreted depends upon its context. If a
variable or field is used in a numeric expression, gawk treats it as a
number; if used as a string, gawk treats it as a string. To force a
variable to be treated as a number, add zero to it; to force it to be
treated as a string, concatenate it with the null string.
When a string must be converted to a number, the conversion is accomplished
by the library function atof(). A number is converted to a string by using
the value of CONVFMT as a format string for sprintf(), with the numeric
value of the variable as the argument. However, even though all numbers in
AWK are floating point, integral values are always converted as integers.
Thus, given
CONVFMT = "%2.2f"
a = 12
b = a ""
the variable b has a value of 12, not 12.00.
gawk performs comparisons as follows:
-> If two variables are numeric, they are compared numerically.
-> If one value is numeric and the other has a string value that is a
``numeric string,'' then comparisons are also done numerically.
-> Otherwise, the numeric value is converted to a string and a string
comparison is performed.
Two strings are compared, of course, as strings. According to the POSIX
Standard, even if two strings are numeric strings, a numeric comparison is
performed; however, this is clearly incorrect, and gawk does not do this.
Uninitialized variables have the numeric value zero and the string value ""
(the null, or empty, string).
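The conversion rules above can be sketched as follows: an integral value is converted as an integer regardless of CONVFMT, a non-integral value goes through CONVFMT, and adding zero or concatenating the null string forces the kind of comparison. The input values and variable names are illustrative; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(echo '10 9' | awk '{
  CONVFMT = "%2.2f"
  a = 12;   b = a ""               # integral value: converted as an integer
  c = 12.5; d = c ""               # non-integral value: converted via CONVFMT
  print b, d
  num = ($1 + 0 < $2 + 0)          # forced numeric comparison: 10 < 9
  str = (($1 "") < ($2 ""))        # forced string comparison: "10" < "9"
  print num, str
}')
echo "$out"
```

The two comparisons disagree because the string "10" sorts before the string "9".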
Patterns and Actions
AWK is a line-oriented language: the pattern comes first, and then the
action. Action statements are enclosed in `{' and `}'. Either the pattern
may be missing, or the action may be missing, but (of course) not both. If
the pattern is missing, AWK executes the action for every line of input. A
missing action is equivalent to
{ print }
which prints the entire line.
Comments begin with the character `#', and continue to the end of the line.
Blank lines can be used to separate statements. Normally, a statement ends
with a newline; however, this is not the case for lines ending in any of
the following characters:
, { ? : && ||
Lines that end in one of the above characters have their statements
automatically continued on the following line. In other cases, a line can
be continued by ending it with a `\', in which case the newline will be
ignored.
Multiple statements may be put on one line by separating them with a `;'.
This applies to both the statements within the action part of a
pattern/action pair (the usual case), and to the pattern/action statements
themselves.
Patterns
AWK patterns may be one of the following:
BEGIN
END
/regular expression/
relational expression
pattern && pattern
pattern || pattern
pattern ? pattern : pattern
(pattern)
! pattern
pattern1, pattern2
BEGIN and END are two special patterns that are not tested against the
input. The action parts of all BEGIN patterns are merged as if all the
statements had been written in a single BEGIN block. They are executed
before any of the input is read. Likewise, gawk merges all the END
patterns and executes them when all the input is exhausted (or when an exit
statement is executed). BEGIN and END patterns cannot be combined with
other patterns in pattern expressions. BEGIN and END patterns must have
action parts.
For
/regular expression/
patterns, the associated statement is executed for each input line that
matches the regular expression. Regular expressions are the same as those
described in the Lexicon entry for the shell sh, and are summarized below.
A relational expression may use any of the operators defined below in the
section on actions. These generally test whether certain fields match
certain regular expressions.
The operators &&, ||, and ! are logical AND, logical OR, and
logical NOT, respectively, as in C. They do short-circuit evaluation, also
as in C, and are used for combining more primitive pattern expressions. As
in most languages, parentheses may be used to change the order of
evaluation.
The operator ?: is like the same operator in C. If the first pattern is
true then the pattern used for testing is the second pattern, otherwise it
is the third. Only one of the second and third patterns is evaluated.
The
pattern1, pattern2
form of an expression is called a ``range pattern''. It matches all input
records starting with a line that matches pattern1, and continues until it
reads a record that matches pattern2, inclusive. It does not combine with
any other sort of pattern expression.
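A range pattern can be illustrated with a small sketch. The marker lines START and END in the sample input are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(printf 'intro\nSTART\nbody\nEND\noutro\n' |
  awk '/^START$/, /^END$/ { print NR ": " $0 }')
echo "$out"
```

Both the line that opens the range and the line that closes it are printed, since the range is inclusive.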
Regular Expressions
Regular expressions are the extended kind found in the shell sh. They are
composed of characters, as follows:
c Match the non-meta-character c.
\c Match the literal character c.
. Match any character except newline.
^ Match the beginning of a line or a string.
$ Match the end of a line or a string.
[abc...]
Character class: Match any of the characters abc....
[^abc...]
Negated character class: Match any character except abc... and
newline.
r1|r2
Alternation: match either r1 or r2.
r1r2 Concatenation: Match r1, then r2.
r+ Match one or more r's.
r* Match zero or more r's.
r? Match zero or one r's.
(r) Grouping: match r.
The escape sequences that are valid in string constants (see below) are
also legal in regular expressions.
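The following sketch combines several of the constructs listed above (anchors, grouping, alternation, repetition, and a character class) in one pattern. The sample input is hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
# One or more of "ab" or "cd", optionally followed by a single digit.
out=$(printf 'abcd\nab7\nabc\ncdab12\n' | awk '/^(ab|cd)+[0-9]?$/')
echo "$out"
```

The pattern has no action, so each matching line is printed as is.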
Actions
Action statements are enclosed in braces, `{' and `}'. Action statements
consist of the usual assignment, conditional, and looping statements found
in most languages. The operators, control statements, and input/output
statements available are patterned after those in C.
Operators
The following gives AWK's operators, in order of increasing precedence:
= += -= *= /= %= ^= (assignment)
Both absolute assignment (var = value) and operator-assignment (the other
forms) are supported.
?: -- C conditional expression. This has the form
expr1 ? expr2 : expr3
If expr1 is true, the value of the expression is expr2; otherwise it
is expr3. Only one of expr2 and expr3 is evaluated.
|| -- logical OR
&& -- logical AND
~ -- Regular expression match
!~ -- Negated match
Do not use a constant regular expression (/foo/) on the left-hand side
of a `~' or `!~'. Only use one on the right-hand side. The
expression
/foo/ ~ exp
has the same meaning as:
(($0 ~ /foo/) ~ exp)
This is usually not what was intended.
< >
<= >=
!=
== The regular relational operators.
<blank>
String concatenation.
+
- Addition and subtraction.
*
/
% Multiplication, division, and modulus.
+
-
! Unary plus, unary minus, and logical negation.
^ Exponentiation. The operator `**' may also be used, and `**=' for the
assignment operator.
++
-- Increment and decrement, both prefix and suffix.
$ Field reference.
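A few consequences of the precedence order above, as a sketch. The variable names are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(awk 'BEGIN {
  s = 1 " " 2 + 3                  # + binds tighter than concatenation
  p = 2 ^ 3 * 2                    # ^ binds tighter than *
  t = (p > 10) ? "big" : "small"   # conditional expression
  m = ("foobar" ~ /^foo/) && !("foobar" ~ /baz/)
  print s; print p; print t; print m
}')
echo "$out"
```

Note that the constant regular expressions appear only on the right-hand side of `~', as recommended above.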
Control Statements
The control statements are as follows:
if (condition) statement [ else statement ]
while (condition) statement
do statement while (condition)
for (expr1; expr2; expr3) statement
for (var in array) statement
break
continue
delete array[index]
exit [ expression ]
{ statements }
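Several of the control statements above appear in the following sketch, which prints the fields of a line in reverse order while skipping one of them. The input line and variable names are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(echo 'a b c d' | awk '{
  line = ""
  for (i = NF; i >= 1; i--) {      # C-style for loop
    if ($i == "b") continue        # skip one field
    sep = (line == "") ? "" : " "
    line = line sep $i
  }
  print line
  n = 0
  do { n++ } while (n < 3)         # body runs before the test
  print n
}')
echo "$out"
```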
I/O Statements
AWK recognizes the following input/output statements:
close(filename)
Close file or pipe.
getline
Set $0 from next input record. This statement also sets the built-in
variables NF, NR, and FNR.
getline <file
Set $0 from next record of file. This statement also sets the
built-in variable NF.
getline var
Set var from next input record. This statement also sets the built-in
variables NR and FNR.
getline var <file
Set var from next record of file.
next Stop processing the current input record. The next input record is
read and processing starts over with the first pattern in the AWK
program. If the end of the input data is reached, each END block is
executed.
next file
Stop processing the current input file. The next input record read
comes from the next input file. FILENAME is updated, FNR is reset to
one, and processing starts over with the first pattern in the AWK
program. If the end of the input data is reached, every END is
executed.
print
Print the current record.
print expr-list
Print each expression in expr-list.
print expr-list >file
Print expressions on file.
printf fmt, expr-list
Format and print.
printf fmt, expr-list >file
Format and print into file.
system(cmd-line)
Execute the command cmd-line, and return its exit status.
Other input/output redirections are also allowed. For print and printf,
>>file appends output onto file, whereas a `|' command writes onto a
pipe. Likewise, command |getline pipes into getline. getline returns zero
when it reads EOF, and -1 if an error occurs.
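The getline forms and return values described above can be sketched as follows. The temporary file, its contents, and the variable names are hypothetical, and the sketch assumes the mktemp utility is available and that awk is gawk or another POSIX-conforming awk.

```shell
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"

out=$(awk -v file="$tmp" 'BEGIN {
  # A result above zero means a record was read; 0 is EOF, -1 an error.
  while ((getline line < file) > 0)
    n++
  close(file)
  "echo piped" | getline word      # read one record from a command
  print n, line, word
}')
rm -f "$tmp"
echo "$out"
```

Testing for a result greater than zero, rather than for a nonzero result, keeps the loop from spinning forever if the file cannot be read.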
The printf Statement
The AWK statement printf and the function sprintf() (see below) accept the
following conversion specification formats:
%c An ASCII character. If the argument used for %c is numeric, it is
treated as a character and printed. Otherwise, the argument is
assumed to be a string, and only the first character of that string is
printed.
%d A decimal number (the integer part).
%i Just like %d.
%e A floating-point number of the form [-]d.ddddddE[+-]dd.
%f A floating-point number of the form [-]ddd.dddddd.
%g Use `e' or `f' conversion, whichever is shorter, with nonsignificant
zeros suppressed.
%o An unsigned octal number (again, an integer).
%s A character string.
%x An unsigned hexadecimal number (an integer).
%X Like %x, but using ``ABCDEF'' instead of ``abcdef''.
%% A single `%' character; no argument is converted.
There are optional, additional parameters that may lie between the `%' and
the control letter:
- The expression should be left-justified within its field.
width
The field should be padded to this width. If the number has a leading
zero, then the field will be padded with zeroes; otherwise, it is
padded with blanks.
.prec
A number that specifies the maximum width of a string, or the number of
digits to print to the right of the decimal point.
The dynamic width and precision capabilities of the ANSI C printf()
routines are supported. A `*' in place of either the width or precision
specification causes AWK to take its value from the argument list to printf
or sprintf().
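A sketch that exercises several of the conversions and modifiers above in a single printf call, including a `*' width taken from the argument list. The values are arbitrary; awk is assumed to be gawk or another POSIX-conforming awk that supports dynamic widths.

```shell
out=$(awk 'BEGIN {
  # Arguments, in order: "ab", 42, 42, 3.14159, width 4, value 7, 255, 65
  printf "[%-5s][%5d][%05d][%.2f][%*d][%x][%c]\n", "ab", 42, 42, 3.14159, 4, 7, 255, 65
}')
echo "$out"
```

The `*' consumes one argument (4) as the field width before the value 7 is converted.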
Special File Names
When doing I/O redirection from either print or printf into a file, or via
getline from a file, gawk recognizes certain special file names internally.
These file names allow access to open file descriptors inherited from
gawk's parent process (usually the shell). Other special file names
provide access to information about the running gawk process. The file names
are as follows:
/dev/pid
Reading this file returns the identifier of the current process, in
decimal, terminated with a newline.
/dev/ppid
Reading this file returns the identifier of the current process's
parent, in decimal, terminated with a newline.
/dev/pgrpid
Reading this file returns the current process's group identifier, in
decimal, terminated with a newline.
/dev/user
Reading this file returns a single record terminated with a newline.
The fields are separated with blanks. $1 is the value of the system
call getuid(); $2 is the value of the system call geteuid() ; $3 is
the value of the system call getgid(); and $4 is the value of the
system call getegid(). If there are any additional fields, they are
the group identifiers returned by getgroups().
/dev/stdin
The standard input.
/dev/stdout
The standard output.
/dev/stderr
The standard error output.
/dev/fd/n
The file associated with the open-file descriptor n.
These are particularly useful for error messages. For example, these files
let you use the statement
print "You blew it!" > "/dev/stderr"
where otherwise you would have had to say:
print "You blew it!" | "cat 1>&2"
These file names may also be used on the command line to name data files.
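The following sketch separates the two output streams to show where a message written to /dev/stderr ends up. The message text is hypothetical. It assumes either gawk, which interprets the name internally, or a system that provides /dev/stderr.

```shell
prog='BEGIN { print "diagnostic" > "/dev/stderr"; print "result" }'

out=$(awk "$prog" 2>/dev/null)       # keep only the standard output
err=$(awk "$prog" 2>&1 >/dev/null)   # keep only the standard error
echo "stdout: $out"
echo "stderr: $err"
```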
Numeric Functions
AWK contains the following pre-defined arithmetic functions:
atan2(y, x)
Return the arctangent of y/x, in radians.
cos(expr)
Return the cosine of expr, where expr is in radians.
exp(expr)
The exponential function.
int(expr)
Truncate to integer.
log(expr)
The natural-logarithm function.
rand()
Return a random number between zero and one.
sin(expr)
Return the sine of expr, where expr is in radians.
sqrt(expr)
The square-root function.
srand(expr)
Use expr as a new seed for the random number generator. If no expr is
provided, the time of day will be used. The return value is the
previous seed for the random number generator.
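A short sketch of some of the arithmetic functions above. The variable names are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(awk 'BEGIN {
  printf "%.4f\n", atan2(0, -1)    # arctangent of 0/-1, which is pi
  print int(-3.7), int(3.7)        # the fractional part is dropped
  print sqrt(16), exp(0), log(1)
  r = rand()
  inrange = (r >= 0 && r < 1)
  print inrange
}')
echo "$out"
```

The value of rand() itself is not printed, because it depends on the seed in use.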
String Functions
AWK has the following pre-defined string functions:
gsub(r, s, t)
For each substring matching the regular expression r in the string t,
substitute the string s and return the number of substitutions. If t
is not supplied, use $0.
index(s, t)
Return the index of the string t in the string s, or zero if t is not
present.
length(s)
Return the length of the string s, or the length of $0 if s is not
supplied.
match(s, r)
Return the position in s where the regular expression r occurs, or
zero if r is not present, and set the values of RSTART and RLENGTH.
split(s, a, r)
Split the string s into the array a on the regular expression r, and
return the number of fields. If r is omitted, use FS instead.
sprintf(fmt, expr-list)
Print expr-list according to fmt, and return the resulting string.
sub(r, s, t)
Just like gsub(), but only the first matching substring is replaced.
substr(s, i, n)
Return the n-character substring of s starting at i. If n is omitted,
the rest of s is used.
tolower(str)
Return a copy of the string str, with all the upper-case characters in
str translated to their corresponding lower-case counterparts. Non-
alphabetic characters are left unchanged.
toupper(str)
Return a copy of the string str, with all the lower-case characters in
str translated to their corresponding upper-case counterparts. Non-
alphabetic characters are left unchanged.
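The string functions above can be tried together in one sketch. The sample string and variable names are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(awk 'BEGIN {
  s = "hello world"
  print index(s, "wor"), length(s), substr(s, 7, 3)
  if (match(s, /o+/)) print RSTART, RLENGTH
  t = s; n = gsub(/o/, "0", t); print n, t     # replace every match
  u = s; sub(/o/, "0", u); print u             # replace the first match only
  m = split("a:b:c", parts, ":"); print m, parts[3]
  print toupper(s)
}')
echo "$out"
```

gsub() and sub() modify their third argument in place, so the sketch works on copies of s.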
Time Functions
Because one of the primary uses of AWK programs is processing log files
that contain time stamp information, gawk provides the following two
functions for obtaining time stamps and formatting them.
systime()
Return the current time of day as the number of seconds since 00:00:00
hours on January 1, 1970 GMT.
strftime(format, timestamp)
Format timestamp according to the specification within format.
timestamp should be of the same form as returned by systime(). If
timestamp is missing, the current time of day is used. See the
Lexicon entry for strftime() for the format conversions that are
guaranteed to be available.
String Constants
String constants in AWK are sequences of characters enclosed between
quotation marks `"'. Within a string, the following escape sequences are
recognized:
\\ Literal backslash
\a The BEL character
\b Backspace
\f Form-feed
\n New line
\r Carriage return
\t Horizontal tab
\v Vertical tab
\xXX Character with hexadecimal value XX
\OOO Character represented by octal digits OOO
\c The literal character c
The escape sequences may also be used within constant regular expressions
(e.g., /[\t\f\n\r\v]/ matches whitespace characters).
Functions
AWK defines a function as follows:
function name(parameter list) { statements }
AWK executes a function when it is called from within the action part of a
regular pattern/action statement. The parameters supplied in the function
call are used to instantiate the formal parameters declared within the
function. Arrays are passed by reference, other variables are passed by
value.
Because functions were not originally part of the AWK language, the
provision for local variables is rather clumsy: they are declared as extra
parameters in the parameter list. The convention is to separate local
variables from real parameters by extra spaces in the parameter list. For
example:
function f(p, q,     a, b) { # a & b are local
.....
}
/abc/ { ... ; f(1, 2) ; ... }
The left parenthesis in a function call is required to immediately follow
the function name, without any intervening white space. This is to avoid a
syntactic ambiguity with the concatenation operator. This restriction does
not apply to the built-in functions listed above.
Functions may call each other and may be recursive. Function parameters
used as local variables are initialized to the null string and the number
zero upon function invocation.
The word func may be used in place of function.
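A minimal sketch of the function rules above: a recursive function with a local variable declared as an extra parameter, and an array passed by reference. The function names fact and fill and the array seen are hypothetical; awk is assumed to be gawk or another POSIX-conforming awk.

```shell
out=$(awk '
  # n is the real parameter; result is a local, set off by extra spaces.
  function fact(n,    result) {
    if (n <= 1) return 1
    result = n * fact(n - 1)
    return result
  }
  # Arrays are passed by reference, so the caller sees the new element.
  function fill(arr, k) { arr[k] = "set" }
  BEGIN {
    fill(seen, "x")
    print fact(5), seen["x"], (result == "")
  }')
echo "$out"
```

The final value printed is 1 because the local result never leaks into the global variable of the same name.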
Examples
Print and sort the login names of every user on your system:
BEGIN { FS = ":" }
{ print $1 | "sort" }
Count lines in a file:
{ nlines++ }
END { print nlines }
Precede each line by its number in the file:
{ print FNR, $0 }
Concatenate and line number (a variation on a theme):
{ print NR, $0 }
Compatibility
A primary goal for gawk is compatibility with the POSIX Standard, as well
as with the latest version of UNIX awk. To this end, gawk incorporates the
following user-visible features that are not described in the AWK book, but
are part of awk in System V Release 4, and are in the POSIX Standard:
-> The option -v for assigning variables before program execution starts is
new. The book indicates that command line variable assignment happens
when awk would otherwise open the argument as a file, which is after the
BEGIN block is executed. However, in earlier implementations, when such
an assignment appeared before any file names, the assignment would
happen before the BEGIN block was run. Applications came to depend on
this ``feature.'' When awk was changed to match its documentation, this
option was added to accommodate applications that depended upon the old
behavior. (This feature was agreed upon by both the AT&T and GNU
developers.)
-> The option -W for implementation specific features is from the POSIX
Standard.
-> When processing arguments, gawk uses the special option ``--'' to signal
the end of arguments, and warns about, but otherwise ignores, undefined
options.
-> The AWK book does not define the return value of srand(). The System V
Release 4 version of UNIX awk (and the POSIX Standard) has it
return the seed it was using, to allow keeping track of random number
sequences. Therefore, srand() in gawk also returns its current seed.
-> Other new features include the following: use of multiple -f options
(from MKS awk); the ENVIRON array; the escape sequences \a and \v (done
originally in gawk and fed back into AT&T's); the built-in functions
tolower() and toupper() (from AT&T); and the ANSI-C conversion
specifications in printf (done first in AT&T's version).
GNU Extensions
gawk has some extensions to POSIX Standard awk. They are described in this
section. All the extensions described here can be disabled by invoking
gawk with the command-line option -W compat. The following features of gawk
are not available in POSIX Standard awk:
-> The escape sequence \x.
-> The functions systime() and strftime().
-> The special-file names available for I/O redirection.
-> The variables ARGIND and ERRNO.
-> The variable IGNORECASE and its side-effects.
-> The variable FIELDWIDTHS and fixed-width field splitting.
-> The path search for files named via the option -f, and therefore the
environment variable AWKPATH.
-> The use of next file to abandon processing of the current input file.
The AWK book does not define the return value of the function close().
gawk's close() returns the value from fclose() or pclose() when closing a
file or pipe, respectively. When gawk is invoked with the option -W
compat, if the fs argument to option -F is `t', then FS will be set to the
tab character. Because this is a rather ugly special case, it is not the
default behavior. This behavior also does not occur if -Wposix has been
specified.
Historical Features
There are two features of historical AWK implementations that gawk
supports. First, it is possible to call the length() built-in function not
only with no argument, but even without parentheses! Thus
a = length
is the same as either of
a = length()
a = length($0)
This feature is marked as ``deprecated'' in the POSIX Standard,
and gawk will issue a warning about its use if option -Wlint is specified
on the command line.
The other feature is the use of the continue statement outside the body of
a while, for, or do loop. Traditional AWK implementations have treated
such usage as equivalent to the next statement. gawk supports this usage
if -Wposix has not been specified.
See Also
awk,
commands,
Programming COHERENT
Introduction to the awk Language, tutorial.
Aho, Alfred V.; Kernighan, Brian W.; Weinberger, Peter J.: The AWK
Programming Language. Englewood Cliffs, NJ, Addison-Wesley, Inc., 1988
(ISBN 0-201-07981-X).
The GAWK Manual, ed 0.15. Boston, The Free Software Foundation, 1993.
Notes
The option -F is not necessary given the command line variable
assignment feature; it remains only for backwards compatibility.
If your system actually has support for /dev/fd and the associated
/dev/stdin, /dev/stdout, and /dev/stderr files, you may get different
output from gawk than you would get on a system without those files. When
gawk interprets these files internally, it synchronizes output to the
standard output with output to /dev/stdout, while on a system with those
files, the output is actually to different open files. Caveat utilitor.
This man page documents gawk, version 2.15. Please note that with this
version, gawk no longer recognizes the command-line options -c, -V, -C, -a,
and -e that had been recognized by version 2.11.
The original version of UNIX awk was designed and implemented by Alfred
Aho, Peter Weinberger, and Brian Kernighan of AT&T Bell Laboratories.
Brian Kernighan continues to maintain and enhance it.
Paul Rubin and Jay Fenlason, of the Free Software Foundation, wrote gawk to
be compatible with the original version of awk distributed in UNIX version
7. John Woods contributed a number of bug fixes. David Trueman, with
contributions from Arnold Robbins, made gawk compatible with the new
version of UNIX awk.
Brian Kernighan of AT&T Bell Laboratories provided valuable assistance
during testing and debugging. The authors thank him.
Finally, please note that gawk and its associated documentation (including
this manual page) are protected by the Free Software Foundation's
``copyleft''. For details on your rights and obligations, see the file
COPYING in the source code for gawk, which is available through the Mark
Williams BBS and other public-domain systems.