As m4 reads its input, it separates it into tokens. A
token is either a name, a quoted string, or any single character that
is not part of a name or a string. Input to m4 can also
contain comments.
GNU m4 passes all ISO-8859-1 characters except '\0'. Eight-bit
ISO-8859-1 characters can be used as quotes, as comment delimiters, and
in macro names, depending on the active locale.
A name is a sequence of letters, digits, and the character _
(underscore), where the first character is not a digit. m4 always
uses the longest such sequence found in the input. If a name has a
macro definition, it is subject to macro expansion
(see section How to invoke macros).
Examples of legal names are: `foo', `_tmp', and `name01'.
Names are case-sensitive.
The definitions of letters, digits and other input characters can be
changed at any time, using the builtin macro changesyntax.
See section Changing the lexical structure of the input, for more information.
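The default name rule can be sketched in Python as a regular expression. This is only an illustrative model, not GNU m4's own scanner: the identifiers NAME and leading_name are invented for the sketch, and it assumes ASCII letters and the default syntax, ignoring both the locale and any changes made with changesyntax.

```python
import re

# A letter or underscore, followed by any run of letters, digits, underscores.
# The greedy match mirrors the "longest such sequence" rule.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def leading_name(text):
    """Return the longest name at the start of text, or None if there is none."""
    m = NAME.match(text)
    return m.group(0) if m else None

print(leading_name("name01+1"))  # name01
print(leading_name("9lives"))    # None, since a name cannot start with a digit
```

Because matching is case-sensitive, `Foo' and `foo' come back as two distinct names.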
A quoted string is a sequence of characters surrounded by the quotes ` and ', where the number of start and end quotes within the string balances. The value of a string token is the text, with one level of quotes stripped off. Thus
`'
is the empty string, and
``quoted''
is the string
`quoted'
The quote characters can be changed at any time, using the builtin macro
changequote. See section Changing the quote characters, for more information.
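The balancing rule can be illustrated with a small Python sketch. It is a model under stated assumptions rather than GNU m4's implementation: the function name read_quoted is hypothetical, and it handles only single-character quote delimiters that differ from each other, as the defaults do.

```python
def read_quoted(text, lq="`", rq="'"):
    """Read the quoted string at the start of text.

    Returns (value, rest), where value has one level of quotes stripped
    and rest is the unread input. Nested quotes are kept in the value.
    """
    if not text.startswith(lq):
        raise ValueError("text does not start with an opening quote")
    depth = 0
    for i, ch in enumerate(text):
        if ch == lq:
            depth += 1
        elif ch == rq:
            depth -= 1
            if depth == 0:  # the quotes balance here: end of the string
                return text[1:i], text[i + 1:]
    raise ValueError("end of input inside quoted string")

print(read_quoted("``quoted''"))  # ("`quoted'", '')
```

Only the outermost pair is removed, so the inner quotes survive to protect the text if it is scanned again.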
Any character that is part of neither a name nor a quoted string is a token by itself.
Comments in m4 are normally delimited by the characters `#'
and newline. m4 does not interpret the characters between the comment
delimiters, but it passes the entire comment (including the delimiters)
through to the output; comments are not discarded by m4.
Comments cannot be nested, so the first newline after a `#' ends the comment. Quoting the begin-comment character inhibits its commenting effect.
The comment delimiters can be changed to any string at any time, using
the builtin macro changecom. See section Changing comment delimiters, for more
information.
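How a comment is delimited can be sketched in Python as follows. This is a simplified model, not GNU m4's code: the function name split_comment is made up for the illustration, and the sketch does not track quoting, so it would wrongly treat a quoted `#' as the start of a comment.

```python
def split_comment(text, start="#", end="\n"):
    """Split text at its first comment and return (before, comment, after).

    The comment keeps both delimiters, since the whole comment is copied
    to the output unchanged. Without a start delimiter, comment and after
    are empty.
    """
    i = text.find(start)
    if i < 0:
        return text, "", ""
    j = text.find(end, i + len(start))
    stop = len(text) if j < 0 else j + len(end)
    return text[:i], text[i:stop], text[stop:]

before, comment, after = split_comment("foo # bar # baz\nqux\n")
print(repr(comment))  # '# bar # baz\n'
```

The second `#' is ordinary comment text, because the comment runs to the first newline and cannot nest.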
As m4 reads the input token by token, it copies each token
to the output immediately.
The exception is when it finds a name with a macro definition. In that
case m4 will calculate the macro's expansion, possibly reading
more input to get the arguments. It then inserts the expansion in front
of the remaining input. In other words, the resulting text from a macro
call will be read and parsed into tokens again.
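The rescanning step can be modelled in a few lines of Python. This is a deliberately reduced sketch, not GNU m4's algorithm: the function expand and the dictionary macros are hypothetical, macros take no arguments, quoting and comments are ignored, and a macro that expands to itself would loop forever.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def expand(text, macros):
    """Copy tokens to the output, rescanning every macro expansion."""
    out = []
    while text:
        m = NAME.match(text)
        if m is None:
            out.append(text[0])          # any other character is copied as is
            text = text[1:]
        elif m.group(0) in macros:
            # Insert the expansion in front of the remaining input,
            # so that it is read and parsed into tokens again.
            text = macros[m.group(0)] + text[m.end():]
        else:
            out.append(m.group(0))       # a name without a macro definition
            text = text[m.end():]
    return "".join(out)

print(expand("a b", {"a": "b b", "b": "c"}))  # c c c
```

The expansion of `a' contains the name `b', which is itself expanded only because the inserted text is scanned again.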
m4 expands a macro as soon as possible. If it finds a macro call
when collecting the arguments to another, it will expand the second
call first. If the input is
format(`Result is %d', eval(2**15))
m4 will first expand `eval(2**15)' to `32768', and only
then expand the resulting call
format(`Result is %d', 32768)
which will give the output
Result is 32768
The order in which m4 expands the macros can be explored using
the tracing facilities of GNU m4 (see section Tracing macro calls).
This process continues until there are no more macro calls to expand and all the input has been consumed.
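The inner-first order of the format example can be reproduced with a toy scanner in Python. Everything here is an assumption made for illustration and none of it is taken from GNU m4's source: scan, read_quoted, m4_eval and m4_format are invented names, the two stand-in macros handle only the `a**b' and `%d' forms used in the example, and comments, changequote and many details of argument handling are left out.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def read_quoted(text):
    """Strip one level of ` and ' from the quoted string at the start of text."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "`":
            depth += 1
        elif ch == "'":
            depth -= 1
            if depth == 0:
                return text[1:i], text[i + 1:]
    raise ValueError("end of input inside quoted string")

def scan(text, macros, collecting=False):
    """Expand text. At top level return (output, ""); while collecting
    arguments return (argument list, unread input)."""
    args, cur, depth = [], [], 0
    while text:
        if text[0] == "`":
            value, text = read_quoted(text)
            cur.append(value)
            continue
        m = NAME.match(text)
        if m and m.group(0) in macros:
            name, text = m.group(0), text[m.end():]
            call_args = []
            if text.startswith("("):
                # Collecting the arguments expands any inner call first.
                call_args, text = scan(text[1:].lstrip(), macros, True)
            # Insert the expansion in front of the unread input.
            text = macros[name](*call_args) + text
            continue
        if m:
            cur.append(m.group(0))
            text = text[m.end():]
            continue
        ch, text = text[0], text[1:]
        if collecting:
            if ch == "," and depth == 0:      # argument separator
                args.append("".join(cur))
                cur, text = [], text.lstrip()
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:                # end of the argument list
                    args.append("".join(cur))
                    return args, text
                depth -= 1
        cur.append(ch)
    if collecting:
        raise ValueError("end of input inside argument list")
    return "".join(cur), ""

calls = []  # records the order in which the stand-in macros run

def m4_eval(expr):
    calls.append("eval")                      # stand-in: only "a**b"
    base, exponent = expr.split("**")
    return str(int(base) ** int(exponent))

def m4_format(fmt, *values):
    calls.append("format")                    # stand-in: only %d
    return fmt % tuple(int(v) for v in values)

macros = {"eval": m4_eval, "format": m4_format}
output, _ = scan("format(`Result is %d', eval(2**15))", macros)
print(output)  # Result is 32768
print(calls)   # ['eval', 'format']
```

The list calls shows that the stand-in for eval runs while the arguments of format are still being collected, which is the order described above.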