

Input control

This chapter describes various builtin macros for controlling the input to m4.

Deleting whitespace in input

The builtin dnl reads and discards all characters, up to and including the first newline:

dnl

and it is often used in connection with define, to remove the newline that follows the call to define. Thus

define(`foo', `Macro `foo'.')dnl A very simple macro, indeed.
foo
=>Macro foo.

The input up to and including the next newline is discarded, as opposed to the way comments are treated (see section Comments).
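The contrast with comments can be seen in a small sketch (the macro name foo and the sample text are chosen only for illustration). A comment is copied to the output unexpanded, while text after dnl disappears together with its newline, so the next input line is joined to the current output line:

```m4
define(`foo', `bar')dnl
foo # foo is not expanded inside a comment
=>bar # foo is not expanded inside a comment
foo dnl this text and the newline are discarded
foo
=>bar bar
```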

Usually, dnl is immediately followed by an end of line or some other whitespace. GNU m4 will produce a warning diagnostic if dnl is followed by an open parenthesis. In this case, dnl will collect and process all arguments, looking for a matching close parenthesis. All predictable side effects resulting from this collection will take place. dnl will return no output. The input following the matching close parenthesis, up to and including the next newline, on whichever line that newline falls, will still be discarded.

Changing the quote characters

The default quote delimiters can be changed with the builtin changequote:

changequote(opt start, opt end)

where start is the new start-quote delimiter and end is the new end-quote delimiter. If any of the arguments are missing, the default quotes (` and ') are used instead of the void arguments.

The expansion of changequote is void.

changequote([, ])
=>
define([foo], [Macro [foo].])
=>
foo
=>Macro foo.
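As a sketch of the missing-argument case described above, calling changequote with no arguments at all should bring back the default quotes, after which the usual ` and ' work again (foo is only an illustrative name):

```m4
changequote([, ])
=>
changequote
=>
define(`foo', `Macro `foo'.')
=>
foo
=>Macro foo.
```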

If no single character is appropriate, start and end can be of any length.

changequote([[, ]])
=>
define([[foo]], [[Macro [[[foo]]].]])
=>
foo
=>Macro [foo].

Changing the quotes to the empty strings will effectively disable the quoting mechanism, leaving no way to quote text.

define(`foo', `Macro `FOO'.')
=>
changequote(, )
=>
foo
=>Macro `FOO'.
`foo'
=>`Macro `FOO'.'

There is no way in m4 to quote a string containing an unmatched left quote, except using changequote to change the current quotes.

If the quotes should be changed from, say, `[' to `[[', temporary quote characters have to be defined. To achieve this, two calls of changequote must be made, one for the temporary quotes and one for the new quotes.
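A sketch of the two-step change, assuming << and >> as the temporary delimiters (any pair that clashes with neither the old nor the new quotes should do). While [ is the active start-quote, [[ cannot be passed as a plain argument, so it is quoted with the temporary pair instead:

```m4
changequote([, ])
=>
changequote([<<], [>>])
=>
changequote(<<[[>>, <<]]>>)
=>
define([[foo]], [[Macro [[foo]].]])
=>
foo
=>Macro foo.
```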

Neither quote string should start with a letter or `_' (underscore), as they will be confused with names in the input. Doing so disables the quoting mechanism.

Changing comment delimiters

The default comment delimiters can be changed with the builtin macro changecom:

changecom(opt start, opt end)

where start is the new start-comment delimiter and end is the new end-comment delimiter. If any of the arguments are void, the default comment delimiters (# and newline) are used instead of the void arguments. The comment delimiters can be of any length.

The expansion of changecom is void.

define(`comment', `COMMENT')
=>
# A normal comment
=># A normal comment
changecom(`/*', `*/')
=>
# Not a comment anymore
=># Not a COMMENT anymore
But: /* this is a comment now */ while this is not a comment
=>But: /* this is a comment now */ while this is not a COMMENT

Note how comments are copied to the output, much as if they were quoted strings. If you want the text inside a comment expanded, quote the start comment delimiter.
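For instance, with the default comment delimiters, quoting the # is enough to have the rest of the line scanned for macros (a small illustration reusing the comment macro from above):

```m4
define(`comment', `COMMENT')
=>
# an ordinary comment
=># an ordinary comment
`#' no longer a comment
=># no longer a COMMENT
```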

Calling changecom without any arguments disables the commenting mechanism completely.

define(`comment', `COMMENT')
=>
changecom
=>
# Not a comment anymore
=># Not a COMMENT anymore

Changing the lexical structure of the input

The macro changesyntax and all associated functionality is experimental (see section Experimental features in GNU m4). The functionality might change in the future. Please direct your comments about it the same way you would do for bugs.

The input to m4 is read character by character, and these characters are grouped together to form input tokens (such as macro names, strings, comments, etc.).

Each token is parsed according to certain rules. For example, a macro name starts with a letter or _ and consists of the longest possible string of letters, _ and digits. But who is to decide what characters are letters, digits, quotes, white space? Earlier the operating system decided; now you do.

Input characters belong to different categories:

Letters
Characters that start a macro name. The default is the letters as defined by the operating system and the character _.
Digits
Characters that, together with the letters, form the remainder of a macro name. The default is the ten digits 0...9.
White space
Characters that should be trimmed from the beginning of each argument to a macro call. The default is SPC, TAB, newline and possibly others as defined by the operating system.
Open parenthesis
Characters that open the argument list of a macro call. Default is (.
Close parenthesis
Characters that close the argument list of a macro call. Default is ).
Argument separator
Characters that separate the arguments of a macro call. Default is ,.
Other
Characters that have no special syntactical meaning to m4. Default is all characters except those in the categories above.
Active
Characters that themselves, alone, form macro names. No default.
Escape
Characters that must precede macro names for them to be recognised. No default.

Each character can, besides the basic syntax category, have some syntax attributes. These are:

Left quote
The characters that start a quoted string. Default is `. Basic syntax category is `Other'.
Right quote
The characters that end a quoted string. Default is '. Basic syntax category is `Other'.
Begin comment
The characters that begin a comment. Default is #. Basic syntax category is `Other'.
End comment
The characters that end a comment. Default is newline. Basic syntax category is `White space'.

The builtin macro changesyntax is used to change the way m4 parses the input stream into tokens.

changesyntax(syntax-spec, ...)

The syntax-spec is a string whose first character determines the syntax category of the other characters. Character ranges are expanded as described in section Translating characters. If there are no other characters, all characters are given the syntax code.

The characters for the syntax categories are:

W
Letters
D
Digits
S
White space
(
Open parenthesis
)
Close parenthesis
,
Argument separator
O
Other
@
Escape
A
Active
L
Left quote
R
Right quote
B
Begin comment
E
End comment

With changesyntax we can modify the meaning of a word.

define(`test.1', `TEST ONE')
=>
__file__
=>in
changesyntax(`O_', `W.')
=>
__file__
=>__file__
test.1
=>TEST ONE

Another possibility is to change the syntax of a macro call.

define(`test', `$#')
=>
test(a, b, c)
=>3
changesyntax(`(<', `,|', `)>', `O(,)')
=>
test(a, b, c)
=>0(a, b, c)
test<a|b|c>
=>3

Leading spaces are always removed from macro arguments in m4, but by changing the syntax categories we can avoid it.

define(`test', `$1$2$3')
=>
test(a, b, c)
=>abc
changesyntax(`O         ')
=>
test(a, b, c)
=>a b c

It is not yet possible to redefine the `$' used to indicate macro arguments in user defined macros.

Macro calls can be given a TeX or Texinfo like syntax using an escape. If one or more characters are defined as escapes, macro names are only recognised if preceded by an escape character.

If the escape is not followed by what is normally a word (a letter optionally followed by letters and/or numerals), that single character is returned as a macro name.

As always, words without a macro definition cause no error message. They and the escape character are simply output.

define(`foo', `bar')
=>
changesyntax(`@@')
=>
foo
=>foo
@foo
=>bar
@changesyntax(`@\', `O@')
=>
foo
=>foo
@foo
=>@foo
\foo
=>bar
define(`#', `No comment')
=>define(#, No comment)
\define(`#', `No comment')
=>
\# \foo # Comment \foo
=>No comment bar # Comment \foo

Active characters are known from TeX. In m4 an active character is always seen as a one-letter word, and so, if it has a macro definition, the macro will be called.

define(`@', `TEST')
=>
@
=>@
changesyntax(`A@')
=>
@
=>TEST

There is obviously an overlap with changecom and changequote. Comment delimiters and quotes can now be defined in two different ways. To avoid incompatibilities, if the quotes are set with changequote, all characters marked in the syntax table as quotes will be unmarked, leaving only one set of defined quotes as before. Since the quotes are syntax attributes rather than syntax categories, the old quotes simply revert to their old category. If the quotes are set with changesyntax, other characters marked as quotes are left untouched, resulting in at least two sets of quotes. This applies to comment delimiters as well, mutatis mutandis.

define(`test', `TEST')
=>
changesyntax(`L<', `R>')
=>
<test>
=>test
`test>
=>test
changequote(<[>, `]')
=>
<test>
=><TEST>
[test]
=>test

If a category that forms single-character tokens contains several characters, all of them are treated as equal. Any open parenthesis will match any close parenthesis, etc.

changesyntax(`({<', `)}>', `,;:', `O(,)')
=>
eval{2**4-1; 2 : 8>
=>00001111

This is not so for long quotes, which cannot be matched by a single-character quote and vice versa. The same goes for comment delimiters.

define(`test', `==$1==')
=>
changequote(`<<', `>>')
=>
changesyntax(<<L[>>, <<R]>>)
=>
test(<<testing]>>)
=>==testing]==
test([testing>>])
=>==testing>>==
test([<<testing>>])
=>==testing==

Note how it is possible to have both long and short quotes, if changequote is used before changesyntax.

The syntax table is initialised to be backwards compatible, so if you never call changesyntax, nothing will have changed.

Debugging output continues to use (, , and ) to show macro calls.

The builtin macro changesyntax is recognized only when given arguments.

Changing the lexical structure of words

The macro changeword and all associated functionality is experimental (see section Experimental features in GNU m4). It is only available if the --enable-changeword option was given to configure, at GNU m4 installation time. The functionality might change or even go away in the future. Do not rely on it. Please direct your comments about it the same way you would do for bugs.

A file being processed by m4 is split into quoted strings, words (potential macro names) and simple tokens (any other single character). Initially a word is defined by the following regular expression:

[_a-zA-Z][_a-zA-Z0-9]*

Using changeword, you can change this regular expression. Relaxing m4's lexical rules might be useful (for example) if you wanted to apply translations to a file of numbers:

changeword(`[_a-zA-Z0-9]+')
define(1, 0)
1
=>
0
=>0

The syntax for regular expressions is the same as in GNU Emacs. See section `Syntax of Regular Expressions' in The GNU Emacs Manual.

Tightening the lexical rules is less useful, because it will generally make some of the builtins unavailable. You could use it to prevent accidental calls to builtins, for example:

define(`_indir', defn(`indir'))
changeword(`_[_a-zA-Z0-9]*')
esyscmd(foo)
_indir(`esyscmd', `ls')

Because m4 constructs its words a character at a time, there is a restriction on the regular expressions that may be passed to changeword: if your regular expression accepts `foo', it must also accept `f' and `fo'.

changeword has another function. If the regular expression supplied contains any subexpressions in parentheses, then text outside the first of these is discarded before symbol lookup. So:

changecom(`/*', `*/')
changeword(`#\([_a-zA-Z0-9]*\)')
#esyscmd(ls)

m4 now requires a `#' mark at the beginning of every macro invocation, so one can use m4 to preprocess shell scripts without getting shift commands swallowed, and plain text without losing various common words.

m4's macro substitution is based on text, while TeX's is based on tokens. changeword can throw this difference into relief. For example, here is the same idea represented in TeX and m4. First, the TeX version:

\def\a{\message{Hello}}
\catcode`\@=0
\catcode`\\=12
@a
@bye

Then, the m4 version:

define(a, `errprint(`Hello')')
changeword(`@\([_a-zA-Z0-9]*\)')
@a

In the TeX example, the first line defines a macro a to print the message `Hello'. The second line defines @ to be usable instead of \ as an escape character. The third line defines \ to be a normal printing character, not an escape. The fourth line invokes the macro a. So, when TeX is run on this file, it displays the message `Hello'.

When the m4 example is passed through m4, it outputs `errprint(Hello)'. The reason for this is that TeX does lexical analysis of a macro definition when the macro is defined. m4 just stores the text, postponing the lexical analysis until the macro is used.

You should note that using changeword will slow m4 down by a factor of about seven.

Saving input

It is possible to `save' some text until the end of the normal input has been seen; m4 then reads the saved text again once the normal input has been exhausted. This feature is normally used to initiate cleanup actions before normal exit, e.g., deleting temporary files.

To save input text, use the builtin m4wrap:

m4wrap(string, ...)

which stores string and the rest of the arguments in a safe place, to be reread when end of input is reached.

define(`cleanup', `This is the `cleanup' actions.
')
=>
m4wrap(`cleanup')
=>
This is the first and last normal input line.
=>This is the first and last normal input line.
^D
=>This is the cleanup actions.

The saved input is only reread when the end of normal input is seen, and not if m4exit is used to exit m4.
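A minimal sketch of the m4exit case (the saved text is purely illustrative): the wrapped line is stored, but because m4exit ends the run before the end of normal input is seen, it should never appear in the output.

```m4
m4wrap(`This text is never reread.
')
=>
m4exit
```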

It is safe to call m4wrap from saved text, but then the order in which the saved text is reread is undefined. If m4wrap is not used recursively, the saved pieces of text are reread in the reverse of the order in which they were saved (LIFO--last in, first out).
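The non-recursive case can be sketched with two saved strings (the strings are illustrative, and the output shown assumes the LIFO order stated above; other m4 implementations may reread saved text in a different order):

```m4
m4wrap(`saved first
')
=>
m4wrap(`saved second
')
=>
^D
=>saved second
=>saved first
```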

