Misty Programming Language:

The Preprocessor

Misty programs enjoy a pass through a preprocessor before being compiled. The preprocessor provides support for literate programming: code fragments, macros, and include files.

Directives

Preprocessor directives start with a % (percent sign). They can appear anywhere except inside of text constants. The set token excludes all of the preprocessor directives but includes names, numbers, strings, comments, and punctuators. The % (percent sign) is a punctuator when it is not immediately preceding a name. Directives divide a source text into preprocessor blocks.

Blocks

preprocessor-program => preprocessor-program-block preprocessor-directive-block

preprocessor-program-block => preprocessor-program-block-content*

preprocessor-macro-block => preprocessor-macro-block-content*

preprocessor-directive-block => preprocessor-directive-block-content*

preprocessor-program-block-content
=> preprocessor-fragment-expansion
=> preprocessor-macro-expansion
=> preprocessor-program-conditional
=> token

preprocessor-macro-block-content
=> preprocessor-fragment-expansion
=> preprocessor-macro-expansion
=> preprocessor-macro-conditional
=> token

preprocessor-directive-block-content
=> preprocessor-close
=> preprocessor-comment
=> preprocessor-fragment
=> preprocessor-include
=> preprocessor-macro
=> preprocessor-conditional

There are three kinds of blocks: program blocks, macro blocks, and directive blocks. The blocks are defined by and separated by preprocessor directives.

Conditional

preprocessor-program-conditional => '%if' preprocessor-condition '%then' preprocessor-program-block ('%elif' preprocessor-condition '%then' preprocessor-program-block)* ['%else' preprocessor-program-block] '%fi'

preprocessor-conditional => '%if' preprocessor-condition '%then' preprocessor-directive-block ('%elif' preprocessor-condition '%then' preprocessor-directive-block)* ['%else' preprocessor-directive-blockonload was ] '%fi'

preprocessor-macro-conditional => '%if' preprocessor-macro-condition '%then' preprocessor-block ('%elif' preprocessor-condition '%then' preprocessor-block)* ['%else' preprocessor-block] '%fi'

preprocessor-condition => name 'is' ['not'] (text | 'defined')

preprocessor-macro-condition => name 'is' ['not'] (text | 'defined' | 'number' | 'text' | 'name' | 'special')

Conditionals can be used to adapt a program to various environments or configurations. Regions of the program will be included or excluded as determined by the existence of or value of simple macros and macro parameters. Simple macros can be provided by the local environment or by a build system or by the program itself.

Directive conditionals are processed before expansion, and can control whether or not particular %include, %macro, and %fragment are processed.

Programs, fragments and macros can contain conditionals which are processed during expansion. Macros and fragments have a larger set of conditions available, including tests incorporating macro parameters. name can be a simple macro name or a macro parameter name.

Block directives

A block directive starts a block which ends with the next block directive or with the %program directive. A block directive cannot contain within its block another block directive.

A block (even a comment block) contains a sequence of tokens. Therefore, the block must contain a sequence of names, texts, punctuators, numbers, and comments. Badly formed tokens will cause a syntax error.

%comment

preprocessor-comment => preprocessor-comment-tag token*

preprocessor-comment-tag => '%comment' | '%book' | '%volume' |'%chapter' | '%article' | '%section' | '%subsection' | '%note' | '%specimen' | '%continue' | '%doc'

The %comment directive causes all of the following tokens to be ignored until the next block directive or block terminator. There are other directives that act the same as %comment (%chapter, %section, %subsection, etc.) in the preprocessor, but which produce documentation in the literate processor. Note that preprocessor tokens cannot appear in a comment unless they are contained within a quoted string.

%macro

preprocessor-macro => '%macro' name preprocessor-macro-parameter* ':' preprocessor-block*

preprocessor-macro-parameter
=> name
=> '(' preprocessor-macro-parameter-list ')' ':' preprocessor-block*
=> '{' preprocessor-macro-parameter-list '}' ':' preprocessor-block*
=> '[' preprocessor-macro-parameter-list ']' ':' preprocessor-block*

preprocessor-macro-parameter-list => name,

The %macro directive defines a macro.

(Unlike the C #define directive, the sequence of tokens is not limited to a single line. Macro arguments can contain a large number of tokens spanning many lines.)

The sequence of tokens is not expanded at definition time.

When the macro name is encountered later, it will be replaced with its sequence of tokens.

%fragment

preprocessor-fragment => '%fragment' text (':' | '+:') preprocessor-block*

The %fragment directive causes all of the following tokens to be associated with the text. Fragments differ from macros in some important ways:

Fragments can be inserted by use of the % text operator. Fragment insertion can take place before fragments are defined. This allows for out-of-order exposition. Every fragment must be inserted exactly once.

Token directives

preprocessor-text => ('%name' | '%number' | '%text' | '%special') '(' (name | text)* ')'

Token directives are used to construct tokens. The parameter is one or more simple macros or string literals or number literals. They will all be concatenated together to form a token.

%text

The %text directive makes a text token. If it has no parameters, it is the empty text. Do not confuse this with % text.

%name

The %name directive makes a name token. It must conform to the rules for names. If it has no parameters, it is an error.

%number

The %number directive makes a number token. It must conform to the rules for numbers. If it has no parameters, it is an error.

%special

The %special directive makes a punctuator token. It must conform to the rules for punctuators. If it has no parameters, it is an error.

Block closing directive

There are three ways of closing a directive block. One is to start another block directive, since a block cannot contain another block. The second is to run off the end of the input. The third is to use the %program directive. The tokens following the %program tag are interpreted as a program which may expand fragments.

There is an implied %program inserted before the beginning. This is so that programs that make no use of the preprocessor features do not need to explicitly say so. It also makes it possible to have conventional comments before the directives.

%program

preprocessor-close => preprocessor-closer preprocessor-program-block

preprocessor-closer
=> '%end'
=> '%program'

The %program directive closes a block directive.

Inclusion

%include

preprocessor-include => '%include' text

The %include directive replaces itself with the tokens associated with the text. The text has previously been registered, possibly with a file system or database.

Fragment Expansion

Every fragment must be expanded exactly once.

preprocessor-fragment-expansion => '%' string

% text

The sequence % followed by a text is replaced with the fragment associated with the text. A fragment can be expanded before it is defined or fully appended.

Macro Expansion

There are two forms of macro expansion: simple and regular.

preprocessor-fragment-expansion => name proprocessor-macro-argument*

preprocessor-macro-argument
=> name
=> '(' preprocessor-macro-argument-list ')'
=> '{' preprocessor-macro-argument-list '}'
=> '[' preprocessor-macro-argument-list ']'

Unlike fragments, a macro can be expanded any number of times, including not at all.

Simple

A simple macro expansion is indicated simply by the name of a simple macro.

Regular

preprocessor-macro-argument-list => preprocessor-argument,

preprocessor-argument => (token_except_,_(_)_{_}_[_] | preprocessor-argument-filler)*

preprocessor-argument-filler
=> '(' preprocessor-argument ')'
=> '{' preprocessor-argument '}'
=> '[' preprocessor-argument ']'

A regular macro expansion is indicated by a name followed by zero or more arguments in parens, braces, or brackets, matching the macro definition. The number of arguments must exactly match the number of parameters in the macro definition.

The arguments are separated by commas. Within macro arguments, the tokens ( ) [ ] { } must balance. Commas within balanced parens and brackets are not used to separate the arguments. This makes it possible to include programming statements in macro arguments.

A single comma can be passed as a macro argument as %special(','). A single left paren can be passed as a macro argument as %special('(').

A macro expansion must have exactly the right number of arguments.

Macros cannot be expanded recursively.

How it works

First the source text is tokenized with text.tokens(), producing an array of tokens. The array.macro() method then produces a new array of tokens. It works in two phases.

Phase 1

In phase 1, # comments are deleted, %if directive regions are resolved, includes are inserted, and macros and fragments are defined.

If an %include directive is found, then obtain the text, tokenize it, and insert the tokens at this point.

If a comment-type directive is found, then delete all tokens until finding another block directive or a block closing directive.

If a %macro or %fragment directive is found, then accumulate tokens under that name or token until finding another block directive or block closing directive. No expansion occurs.

If a block closing directive is found, delete it.

Phase 2

In phase 2, macros and fragments are expanded.

If an %text is found, note that and replace the %text with the tokens of the fragment. Phase 2 continues with the first token of the fragment.

If a %name, %text, or %number expression is found, the expression is replaced by the new token.

If a name is found that matches the name of a macro, expand it.

There will be an error raised if the final array is empty, or if there is a fragment that was not used, or if a fragment is expanded twice.