Misty Programming Language:

The Preprocessor

Misty programs enjoy a pass through a preprocessor before being compiled. The preprocessor provides support for literate programming: code fragments, macros, and include files.

Directives

misty-program => (comment | preprocessor-outer-conditional | preprocessor-content)*

preprocessor-content
=> token
=> preprocessor-comment
=> preprocessor-macro
=> preprocessor-fragment
=> preprocessor-token
=> preprocessor-close

Preprocessor directives start with a % (percent sign). They can appear anywhere except inside of text constants.

Conditional

preprocessor-outer-conditional => '%if' preprocessor-condition '%then' misty-program ('%elif' preprocessor-condition '%then' misty-program)* ['%else' misty-program] '%fi'

preprocessor-condition => name 'is' ['not'] (text | 'defined')

Conditionals can be used to adapt a program to various environments or configurations. Regions of the program will be included or excluded as determined by the existence or value of simple macros. Simple macros can be provided by the local environment or by a build system.
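
The condition rule can be sketched in Python. This is a minimal illustration, not the Misty implementation: the macros dictionary, the DEFINED sentinel, and both function names are hypothetical, and it assumes that comparing an undefined macro with a text yields false.

```python
DEFINED = object()  # hypothetical sentinel standing in for the 'defined' keyword


def condition_holds(macros, name, operand, negated=False):
    """Evaluate `name is [not] (text | defined)` against a dict of simple macros."""
    if operand is DEFINED:
        result = name in macros
    else:
        # Assumption: an undefined macro never equals a text.
        result = macros.get(name) == operand
    return result != negated


def select_region(macros, branches, else_region=None):
    """Return the region of the first branch whose condition holds.

    branches is a list of (name, operand, negated, region) tuples in source
    order, one for the %if and one for each %elif.
    """
    for name, operand, negated, region in branches:
        if condition_holds(macros, name, operand, negated):
            return region
    return else_region if else_region is not None else []
```

With macros equal to {"target": "linux"}, a branch testing target against "linux" is selected, while a branch testing whether an unset macro is defined falls through to the %else region.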

Block directives

A block directive starts a block which ends with the next block directive, with the %program directive, or at the end of the input. A block cannot contain another block directive.

A block (even a comment block) contains a sequence of tokens. Therefore, the block must contain a sequence of names, texts, operators, numbers, and comments. Badly formed tokens will cause a syntax error.

%comment

preprocessor-comment => preprocessor-comment-tag token*

preprocessor-comment-tag => '%comment' | '%volume' | '%chapter' | '%section' | '%subsection' | '%continue' | '%note' | '%specimen' | '%doc'

The %comment directive causes all of the following tokens to be ignored until the next block directive or block terminator. There are other directives that act the same as %comment (%chapter, %section, %subsection, etc.) in the preprocessor, but which produce documentation in the literate processor.

%macro

preprocessor-macro => '%macro' name [ '('[name (',' name)*]')'] ':=' token*

The %macro directive defines a macro. There are three kinds of macros. Simple macros have no parameters. Regular macros and grande macros have a block of zero or more parameters wrapped in parens. A macro binds a sequence of tokens to a name. The sequence of tokens ends with the next preprocessor directive.

(Unlike the C #define directive, the sequence of tokens is not limited to a single line. Also, there is a single namespace for all macros; there is not a separate namespace for simple macros vs regular macros.)

The sequence of tokens is not expanded at definition time.

When the macro name is encountered later, it will be replaced with its sequence of tokens.
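
Because bodies are stored unexpanded, a macro may mention another macro that is defined later. The Python sketch below illustrates this for simple macros only. It assumes tokens are plain strings and that the replacement is rescanned for further macro names; the function name and the recursion check are illustrative, not taken from the Misty sources.

```python
def expand_simple(tokens, macros, active=()):
    """Replace each simple macro name with its stored tokens.

    macros maps a macro name to the token list recorded at definition time.
    active holds the names currently being expanded, so recursion is caught.
    """
    out = []
    for token in tokens:
        if token in macros:
            if token in active:
                raise ValueError(f"recursive macro: {token}")
            # The body is expanded only now, at the point of use.
            out.extend(expand_simple(macros[token], macros, active + (token,)))
        else:
            out.append(token)
    return out
```

If area is defined as the tokens width * height before width and height exist, a later use of area still picks up their definitions, because nothing was substituted when area was recorded.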

%fragment

preprocessor-fragment => '%fragment' text ':=' token*

The %fragment directive causes all of the following tokens to be associated with the text. Fragments differ from macros in some important ways:

Fragments can be inserted by use of the % text operator. Fragment insertion can take place before fragments are defined. This allows for out-of-order exposition. Every fragment must be inserted exactly once.

Token directives

preprocessor-token => ('%name' | '%number' | '%text') '(' (name | text)* ')'

Token directives are used to construct tokens. The parameters are simple macro names or texts; they are all concatenated together to form a token.

%text

The %text directive makes a text token. If it has no parameters, it is the empty text. Do not confuse this with % text.

%name

The %name directive makes a name token. It must conform to the rules for names. If it has no parameters, it is an error.

%number

The %number directive makes a number token. It must conform to the rules for numbers. If it has no parameters, it is an error.
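
The three token directives can be sketched together in Python. This is an illustration under stated assumptions: simple macros are modeled as strings, parameters arrive as ("name", x) or ("text", x) pairs, and the two regular expressions are stand-ins, since the real Misty rules for names and numbers are defined elsewhere.

```python
import re

# Stand-in patterns (assumptions); Misty's actual name and number rules may differ.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUMBER_RE = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def build_token(kind, params, macros):
    """Concatenate the parameters of %name, %number, or %text into one token.

    kind is "name", "number", or "text". Each parameter is ("name", macro_name)
    or ("text", literal). Returns a (kind, spelling) pair.
    """
    if not params and kind in ("name", "number"):
        raise ValueError(f"%{kind} requires at least one parameter")
    pieces = []
    for ptype, value in params:
        if ptype == "name":
            if value not in macros:
                raise KeyError(f"undefined simple macro: {value}")
            pieces.append(macros[value])
        else:
            pieces.append(value)
    spelling = "".join(pieces)
    if kind == "name" and not NAME_RE.fullmatch(spelling):
        raise ValueError(f"not a valid name: {spelling!r}")
    if kind == "number" and not NUMBER_RE.fullmatch(spelling):
        raise ValueError(f"not a valid number: {spelling!r}")
    return (kind, spelling)
```

An empty parameter list is accepted only for "text", where it produces the empty text.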

Block closing directive

There are three ways of closing a block directive. One is to start another block directive, since a block cannot contain another block. The second is to run off the end of the input. The third is to use the %program directive. The tokens following the %program tag are interpreted as a Misty program.

%program

preprocessor-close => '%program'

The %program directive closes a block directive.

Inclusion

%include

preprocessor-include => '%include' (text | name)

The %include directive replaces itself with the tokens associated with the text. The text has previously been registered, possibly with a file system. The file will be included only once. Attempts to include the file again in the same instance of the preprocessor will be ignored.
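
The include-once rule might be modeled as below. The Includer class, its registry dictionary, and the tokenize callback are hypothetical names chosen for this sketch; how texts are actually registered is not specified here.

```python
class Includer:
    """Resolve %include requests, honoring each registered text at most once."""

    def __init__(self, registry, tokenize):
        self.registry = registry  # assumed mapping: include text -> source text
        self.tokenize = tokenize  # assumed callable: source text -> list of tokens
        self.seen = set()         # keys already included by this preprocessor instance

    def include(self, key):
        if key in self.seen:
            return []             # repeated includes are silently ignored
        self.seen.add(key)
        return self.tokenize(self.registry[key])
```

The seen set belongs to one Includer object, which mirrors the rule that repeats are ignored only within the same instance of the preprocessor.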

Fragment Expansion

Every fragment must be expanded exactly once.

% text

The sequence % followed by a text is replaced with the fragment associated with the text. A fragment can be expanded before it is defined or fully appended.

Macro Expansion

There are three forms of macro expansion: simple, regular, and grande.

Unlike fragments, a macro can be expanded any number of times, including not at all.

Simple

A simple macro expansion is indicated simply by the name of a simple macro.

Regular

preprocessor-parameter => (token_except_,_(_)_{_}_[_]_ | '(' preprocessor-parameter-filler ')' | '{' preprocessor-parameter-filler '}' | '[' preprocessor-parameter-filler ']')*

preprocessor-parameter-filler => (token_except_(_)_{_}_[_]_ | '(' preprocessor-parameter-filler ')' | '{' preprocessor-parameter-filler '}' | '[' preprocessor-parameter-filler ']')*

A regular macro expansion is indicated by a name followed by zero or more arguments in parens. The number of arguments must exactly match the number of parameters in the macro definition.

The arguments are separated by commas. Within a macro argument, the tokens ( ) [ ] { } must balance. Commas within balanced parens, brackets, or braces are not used to separate the arguments.
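
The argument rule can be sketched as a single pass with a stack of expected closers. This Python illustration assumes tokens are strings and that an empty argument list means zero arguments; the function name is hypothetical.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def split_arguments(tokens):
    """Split the tokens between a macro's outer parens into arguments.

    Only commas outside every paren, bracket, and brace separate arguments.
    Raises ValueError if the delimiters do not balance.
    """
    if not tokens:
        return []  # assumption: `name()` supplies zero arguments
    args, current, stack = [], [], []
    for token in tokens:
        if token in OPEN_TO_CLOSE:
            stack.append(OPEN_TO_CLOSE[token])
        elif token in CLOSERS:
            if not stack or stack.pop() != token:
                raise ValueError(f"unbalanced {token!r} in macro argument")
        elif token == "," and not stack:
            args.append(current)
            current = []
            continue
        current.append(token)
    if stack:
        raise ValueError("unclosed delimiter in macro argument")
    args.append(current)
    return args
```

A text token that happens to contain a comma is a single token, so it never matches the bare "," test and stays inside its argument.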

Grande

The grande pattern is an alternative way of expanding a regular macro. It is useful for including long or complex material as a macro argument.

%begin(macroname, argument2, argument3...)

...stuff that will be used as argument1...

%end(macroname)

Macro arguments are separated by commas and can contain 0 or more tokens. Parens, brackets, and braces within a macro argument must balance. A comma can be in a macro argument if it is in a text or inside of at least one pair of parens, brackets, or braces. Macros embedded in a macro argument are expanded after the arguments are divided by commas.

argument1 has no restrictions on its content except that its content cannot contain the token sequence %end(macroname).

A macro must have exactly the right number of arguments.

Macros cannot be called recursively.
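
The grande pattern can be recognized by scanning for the matching %end(macroname) sequence. The Python sketch below is a simplified illustration: it assumes %begin, %end, and the punctuation are separate string tokens, it tracks only parens in the %begin header, and the function name is hypothetical.

```python
def read_grande(tokens, start):
    """Parse a grande expansion beginning at tokens[start] == '%begin'.

    Returns (macro_name, extra_argument_tokens, argument1_tokens, next_index),
    where extra_argument_tokens are the raw tokens after the macro name's comma.
    """
    if tokens[start:start + 2] != ["%begin", "("]:
        raise ValueError("expected %begin(")
    # Find the paren that closes the %begin header.
    depth, i = 1, start + 2
    while depth:
        if i >= len(tokens):
            raise ValueError("unterminated %begin header")
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
        i += 1
    header = tokens[start + 2:i - 1]  # macroname , argument2 , argument3 ...
    if not header:
        raise ValueError("missing macro name")
    name = header[0]
    # argument1 is everything up to the first %end ( name ) sequence.
    closer = ["%end", "(", name, ")"]
    for j in range(i, len(tokens) - 3):
        if tokens[j:j + 4] == closer:
            return name, header[2:], tokens[i:j], j + 4
    raise ValueError(f"missing %end({name})")
```

The body is taken verbatim, which reflects the rule that argument1 is unrestricted apart from the closing sequence itself.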

How it works

First the source text is tokenized with text.tokens(), producing an array of tokens. The array.macro() method then produces a new array of tokens. It works in two phases.

Phase 1

In phase 1, # comments are deleted, %if regions are resolved, includes are inserted, and macros and fragments are defined.

If an %include directive is found, then obtain the text, tokenize it, and insert the tokens at this point.

If a comment-type directive is found, then delete all tokens until finding another block directive or a block closing directive.

If a %macro or %fragment directive is found, then accumulate tokens under that name or token until finding another block directive or block closing directive. No expansion occurs.

If a block closing directive is found, delete it.
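
The block handling in phase 1 can be sketched as a loop that redirects tokens into a current sink. This Python illustration covers only comment blocks, simple %macro definitions, %fragment definitions, and %program; it omits %if and %include. It also assumes tokens are strings and that a repeated definition replaces the earlier one, which the text above does not specify.

```python
COMMENT_TAGS = {"%comment", "%volume", "%chapter", "%section", "%subsection",
                "%continue", "%note", "%specimen", "%doc"}


def phase1(tokens):
    """Collect macro and fragment bodies and strip comment blocks.

    Returns (macros, fragments, program), each holding unexpanded tokens.
    """
    macros, fragments, program = {}, {}, []
    sink = program  # list that receives ordinary tokens
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in COMMENT_TAGS:
            sink = []  # tokens of a comment block are thrown away
            i += 1
        elif token in ("%macro", "%fragment"):
            if i + 2 >= len(tokens) or tokens[i + 2] != ":=":
                raise ValueError(f"malformed {token} directive")
            table = macros if token == "%macro" else fragments
            sink = table[tokens[i + 1]] = []  # accumulate under this key
            i += 3
        elif token == "%program":
            sink = program  # the closing directive itself is deleted
            i += 1
        else:
            sink.append(token)
            i += 1
    return macros, fragments, program
```

Starting a new block simply replaces the sink, which is how one block directive closes the previous one without any explicit terminator.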

Phase 2

In phase 2, macros and fragments are expanded.

If a % text is found, note that and replace the % text with the tokens of the fragment. Phase 2 continues with the first token of the fragment.

If a %name, %text, or %number expression is found, the expression is replaced by the new token.

If a name is found that matches the name of a macro, expand it.

There will be an error raised if the final array is empty, or if there is a fragment that is not used, or if a fragment is expanded twice.
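
The fragment side of phase 2, including the final checks, can be sketched as follows. This Python illustration assumes that % and the text are separate string tokens, that text tokens are spelled with a leading double quote, and that fragments maps each text to its token list; macro expansion is left out and the function name is hypothetical.

```python
def expand_fragments(program, fragments):
    """Expand every `% text` insertion and enforce the exactly-once rule."""
    uses = {key: 0 for key in fragments}
    out = []
    pending = list(reversed(program))  # stack; the next token is at the end
    while pending:
        token = pending.pop()
        if token == "%" and pending and pending[-1].startswith('"'):
            key = pending.pop()
            if key not in fragments:
                raise KeyError(f"undefined fragment: {key}")
            uses[key] += 1
            if uses[key] > 1:
                raise ValueError(f"fragment expanded twice: {key}")
            # Scanning continues with the first token of the fragment.
            pending.extend(reversed(fragments[key]))
        else:
            out.append(token)
    unused = [key for key, count in uses.items() if count == 0]
    if unused:
        raise ValueError(f"unused fragments: {unused}")
    if not out:
        raise ValueError("empty program")
    return out
```

Since a fragment may be expanded only once, the same counter that reports a double expansion also stops a fragment that tries to insert itself.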