Misty Programming Language:


Regular expressions have been used with great success in text editing and casual searching tools. They are an easy way of describing and matching patterns in texts. Regular expressions have infiltrated programming languages, becoming standard features in Perl and JavaScript. The regular expression notation itself has continued to grow past the point of regularity.

One of the critical activities in secure programming is input validation. Regular expressions are poorly suited to validation because the notation is so cryptic. If the validation rules are too confusing to read, how can we have confidence that they are correct?

As an alternative, the Misty Programming Language has patterns. A pattern is a special kind of object that can be used to match patterns in texts. Patterns are less terse than regular expressions, and much easier to read and write.

A pattern literal, written as pilcrow followed by a pattern expression, produces an immutable pattern object.

An Example

We want to extract information from a url. We define an lib library record containing two patterns (ip and name) and a url pattern that uses the lib. The character introduces a pattern, much as ƒ introduces a function.

def lib: {
    ip: ¶ (4 ("octet": 1-3 digit / ".")),
    name: ¶ (1- {letter digit "_-%"})
def url: ¶ lib (
- ("scheme": name ":")
) (
"slash": 0-3 "/"
) (
"host": (ip, 1- (name / "."))
) (
- (":" "port": 1-5 digit)
) (
- ("/" "path": 0- {¬ "?#" Cc})
) (
- ("?" "query": 0- {¬ "#" Cc})
) (
- ("#" "hash": 0- char)

We can use the extract function to match a pattern against a text.

var result: extract("http://www.mistysystem.com/index.html#top", url)

json.encode(result, true) is

    hash: "top",
    host: "www.mistysystem.com",
    path: "index.html",
    scheme: "http",
    slash: "//"

We can call match again with a different text:

let result: extract("", url)

json.encode(result, true) is

    host: "",
    octet: ["216", "168", "224", "111"],
    port: "9997",
    query: "clearance=top",
    slash: ""

To contrast, this is a JavaScript regular expression that does a similar thing:


The regular expression is much more compact than the pattern, but it is also much harder to understand.

Let's take the pattern apart. First, we had a library pattern for matching a dotted IP addresses:

ip: ¶ (4 ("octet": 1-3 digit / "."))

It matches exactly four groups of one to three digits separated by .period. Each group of digits is stored in the result under the name octet. So

    # {octet: ["198", "137", "240", "92"]}

The /slash operator allows us to specify a separator. The pattern could be written without the /slash operator but the pattern would be longer.

ip: ¶ ("octet": 1-3 digit 3 ("." octet: 1-3 digit))

If we attached the name octet: on an outer level

ip: ¶ ("octet": 4 (1-3 digit / "."))

then the result would have been

    # {octet: ""}

The second pattern matches a name.

name: ¶ (1- {letter digit "_-%"})

The definition of name in this case is pretty loose: any combination of letters, digits, and the characters _underbar, -minus sign, and %percent sign.

filename: lib.name.extract("doom.dust")
    # null because a name can't contain "."

The url pattern matches urls. It matches in seven steps.

  1. It optionally matches a name followed by a :colon. If they match, the name is saved in the result record in the scheme field. A prefix of -minus sign means the same as 0-1. The item can occur once or not at all; it is optional.

    - ("scheme": name ":") 

    If it had been written as

    - "scheme": (name ":") 

    then the :colon would be saved as well as the name.

  2. It matches between 0 and 3 slash characters and saves them under the name slash.
    "slash": 0-3 "/"
  3. It matches a host name or address. This component is not optional. It is saved under the name host. First we attempt to match using the ip pattern we defined earlier. If that does not match, then we'll match a dotted domain name (one or more names separated by dots).
    "host": (ip, 1- (name / "."))
  4. It optionally matches a port number. If it is found, it is saved under the name port.
    - (":" "port": 1-5 digit)
  5. It optionally matches a path. The path can contain any sequence of characters except control characters and "?" and "#". We are reserving "?" and "#" for the last two options. If a path is found, it is saved under the name path.
    - ("/" "path": 0- {¬ "?#" Cc})
  6. It optionally matches a query text. The query text can contain any sequence of characters except control characters and "#". We are saving "#" for the final option. If a query text is found, it is saved under the name query.
    - ("?" "query": 0- {¬ "#" Cc})
  7. It optionally matches a hash (or anchor name). If it is found, it is saved under the name hash.
    - ("#" "hash": 0- char)

Patterns are made of the same tokens as the rest of the Misty Programming Language. That has benefits in development tools. Features like paren balancing are not imperiled by the mixing of language conventions. Patterns can contain comments, just as functions can.

The pattern? function can be used to sense patterns.

pattern?(url)       # true
pattern?(lib)       # false
pattern?(lib.ip)    # true
pattern?("pattern") # false
stone?(url)         # true
record?(url)        # false

Pattern expressions

pattern_literal '¶' pattern_usage pattern_group

pattern_usage indent open_pattern_usage outdent space closed_pattern_usage

open_pattern_usage expression more_open_pattern_usage

more_open_pattern_usage "" linebreak open_pattern_usage

closed_pattern_usage "" closed_pattern_use space

closed_pattern_use expression more_closed_pattern_use

more_closed_pattern_use "" ',' space closed_pattern_use

A pattern literal is the word optionally followed by one or more pattern libraries, followed by a pattern group wrapped in ( )parens. See Pattern Library below. These are the pattern elements:

Text literal

pattern_text text '@' space text

A text in quotes matches characters in the text. Normally in Misty, text comparisons are case sensitive. In patterns, text comparison is also case sensitive, except when a text is preceded by @at sign which allows lowercase letters to match uppercase letters.

"vorpal" matches "vorpal"

@ "vorpal" matches "vorpal" and "Vorpal" and "VorPal" and "VORPAL"

@ "Vorpal" matches "Vorpal" and "VorPal" and "VORPAL" but not "vorpal"

Built-in class Matches
ascii any of the 128 ASCII characters
{"\u{0}" - "\u{7F}"}
char any character
digit a digit
- "9"}
end carriage return and/or linefeed
("\n", "\r" - "\n")
hex any hexadecimal digit
{"0" - "9" "A" - "F"}
letter any Unicode letter
{Ll Lu Lt Lm Lo}
lower any lower-case Unicode letter
space any Unicode spacing character
upper any upper-case Unicode letter
LuLetter, uppercase
LlLetter, lowercase
LtLetter, titlecase
LmLetter, modifier
LoLetter, other
MnMark, nonspacing
McMark, spacing combining
MeMark, enclosing
NdNumber, decimal digit
NlNumber, letter
NoNumber, other
PcPunctuation, connector
PdPunctuation, dash
PsPunctuation, open
PePunctuation, close
PiPunctuation, initial quote
PfPunctuation, final quote
PoPunctuation, other
SmSymbol, math
ScSymbol, currency
SkSymbol, modifier
SoSymbol, other
ZsSeparator, space
ZlSeparator, line
ZpSeparator, paragraph
CcOther, control
CfOther, format
CsOther, surrogate
CoOther, private use
CnOther, not assigned
(including noncharacters)


A name can refer to one of the system-supplied names. These include ascii, char, digit, end, hex, letter, lower, space, upper as well as the two character Unicode General Categories. The Unicode General Categories are only available in character classes.

A name can refer to a user-defined pattern in a library. This allows the breaking of a complex pattern into simpler subpatterns. It also allows for the recognition of context-free languages.

A name can refer to a user-defined text in a library.

A name matches if the thing it represents matches.

Character class

pattern_class '{' pattern_set '}'

pattern_set "" name more_pattern_set pattern_text_range more_pattern_set '¬' more_pattern_set

more_pattern_set "" space pattern_set

pattern_text_range text text space '-' space pattern_text_range

A character class specification is wrapped in brackets. A set is a sequence of zero or more names or texts, and contains the union of those characters.

{letter digit}   # matches any letter or digit.


A name can refer to one of the built-in character classes, or to a library text, character class, or function.

Literal text

A text contains characters in the class. You can have one big text or lots of little texts.


If there is a -minus sign between two texts, then an inclusive range is made from the last character of the first text thru the first character of the second text..

{"0" - "9"} matches a digit, as does {"0123456789"}.


The ¬not signals characters to be excluded from the class.

{"a" - "z" ¬ "aeiou"}    # matches a lower case consonant.

{¬ "01" - "89"}          # matches any character that is not a digit.

A second ¬not signals an intersection operation on the current set.

{ascii ¬ ¬ upper}            # matches upper case ASCII letters.
{ascii ¬ "aeiou" ¬ upper}    # matches upper case ASCII consonants.


pattern_group '(' pattern_group_filling ')'

pattern_group_filling indent open_pattern_group outdent closed_pattern_group

open_pattern_group pattern_sequence more_open_pattern_group

more_open_pattern_group "" linebreak even_more_open_pattern_group

even_more_open_pattern_group open_pattern_group '/' space pattern_item

closed_pattern_group pattern_sequence more_closed_pattern_group

more_closed_pattern_group "" ',' space closed_pattern_group space '/' space pattern_item

pattern_sequence pattern_item pattern_item space pattern_sequence

pattern_item pattern_depot pattern_quantifier pattern_element pattern_quantifier pattern_depot pattern_element

pattern_element pattern_group pattern_class pattern_text name

Parens can be used to group one or more pattern expressions together. A group matches if all of the pattern expressions in it match.

Alternatives are separated by line breaks or ,comma. A set of alternatives matches if any one of the alternatives matches.

Optionally, an item after a /slash indicates a separator that appears before all but the first repetition. This is allowed only if the maximum number of repetitions is at least 2.

Quantifier Repeats
- 0 or 1 (optional)
number repeat count
from - thru in the range
from - at least


pattern_quantifier "" '-' space number_literal space number_literal '-' space number_literal '-' number_literal space

By default a pattern expression matches exactly once.


Any pattern expression can be preceded by a repetition factor. It determines the number of times that the item must match.

5 item

A range can be specified with two numbers separated by -minus. The item will match no fewer than the first number, and no more than the second number.

1-5 item

If the second number is omitted, then there is no limit to how many times the item can be matched.

1- item

An optional item is indicated by a prefix of -minus sign. This indicates that the item may match zero or one times.

- item

If the next to last item in a pattern group is /slash then the last item in the group is matched before every iteration except the first. This is a convenient way to specify a separator.

If an iteration matches emptiness, then the repetition count is satisfied, even if the minimum number of repetitions has not been reached yet.


pattern_depot "" text ':' space

Patterns can be used to extract information from texts. The extract method returns a record containing the saved matching material.

To save material, place a text and a :colon immediately before any pattern expression. All of the text that is matched by that pattern expression will be saved in the result record, using the text as the key. If multiple matches are made (because of repetition or multiple uses of the same text), then all of the matches will be saved in an array under the text.

Pattern Library

A pattern library is a record containing patterns that can be used by other patterns. The lib record in the first example is a pattern library.

A pattern library can contain patterns, texts, and functions.


When an ordinary text is used as a pattern, it will match identical text.


A function in a pattern library should have this signature:

ƒ text, from

A function will be passed the complete text being matched and the index of the character that is currently being considered. If the function does not match, then it must return null. If the function does find a match, it must return the number of characters that are matched.

When a function is called by a character class, then text will be the character under consideration, and from will be 0. It should return 1 or true if the character is in the class or null or false if the character is not in the class

Pattern Functions

Patterns can be used with some of the built-in functions.

array(text, pattern)

extract(text, pattern, from, to)

replace(text, pattern, replacement, limit)

search(text, pattern, from)

For example, the array() method can be used to break the contents of a file into an array of lines. If the separator pattern is found in the text, then the subtext before the match is stored in the new array, and the match is repeated on the remainder until no more matches are possible. The remainder of the text is stored in the array and the array is result.

var lines: array(file, ¶ (end))

This will break on linefeed, carriage return, and carriage return linefeed.