Douglas Crockford

Nota Message Format

This note describes a new message format called Nota that was developed for use in the Procession Protocol. The Procession Protocol requires a JSON-like format that provides blobs (binary large objects) for things like public keys and encrypted secrets. The Procession Protocol also requires private process addresses.

JSON had three design rules:

  1. Minimal
  2. Textual
  3. Subset of JavaScript

The textual rule was important in the early days of the web because people did not have much experience with network programming, and having the ability to directly see the data was obviously useful. The tooling is much better now, so the value in being able to read the raw streams is much lower. The rule also came with some downsides: People were tempted to construct JSON text by hand, which was often problematic, and the text encoding is slightly inefficient.

The Subset of JavaScript rule (which gave JSON its name) was expedient because it allowed the JavaScript parser to act as the first JSON parser. A close association with JavaScript is no longer desirable.

The minimal rule is still vitally important.

Nota maintains JSON's design philosophy of being at the intersection of most programming languages and most data types. The representation of numbers is completely independent of problematic number formats like IEEE 754. The representation of data structures with arrays and key:value pairs seems to be compatible with everything.

Nota departs from JSON by using counts instead of brackets. The representation of counts comes from Kim. Kim is also the representation of characters, being simpler and more compact than UTF-8.

Nota is not intended to replace JSON. It is intended for an application that JSON does not do well. JSON should continue doing the things it does well.

Continuation

Continuation
C D D D D D D D
C: continue
DDDDDDD: seven data bits

The key idea in Kim is the continuation byte. A continuation byte contains a continue bit and 7 data bits. The kim accumulator starts at 0. To process a continuation byte, shift the kim accumulator left by seven bits, and then add the 7 data bits to the kim accumulator. If the continue bit is 0, we are done. Otherwise, repeat with the next byte.
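The accumulator loop described above can be sketched in Python. This is a minimal sketch, not a reference implementation: the names `kim_encode` and `kim_decode` are made up for illustration, and the encoder assumes the shortest form is wanted (no leading zero groups).

```python
def kim_encode(value):
    """Encode a non-negative integer as Kim bytes, most significant group first."""
    if value < 0:
        raise ValueError("Kim holds non-negative integers only")
    groups = [value & 0x7F]                   # final byte: continue bit is 0
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))  # earlier bytes: continue bit is 1
        value >>= 7
    return bytes(reversed(groups))


def kim_decode(data, offset=0):
    """Decode one Kim value; return (value, offset of the next unread byte)."""
    accumulator = 0
    while True:
        byte = data[offset]
        offset += 1
        # Shift the accumulator left by seven bits, then add the 7 data bits.
        accumulator = (accumulator << 7) | (byte & 0x7F)
        if (byte & 0x80) == 0:                # continue bit off: we are done
            return accumulator, offset


print(kim_encode(0x2603).hex(" "))            # cc 03
print(kim_decode(bytes([0xCC, 0x03])))        # (9731, 2)
```

Returning the next offset from `kim_decode` is a design choice for this sketch: it lets a caller read several Kim values back to back from one buffer.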

Preambles

Every Nota value starts with a preamble byte that contains the 3-bit type of the value and additional information, depending on the type.

Most of the types provide 3 or 4 bits in the preamble for storing some data. Make the kim encoding of the data. If the most significant byte of the kim uses no more than those 3 or 4 bits, then those bits are incorporated into the preamble. If it uses more, the data bits of the preamble are 0 and the whole kim follows (a count of 16 in a 4-bit field is encoded as 90 10). If the kim requires no additional bits, then the C (continue) bit is turned off. Otherwise, the C bit is turned on and the continuation of the kim follows.
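One way to read the preamble rule is: use the fewest continuation bytes such that the value fits in the preamble's data bits plus 7 bits per continuation byte. The sketch below follows that reading, which agrees with the byte sequences shown in this note (for example 90 10 and E0 8F 67). The function name `encode_preamble` and its `header` parameter are inventions for illustration, not part of Nota.

```python
def encode_preamble(header, value, width):
    """Build a preamble byte plus any continuation bytes.

    header: the fixed bits between C and the data bits, already in position
            (for example 0x10 for text, 0x60 for a non-negative integer).
    value:  the non-negative number to store.
    width:  how many data bits the preamble offers (3 or 4).
    """
    tail = []
    while value >> width:            # does not fit in the preamble yet
        tail.append(value & 0x7F)    # peel off the low 7 bits
        value >>= 7
    tail.reverse()                   # most significant group first
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        last = index == len(tail) - 1
        out.append(group if last else group | 0x80)
    return bytes(out)


print(encode_preamble(0x10, 3, 4).hex(" "))     # 13
print(encode_preamble(0x10, 16, 4).hex(" "))    # 90 10
print(encode_preamble(0x60, 2023, 3).hex(" "))  # e0 8f 67
```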

Blob

Blob
C 0 0 0 D D D D
C: continue the number of bits
DDDD: the number of bits

A blob is a string of bits. The data gives the number of bits. The number of bytes that follow is:

floor((number of bits + 7) / 8)

The final byte is padded with 0 if necessary. The length of a blob can be represented in 4 bits (for a byte-sized blob), or 11 bits, or 18 bits, or 25 bits, or 32 bits, or 39 bits...
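As an illustration of the byte count and the padding, here is a small sketch that encodes a blob given as a string of '0' and '1' characters. The names `encode_preamble` and `encode_blob` are hypothetical. The sketch also assumes that the first bit of the blob goes into the most significant bit of the first byte; the note does not state the bit order.

```python
def encode_preamble(header, value, width):
    """Preamble byte plus continuation bytes (fewest bytes that fit)."""
    tail = []
    while value >> width:
        tail.append(value & 0x7F)
        value >>= 7
    tail.reverse()
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        out.append(group if index == len(tail) - 1 else group | 0x80)
    return bytes(out)


def encode_blob(bits):
    """Encode a string of '0'/'1' characters as a Nota blob (type bits 000)."""
    count = len(bits)
    byte_count = (count + 7) // 8               # floor((number of bits + 7) / 8)
    padded = bits.ljust(byte_count * 8, "0")    # pad the final byte with 0
    # Assumption: bits are packed most significant bit first.
    payload = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return encode_preamble(0x00, count, 4) + payload


print(encode_blob("").hex(" "))          # 00
print(encode_blob("1").hex(" "))         # 01 80
print(encode_blob("1" * 16).hex(" "))    # 80 10 ff ff
```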

Text

Text
C 0 0 1 D D D D
C: continue the number of characters
DDDD: the number of characters

The data gives the number of characters in the text. That many Kim-encoded characters follow. The ASCII characters are encoded in a single byte. The characters in the first quarter of the Basic Multilingual Plane are encoded in two bytes. All other Unicode characters, including characters from the extended planes, are encoded in three bytes. Unlike JSON, there is never a need for escapement.

Examples:

"cat"                   13 63 61 74
""                      10
"☃★♲"                   13 CC 03 CC 05 CC 72
"𓂀𓃠𓅣𓂻𓂻𓂺𓁟𓂑𓃻𓇼𓊽𓂭𓎆𓍢𓏢𓐠" 90 10 84 E1 00 84 E1 60 84 E2 63 84 E1 3B 84 E1 3B 84 E1 3A
                        84 E0 5F 84 E1 11 84 E1 7B 84 E3 7C 84 E5 3D 84 E1 2D 84 E7
                        06 84 E6 62 84 E7 62 84 E8 20
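The text examples can be reproduced with a short sketch. The function names are hypothetical. The character count is taken to be the number of Unicode code points, which is what Python's `len` reports for a `str`.

```python
def kim_encode(value):
    """Kim bytes for a non-negative integer, most significant group first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(groups))


def encode_preamble(header, value, width):
    """Preamble byte plus continuation bytes (fewest bytes that fit)."""
    tail = []
    while value >> width:
        tail.append(value & 0x7F)
        value >>= 7
    tail.reverse()
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        out.append(group if index == len(tail) - 1 else group | 0x80)
    return bytes(out)


def encode_text(text):
    """Encode a str as Nota text (type bits 001): count, then Kim characters."""
    out = bytearray(encode_preamble(0x10, len(text), 4))
    for character in text:
        out += kim_encode(ord(character))   # one Kim value per code point
    return bytes(out)


print(encode_text("cat").hex(" "))                # 13 63 61 74
print(encode_text("").hex(" "))                   # 10
print(encode_text("\u2603\u2605\u2672").hex(" "))  # 13 cc 03 cc 05 cc 72
```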

Array

Array
C 0 1 0 D D D D
C: continue the number of elements
DDDD: the number of elements

An array is an ordered sequence of values. Following the preamble are the elements of the array. Each element begins with one of the preambles. As with JSON, nesting is encouraged.
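A minimal sketch of array encoding follows. It assumes the elements have already been encoded as Nota values, so nesting is just a matter of passing one encoded array as an element of another. The names `encode_preamble` and `encode_array` are hypothetical.

```python
def encode_preamble(header, value, width):
    """Preamble byte plus continuation bytes (fewest bytes that fit)."""
    tail = []
    while value >> width:
        tail.append(value & 0x7F)
        value >>= 7
    tail.reverse()
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        out.append(group if index == len(tail) - 1 else group | 0x80)
    return bytes(out)


def encode_array(encoded_elements):
    """Encode a list of already-encoded Nota values as an array (type bits 010)."""
    return encode_preamble(0x20, len(encoded_elements), 4) + b"".join(encoded_elements)


TRUE = b"\x71"    # symbol true
FALSE = b"\x70"   # symbol false
ZERO = b"\x60"    # integer 0

print(encode_array([]).hex(" "))                        # 20
print(encode_array([TRUE, FALSE]).hex(" "))             # 22 71 70
print(encode_array([encode_array([]), ZERO]).hex(" "))  # 22 20 60
```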

Record

Record
C 0 1 1 D D D D
C: continue the number of pairs
DDDD: number of pairs

A record is an unordered collection of key/value pairs. JSON called this structure an object, but that was confusing in a context where everything is an object, including everything that is not an object. Record is a more appropriate term because there are no methods involved here. In a record, the keys must be text, and they must be unique within the record. The values can be any Nota values.
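A record can be sketched the same way. This sketch assumes each pair is written as the key text followed immediately by the value, which the note implies but does not spell out. It takes a Python `dict` from text keys to already-encoded values, so key uniqueness comes for free. All function names are hypothetical.

```python
def kim_encode(value):
    """Kim bytes for a non-negative integer, most significant group first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(groups))


def encode_preamble(header, value, width):
    """Preamble byte plus continuation bytes (fewest bytes that fit)."""
    tail = []
    while value >> width:
        tail.append(value & 0x7F)
        value >>= 7
    tail.reverse()
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        out.append(group if index == len(tail) - 1 else group | 0x80)
    return bytes(out)


def encode_text(text):
    """Nota text (type bits 001): character count, then Kim characters."""
    return encode_preamble(0x10, len(text), 4) + b"".join(
        kim_encode(ord(character)) for character in text
    )


def encode_record(pairs):
    """Encode {text key: encoded value} as a record (type bits 011)."""
    out = bytearray(encode_preamble(0x30, len(pairs), 4))
    for key, encoded_value in pairs.items():
        if not isinstance(key, str):
            raise TypeError("record keys must be text")
        out += encode_text(key) + encoded_value   # assumed order: key, then value
    return bytes(out)


print(encode_record({}).hex(" "))              # 30
print(encode_record({"a": b"\x71"}).hex(" "))  # 31 11 61 71
```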

Floating Point

Floating Point
C 1 0 E S D D D
C: continue the exponent
E: sign of the exponent
S: sign of the coefficient
DDD: three bits of the exponent

Nota floating point represents numbers as coefficient * power(10, exponent). It is similar to scientific notation except that the part before the e (coefficient) must be an integer. The part after the e (exponent) gives the number of positions to move the decimal point (negative moves left, positive moves right) inserting 0 if necessary.

The preamble may contain the first three bits of the exponent. The preamble is followed by the continuation of the exponent (if there is one), followed by the coefficient.

The integer type should be used when the exponent is zero.

Examples:

-1.01              5A 65
98.6               51 87 5A
-0.5772156649      D8 0A 95 C0 B0 BD 69
-1.00000000000001  D8 0E 96 DE B1 83 E9 80 01
-10000000000000    C8 0D 01
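The examples above can be checked with a sketch that takes the coefficient and the exponent as two Python integers. The function names are hypothetical. The sketch does not normalize its input, so choosing the coefficient and exponent (for example -1 and 13 rather than -10 and 12) is left to the caller.

```python
def kim_encode(value):
    """Kim bytes for a non-negative integer, most significant group first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(groups))


def encode_preamble(header, value, width):
    """Preamble byte plus continuation bytes (fewest bytes that fit)."""
    tail = []
    while value >> width:
        tail.append(value & 0x7F)
        value >>= 7
    tail.reverse()
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        out.append(group if index == len(tail) - 1 else group | 0x80)
    return bytes(out)


def encode_float(coefficient, exponent):
    """Encode coefficient * 10**exponent with the layout C 1 0 E S D D D."""
    header = 0x40                      # the fixed bits 1 0
    if exponent < 0:
        header |= 0x10                 # E: sign of the exponent
    if coefficient < 0:
        header |= 0x08                 # S: sign of the coefficient
    # Exponent magnitude goes in the preamble; coefficient magnitude follows as a kim.
    return encode_preamble(header, abs(exponent), 3) + kim_encode(abs(coefficient))


print(encode_float(-101, -2).hex(" "))         # 5a 65
print(encode_float(986, -1).hex(" "))          # 51 87 5a
print(encode_float(-5772156649, -10).hex(" ")) # d8 0a 95 c0 b0 bd 69
print(encode_float(-1, 13).hex(" "))           # c8 0d 01
```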

Integer

Integer
C 1 1 0 S D D D
C: continue the integer
S: sign
DDD: three bits of the integer

The integers in the range -7 thru 7 can be represented by a single byte. The integers in the range -1023 thru 1023 can be represented by two bytes. The integers in the range -131071 thru 131071 can be represented by three bytes.

Examples:

0     60
2023  E0 8F 67
-1    69
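The integer examples and the byte-length ranges can be reproduced with a short sketch. The names `encode_preamble` and `encode_integer` are hypothetical; the sign goes in the S bit and the magnitude in the three data bits plus any continuation bytes.

```python
def encode_preamble(header, value, width):
    """Preamble byte plus continuation bytes (fewest bytes that fit)."""
    tail = []
    while value >> width:
        tail.append(value & 0x7F)
        value >>= 7
    tail.reverse()
    out = [header | value | (0x80 if tail else 0x00)]
    for index, group in enumerate(tail):
        out.append(group if index == len(tail) - 1 else group | 0x80)
    return bytes(out)


def encode_integer(value):
    """Encode a Python int with the layout C 1 1 0 S D D D."""
    header = 0x60 | (0x08 if value < 0 else 0x00)   # S: sign bit
    return encode_preamble(header, abs(value), 3)


print(encode_integer(0).hex(" "))      # 60
print(encode_integer(2023).hex(" "))   # e0 8f 67
print(encode_integer(-1).hex(" "))     # 69
print(len(encode_integer(1023)), len(encode_integer(1024)))  # 2 3
```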

Symbol

Symbol
0 1 1 1 D D D D
DDDD: the symbol

Currently there are only four symbols. All other symbols are reserved.

The first two symbols are the logical constants true and false.

The other two symbols are special prefixes. The private prefix must be followed by a record containing a private process address. The system prefix must be followed by a record containing a system message.

false    70
true     71
private  78
system   79
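A decoder might handle the symbol byte with a small lookup, as in this sketch. The table and the function name `decode_symbol` are hypothetical, and rejecting reserved symbols with an error is one possible policy rather than something the note prescribes.

```python
# The low four bits of a symbol preamble select the symbol.
SYMBOLS = {0x0: "false", 0x1: "true", 0x8: "private", 0x9: "system"}


def decode_symbol(byte):
    """Return the symbol name for a preamble byte of the form 0 1 1 1 D D D D."""
    if byte >> 4 != 0b0111:
        raise ValueError("not a symbol preamble")
    code = byte & 0x0F
    if code not in SYMBOLS:
        raise ValueError("reserved symbol")
    return SYMBOLS[code]


print(decode_symbol(0x70))   # false
print(decode_symbol(0x79))   # system
```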

Nota is short for notation. Nota does not stand for "not a" or "none of the above".