This note describes a message format called Nota that was developed for use in Misty's Procession Protocol. The Procession Protocol requires a JSON-like format that provides blobs (binary large objects) for things like public keys and encrypted secrets. The Procession Protocol also requires private process addresses.
JSON had three design rules: it should be minimal, textual, and a subset of JavaScript.
The textual rule was important in the early days of the web because people did not have much experience with network programming, and having the ability to directly see the data was obviously useful. The tooling is much better now, so the value in being able to read the raw streams is much lower. It also came with some downsides: people were tempted to construct JSON text by hand, which was often problematic, and the text encoding is slightly inefficient.
The Subset of JavaScript rule (which gave JSON its name) was expedient because it allowed the JavaScript parser to act as the first JSON parser. A close association with JavaScript is no longer desirable.
The minimal rule is still vitally important.
Nota maintains JSON's design philosophy of being at the intersection of most programming languages and most data types. The representation of numbers is completely independent of problematic number formats like IEEE 754. The representation of data structures with arrays and key:value pairs seems to be compatible with everything.
Nota departs from JSON by using counts instead of brackets. The representation of counts comes from Kim. Kim is also the representation of characters, being simpler and more compact than UTF-8.
Nota is not intended to replace JSON. It is intended for an application that JSON does not do well. JSON should continue doing the things it does well.
Nota stands for Network Object Transfer Arrangement. Nota is short for notation. Nota does not stand for "not a" or "none of the above".
Continuation | |||||||
---|---|---|---|---|---|---|---|
C | D | D | D | D | D | D | D |
The key idea in Kim is the continuation byte. A continuation byte contains a continue bit and 7 data bits. To process a continuation byte, shift the Kim accumulator left by seven bits, and then add the 7 data bits to the accumulator. If the continue bit is 0, we are done. Otherwise, repeat with the next byte.
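The accumulator loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `kim_encode` and `kim_decode` are made up for this note.

```python
def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    if n < 0:
        raise ValueError("Kim encodes non-negative integers only")
    groups = [n & 0x7F]                      # final byte: continue bit off
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))     # earlier bytes: continue bit on
        n >>= 7
    return bytes(reversed(groups))


def kim_decode(data: bytes, pos: int = 0) -> tuple:
    """Return (value, next_position) for the Kim value starting at data[pos]."""
    acc = 0
    while True:
        byte = data[pos]
        pos += 1
        acc = (acc << 7) + (byte & 0x7F)     # shift left by seven, add the data bits
        if (byte & 0x80) == 0:               # continue bit is 0: done
            return acc, pos
```

For instance, `kim_encode(986)` yields the two bytes 87 5A, and `kim_decode` turns them back into 986.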
Bits | Type |
---|---|
000 | Blob |
001 | Text |
010 | Array |
011 | Record |
100 | Floating Point (positive exponent) |
101 | Floating Point (negative exponent) |
110 | Integer (zero exponent) |
111 | Symbol |
Every Nota value starts with a preamble byte that is a Kim value with the three bits after the continue bit used for type information.
Most of the types provide 3 or 4 bits in the preamble for storing some data. To fill them, make the Kim encoding of the data. If the most significant byte of that encoding contains no more than the available 3 or 4 bits, then those bits are incorporated into the preamble. If the Kim requires no additional bytes, then the C (continue) bit is turned off. Otherwise, the C bit is turned on and the continuation of the Kim follows.
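One way to read the preamble rule is sketched below. The helper `encode_preamble` and its parameters are hypothetical: `high` is the preamble byte with the type (and any sign) bits already in place, and `width` is the number of data bits the type offers (3 or 4).

```python
def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_preamble(high: int, width: int, value: int) -> bytes:
    """Emit a preamble whose low `width` bits hold the top group of kim(value)."""
    kim = kim_encode(value)
    lead = kim[0] & 0x7F
    if lead < (1 << width):
        rest = kim[1:]            # leading group fits in the preamble
    else:
        lead, rest = 0, kim       # does not fit: the whole Kim follows
    cont = 0x80 if rest else 0x00 # C bit is on only when more bytes follow
    return bytes([cont | high | lead]) + rest
```

With a text preamble (`high=0x10`, `width=4`), a count of 3 fits in one byte (13), while a count of 20 needs a continuation (90 14).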
Blob | |||||||
---|---|---|---|---|---|---|---|
C | 0 | 0 | 0 | D | D | D | D |
A blob is a string of bits. The data gives the number of bits. The number of bytes that follow is
floor((number_of_bits + 7) / 8)
The final byte is padded with 0 bits if necessary. The length of a blob can be represented in 4 bits (for a byte-sized blob), or 11 bits, or 18 bits, or 25 bits, or 32 bits, or 39 bits...
Example: A blob containing 25 bits: 1111000011100011001000001
10000000 00011001 11110000 11100011 00100000 10000000
80 19 F0 E3 20 80
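The 25-bit example can be reproduced with a short sketch. Representing the blob as a Python string of "0" and "1" characters is an assumption made only to keep the example small, and `encode_blob` and `encode_preamble` are hypothetical names.

```python
def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_preamble(high: int, width: int, value: int) -> bytes:
    """Emit a preamble whose low `width` bits hold the top group of kim(value)."""
    kim = kim_encode(value)
    lead = kim[0] & 0x7F
    if lead < (1 << width):
        rest = kim[1:]
    else:
        lead, rest = 0, kim
    return bytes([(0x80 if rest else 0) | high | lead]) + rest


def encode_blob(bits: str) -> bytes:
    """Encode a string of '0'/'1' characters as a Nota blob (type bits 000)."""
    count = len(bits)
    byte_count = (count + 7) // 8                 # floor((number_of_bits + 7) / 8)
    padded = bits.ljust(byte_count * 8, "0")      # pad the final byte with 0 bits
    payload = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return encode_preamble(0x00, 4, count) + payload


print(encode_blob("1111000011100011001000001").hex())
```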
Text | |||||||
---|---|---|---|---|---|---|---|
C | 0 | 0 | 1 | D | D | D | D |
The data gives the number of characters in the text. That many Kim-encoded characters follow. The ASCII characters are encoded in a single byte. The characters in the first quarter of the Basic Multilingual Plane are encoded in two bytes. All other Unicode characters, including characters from the extended planes, are encoded in three bytes. Unlike JSON, there is never a need for escapement.
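A text encoder following this description might look like the sketch below. It assumes that a "character" is a Unicode code point, which is how Python's `len` and `ord` see a string; `encode_text` and `encode_preamble` are hypothetical names.

```python
def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_preamble(high: int, width: int, value: int) -> bytes:
    """Emit a preamble whose low `width` bits hold the top group of kim(value)."""
    kim = kim_encode(value)
    lead = kim[0] & 0x7F
    if lead < (1 << width):
        rest = kim[1:]
    else:
        lead, rest = 0, kim
    return bytes([(0x80 if rest else 0) | high | lead]) + rest


def encode_text(text: str) -> bytes:
    """Encode a string as Nota text (type bits 001): count, then Kim characters."""
    out = bytearray(encode_preamble(0x10, 4, len(text)))
    for ch in text:
        out += kim_encode(ord(ch))   # 1 byte for ASCII, 2 or 3 bytes otherwise
    return bytes(out)


print(encode_text("cat").hex())
```

No character needs escaping because the count in the preamble, not a closing quote, marks the end of the text.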
Examples:
Text | Nota (hex) |
---|---|
"" | 10 |
"cat" | 13 63 61 74 |
"U+1F4A9 「うんち絵文字」 «💩»" | 90 14 55 2B 31 46 34 41 39 20 E0 0C E0 46 E1 13 E0 61 81 FA 75 81 CB 07 81 B6 57 E0 0D 20 81 2B 87 E9 29 81 3B |
Array | |||||||
---|---|---|---|---|---|---|---|
C | 0 | 1 | 0 | D | D | D | D |
An array is an ordered sequence of values. Following the preamble are the elements of the array. Each element begins with one of the preambles. As with JSON, nesting is encouraged.
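Nesting falls out naturally from a recursive encoder. The sketch below handles only lists and integers, and it assumes that the data field of an array preamble holds the number of elements; this note says Nota uses counts instead of brackets but does not spell out the array count, so treat that as an assumption. `encode_value` and `encode_preamble` are hypothetical names.

```python
def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_preamble(high: int, width: int, value: int) -> bytes:
    """Emit a preamble whose low `width` bits hold the top group of kim(value)."""
    kim = kim_encode(value)
    lead = kim[0] & 0x7F
    if lead < (1 << width):
        rest = kim[1:]
    else:
        lead, rest = 0, kim
    return bytes([(0x80 if rest else 0) | high | lead]) + rest


def encode_value(value) -> bytes:
    """Encode nested lists of integers; other types are left out of this sketch."""
    if isinstance(value, list):
        out = encode_preamble(0x20, 4, len(value))   # assumed: count of elements
        for element in value:
            out += encode_value(element)             # each element has its own preamble
        return out
    if isinstance(value, int) and not isinstance(value, bool):
        sign = 0x08 if value < 0 else 0x00
        return encode_preamble(0x60 | sign, 3, abs(value))
    raise TypeError("this sketch only handles lists and integers")


print(encode_value([1, -2, 300]).hex())
```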
Record | |||||||
---|---|---|---|---|---|---|---|
C | 0 | 1 | 1 | D | D | D | D |
A record is an unordered collection of key/value pairs. JSON called this structure an object, but that was confusing in a context where everything is an object, including everything that is not an object. Record is a more appropriate term because there are no methods involved here. In a record, the keys must be text, and they must be unique within the record. The values can be any Nota values.
Floating Point | |||||||
---|---|---|---|---|---|---|---|
C | 1 | 0 | E | S | D | D | D |
Nota floating point represents numbers as coefficient * power(10, exponent). It is similar to scientific notation except that the part before the e (coefficient) must be an integer. The part after the e (exponent) gives the number of positions to move the decimal point (negative moves left, positive moves right), inserting 0 as necessary.
The preamble may contain the first three bits of the exponent. The preamble is followed by the continuation of the exponent (if there is one), followed by the coefficient.
The integer type should be used when the exponent is zero.
Examples:
Number | Nota (hex) |
---|---|
-1.01 | 5A 65 |
98.6 | 51 87 5A |
-0.5772156649 | D8 0A 95 C0 B0 BD 69 |
-1.00000000000001 | D8 0E 96 DE B1 83 E9 80 01 |
-10000000000000 | C8 0D 01 |
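The examples can be reproduced with Python's `decimal` module, which already stores a number as a sign, an integer coefficient, and a power of ten. The sketch reads the E bit as the sign of the exponent (per the type table) and the S bit as the sign of the coefficient, which is inferred from the examples rather than stated in the text. Stripping trailing zeros with `normalize()` is a choice made here so that -10000000000000 comes out as -1 times ten to the 13th, matching the last example. `encode_number` and `encode_preamble` are hypothetical names.

```python
from decimal import Decimal


def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_preamble(high: int, width: int, value: int) -> bytes:
    """Emit a preamble whose low `width` bits hold the top group of kim(value)."""
    kim = kim_encode(value)
    lead = kim[0] & 0x7F
    if lead < (1 << width):
        rest = kim[1:]
    else:
        lead, rest = 0, kim
    return bytes([(0x80 if rest else 0) | high | lead]) + rest


def encode_number(text: str) -> bytes:
    """Encode a decimal literal as Nota floating point, or integer if exponent is 0."""
    number = Decimal(text)
    if not number.is_finite():
        raise ValueError("only finite numbers can be encoded")
    sign, digits, exponent = number.normalize().as_tuple()
    coefficient = int("".join(map(str, digits)))
    sign_bit = 0x08 if sign else 0x00                  # S: coefficient is negative
    if exponent == 0:
        return encode_preamble(0x60 | sign_bit, 3, coefficient)   # integer type
    exp_bit = 0x10 if exponent < 0 else 0x00           # E: exponent is negative
    head = encode_preamble(0x40 | exp_bit | sign_bit, 3, abs(exponent))
    return head + kim_encode(coefficient)              # coefficient follows the exponent


print(encode_number("98.6").hex())
```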
Integer | |||||||
---|---|---|---|---|---|---|---|
C | 1 | 1 | 0 | S | D | D | D |
The integers in the range -7 thru 7 can be represented by a single byte. The integers in the range -1023 thru 1023 can be represented by two bytes. The integers in the range -131071 thru 131071 can be represented by three bytes.
Examples:
Integer | Nota (hex) |
---|---|
0 | 60 |
2023 | E0 8F 67 |
-1 | 69 |
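The byte counts above follow from the preamble having 3 data bits and each continuation byte adding 7 more. A minimal encoder and decoder pair, with hypothetical names `encode_integer` and `decode_integer`, could look like this:

```python
def kim_encode(n: int) -> bytes:
    """Kim-encode a non-negative integer, most significant 7-bit group first."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def encode_integer(n: int) -> bytes:
    """Encode an integer: C, type bits 110, sign bit, then 3 data bits."""
    kim = kim_encode(abs(n))
    lead = kim[0] & 0x7F
    if lead < 8:
        rest = kim[1:]            # top group fits in the 3 data bits
    else:
        lead, rest = 0, kim       # otherwise the whole Kim follows
    sign = 0x08 if n < 0 else 0x00
    return bytes([(0x80 if rest else 0) | 0x60 | sign | lead]) + rest


def decode_integer(data: bytes, pos: int = 0) -> tuple:
    """Return (value, next_position) for the integer starting at data[pos]."""
    first = data[pos]
    if ((first >> 4) & 0x07) != 0b110:
        raise ValueError("not an integer preamble")
    acc = first & 0x07            # the preamble's data bits seed the accumulator
    more = first & 0x80
    pos += 1
    while more:
        byte = data[pos]
        acc = (acc << 7) | (byte & 0x7F)
        more = byte & 0x80
        pos += 1
    return (-acc if first & 0x08 else acc), pos


print(encode_integer(2023).hex())
```

Note that 2023 takes three bytes: its Kim encoding is two bytes whose leading group holds four significant bits, one more than the preamble can absorb.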
Symbol | |||||||
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | D | D | D | D |
Currently there are only five symbols. All other symbols are reserved.
The first three symbols are the constants null, false, and true.
The other two symbols are special prefixes. The private prefix must be followed by a record containing a private process address. The system prefix must be followed by a record containing a system message.
Symbol | Nota (hex) |
---|---|
null | 70 |
false | 72 |
true | 73 |
private | 78 |
system | 79 |