This note describes a new message format called Nota that was developed for use in the Actor Protocol. The Actor Protocol requires a JSON-like format that provides blobs (binary large objects) for things like public keys and encrypted secrets. The Actor Protocol also requires private actor addresses.
JSON had three design rules:
The textual rule was important in the early days of the web because people did not have much experience with network programming, and having the ability to directly see the data was obviously useful. The tooling is much better now, so the value in being able to read the raw streams is much lower. It also came with some downsides: People were tempted to construct JSON text by hand which was often problematic, and the text encoding is slightly inefficient.
The minimal rule is still vitally important.
Nota maintains JSON's design philosophy of being at the intersection of most programming languages and most data types. The representation of numbers is completely independent of obsolete number formats like IEEE 754. The representation of data structures with arrays and key:value pairs seems to be compatible with everything.
Nota departs from JSON by using counts instead of brackets. The representation of counts comes from Kim. Kim is also the representation of characters, being simpler and more compact than UTF-8.
Nota is not intended to replace JSON. It is intended for an application that JSON does not do well. JSON should continue doing the things it does well.
The key idea in Kim is of a continuation byte. A continuation byte contains a continue bit and 7 data bits. To process a continuation byte, shift the kim accumulator left by seven bits, and then add the 7 data bits to the kim accumulator. If the continue bit is 0, we are done. Otherwise, repeat with the next byte.
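The accumulator loop can be sketched in Python. This is a minimal illustration, not reference code: the function names are invented for this note, and placing the continue bit in the high-order position (0x80) is an assumption inferred from the example bytes elsewhere in this note.

```python
def kim_encode(value: int) -> bytes:
    """Encode a non-negative integer as continuation bytes, most significant group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    # Assumption: the continue bit is 0x80. It is set on every byte except the last.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def kim_decode(data: bytes, start: int = 0):
    """Return (value, index of the next unread byte)."""
    accumulator = 0
    index = start
    while True:
        byte = data[index]
        index += 1
        # Shift left by seven bits, then add the 7 data bits.
        accumulator = (accumulator << 7) + (byte & 0x7F)
        if byte & 0x80 == 0:  # continue bit is 0: done
            return accumulator, index
```

Under these assumptions, kim_encode(0x2603) yields the bytes CC 03, and kim_decode reverses it.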
Every Nota value starts with a preamble byte that contains the 3 bit type of the value and additional information, depending on the type.
Most of the types provide 3 or 4 bits in the preamble for storing some data. To fill them, make the kim encoding of the data. If the most significant byte contains no more than 3 or 4 bits, then those bits are incorporated into the preamble. If the kim was one byte long, then the C (continue) bit is turned off.
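The folding step can be sketched as follows. Several details here are assumptions read off the example bytes in this note rather than stated in the text: the C bit is taken to be 0x80, the type bits sit directly below it, and when the leading group is too wide to fold, the preamble carries zero data bits and the whole kim follows. The name encode_preamble and its parameters are hypothetical.

```python
def encode_preamble(fixed_bits: int, data_bits: int, value: int) -> bytes:
    """Build a preamble byte plus any trailing continuation bytes.

    fixed_bits: the type (and any sign) bits, already shifted into position.
    data_bits:  how many low-order preamble bits hold data (3 or 4).
    value:      the non-negative count or magnitude to store.
    """
    # Split the value into 7-bit groups, most significant first.
    groups = []
    while True:
        groups.insert(0, value & 0x7F)
        value >>= 7
        if value == 0:
            break
    if groups[0] < (1 << data_bits):
        head = groups.pop(0)  # leading group fits: fold it into the preamble
    else:
        head = 0              # too wide (assumed behavior): the whole kim follows
    preamble = fixed_bits | head
    if groups:
        preamble |= 0x80      # C bit on: more bytes follow
    tail = [g | 0x80 for g in groups[:-1]] + groups[-1:]
    return bytes([preamble] + tail)
```

With fixed_bits of 0x10 (the value the text examples suggest for a text preamble) and 4 data bits, a count of 3 gives the single byte 13, and a count of 16 gives 90 10.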
A blob is a string of bits. The data produces the number of bits. The number of bytes that follow:
floor((number of bits + 7) / 8)
The final byte is padded with 0 if necessary.
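As a small worked example of the byte count and padding, here is a sketch in Python. The function names are hypothetical, and the bit order (bits fill each byte from the most significant end, so padding lands in the low-order bits of the final byte) is an assumption, since the text does not state it.

```python
def blob_byte_count(bit_count: int) -> int:
    """Number of bytes that follow the preamble: floor((bit_count + 7) / 8)."""
    return (bit_count + 7) // 8


def pack_bits(bits: str) -> bytes:
    """Pack a string of '0' and '1' characters into bytes.

    Assumption: bits are stored most significant first, and the final
    byte is padded with 0 bits on the right.
    """
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
```

A 9 bit blob therefore occupies 2 bytes, with 7 padding bits in the second byte.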
The data produces the number of characters in the text. That many Kim encoded characters follow. The ASCII characters are encoded in a single byte. The characters in the first quarter of the Basic Multilingual Plane are encoded in two bytes. All other Unicode characters, including characters from the extended planes, are encoded in three bytes. Unlike JSON, there is never a need for escapement.
"cat"    13 63 61 74
""    10
"☃★♲"    13 CC 03 CC 05 CC 72
"𓂀𓃠𓅣𓂻𓂻𓂺𓁟𓂑𓃻𓇼𓊽𓂭𓎆𓍢𓏢𓐠"    90 10 84 E1 00 84 E1 60 84 E2 63 84 E1 3B 84 E1 3B 84 E1 3A 84 E0 5F 84 E1 11 84 E1 7B 84 E3 7C 84 E5 3D 84 E1 2D 84 E7 06 84 E6 62 84 E7 62 84 E8 20
An array is an ordered sequence of values. Following the preamble are the elements of the array. Each element begins with one of the preambles. As with JSON, nesting is encouraged.
A record is an unordered collection of key/value pairs. JSON called this structure an object, but that was confusing in a context where everything is an object, including everything that is not an object. Record is a more appropriate term because there are no methods involved here. In a record, the keys must be text, and they must be unique. The values can be any Nota values.
Nota floating point represents numbers as
coefficient * power(10, exponent).
It is similar to scientific notation except that the part before the e must be an integer. The part after the e (exponent) gives the number of positions to move the decimal point (negative moves left, positive moves right), inserting 0 if necessary.
The preamble may contain the first three bits of the exponent. The preamble is followed by the continuation of the exponent (if there is one), followed by the coefficient.
The integer form should be used when the exponent is zero.
-1.01                5A 65
98.6                 51 87 5A
-0.5772156649        D8 0A 95 C0 B0 BD 69
-1.00000000000001    D8 0E 96 DE B1 83 E9 80 01
-10000000000000      C8 0D 01
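The floating point examples can be reproduced with a small decoder. The bit layout used here is an assumption inferred from those example bytes, since the text does not spell it out: the C bit (0x80), two type bits 10, a sign bit for the exponent (0x10), a sign bit for the coefficient (0x08), then three exponent bits. The function names are hypothetical.

```python
def read_kim(data: bytes, index: int, accumulator: int = 0):
    """Read continuation bytes starting at index; return (value, next index)."""
    while True:
        byte = data[index]
        index += 1
        accumulator = (accumulator << 7) + (byte & 0x7F)
        if byte & 0x80 == 0:
            return accumulator, index


def decode_float(data: bytes):
    """Return (coefficient, exponent); the value is coefficient * 10**exponent."""
    preamble = data[0]
    if (preamble >> 5) & 0b11 != 0b10:    # assumed type bits for floating point
        raise ValueError("not a floating point preamble")
    exponent_negative = bool(preamble & 0x10)
    coefficient_negative = bool(preamble & 0x08)
    exponent = preamble & 0x07            # first three bits of the exponent
    index = 1
    if preamble & 0x80:                   # C bit on: the exponent continues
        exponent, index = read_kim(data, index, exponent)
    coefficient, index = read_kim(data, index)
    if coefficient_negative:
        coefficient = -coefficient
    if exponent_negative:
        exponent = -exponent
    return coefficient, exponent
```

Under this layout, decode_float(bytes.fromhex("5A65")) returns (-101, -2), which is -1.01.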
The integers in the range -7 thru 7 can be represented by a single byte.
0       60
2023    E0 8F 67
-1      69
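A decoder sketch shows why -7 thru 7 fit in one byte. The layout is an assumption inferred from the integer examples: the C bit (0x80), three type bits 110, a sign bit (0x08), then three magnitude bits. The name decode_integer is hypothetical.

```python
def decode_integer(data: bytes) -> int:
    """Decode one Nota integer that starts at data[0]."""
    preamble = data[0]
    if (preamble >> 4) & 0b111 != 0b110:  # assumed type code for integer
        raise ValueError("not an integer preamble")
    negative = bool(preamble & 0x08)
    magnitude = preamble & 0x07           # three bits: 0 thru 7 in one byte
    index = 1
    if preamble & 0x80:                   # C bit on: the magnitude continues
        while True:
            byte = data[index]
            index += 1
            magnitude = (magnitude << 7) + (byte & 0x7F)
            if byte & 0x80 == 0:
                break
    return -magnitude if negative else magnitude
```

With only three magnitude bits in the preamble, any magnitude of 8 or more needs at least one continuation byte.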
Currently there are only four symbols. All other symbols are reserved.
The first two symbols are the logical constants false and true.
The other two symbols are special prefixes.
The private prefix must be followed by a record containing a private actor address.
The system prefix must be followed by a record containing a system message.
false      70
true       71
private    78
system     79
Nota is short for notation. Nota does not stand for "not a" or "none of the above".