2023-05-23

Trees and Graphs

Computer programming benefits from the two fundamental data structures: records and arrays.

Records are collections of key:value pairs. They go by many names, such as object, dictionary, struct, and hashtable. They can be static or dynamic. They can be classical or casual. The central idea is that a name is used to select a value. It is common to use .period as the refinement operator that finds a field in a record.

Arrays are linear collections of values that are paired with their ordinal, or position within the linear collection. They are sometimes called lists. They can have a fixed size or a variable size. The ordinals can start with 0 or 1 or sometimes another number.

More complex data structures can be obtained by allowing records and arrays to nest. In these structures, each record or array that contains (or points to) another record or array is called a node. A record or array that does not contain (or point to) another record or array is called a leaf. Perhaps the most popular class of complex data structures is called trees. Trees are excellent at representing hierarchical data. Trees must abide by these rules:

The outermost node is called the root. No node in the tree may point to the root.
Every other record or array in the tree must be pointed to by exactly one node.

The other class of complex data structures is called graphs. A graph is like a tree except that there are no rules. This allows for loops, recurrences, and continuations. Graphs can get very complex.

The JSON data interchange format is very good at representing trees. Whilst I was designing it, I considered adding graphs. The way it would work was that you assign a number, starting with 0, to each {left brace and [left bracket in the JSON text. A graph value is represented by ^circumflex followed by the the number of a substructure that had been encountered earlier. I decided to reject this feature because it violated the JSON design rules:

Minimal
Textual
A subset of JavaScript

Later, I wrote a pair of functions [cycle.js] that could convert graphs into trees and back again. To illustrate, this is how to make the simplest graph:

const a = [];    // make an array
a[0] = a;        // the first element of the array is the array

Ordinarily, JSON can not represent this graph. However, my rejected design could have:

[^0]

[cycle.js] can represent it like this:

[{$ref: "$"}]

The array contains an object containing a $ref field. The $ref indicates that this object represents a graph pointer. The value of the $ref field is a string containing the path of an object or array in the graph using a subset of JSONPath. $dollar sign represents the root.

I used to think about making a new format for graphs, supporting trees as a subset. I now think that would be a bad idea. Trees are easy to traverse. Graphs can be much harder to traverse if you do not already understand the specific structure because you can easily get caught in a loop. Without that prior knowledge, graphs can be confusing, and confusion can lead to bugs and security exploits. Ultimately, every graph can easily be transformed into a tree and back again. Making that encoding step explicit is better than the risk of feeding graphs to programs expecting trees.

The Queue Metaphor: List or Beast?