---
no_site_title: true
title: "Preserves: a tutorial"
---

By Christopher Lemmer Webber and Tony Garnock-Jones
August 2019.

*This document, like Preserves itself, is released under*
*[version 2.0 of the Apache license](./LICENSE).*

<a id="org74bd8ba"></a>

# Overview

Preserves is a serialization system which supplies both a
human-readable textual syntax and an efficient binary syntax; converting
between the two is straightforward.
Preserves' human-readable syntax is easy to read and should be mostly
familiar if you already know systems like JSON.
However, Preserves is more precisely specified than JSON, and also
has a clean extension mechanism.

This document is a tutorial; it does not get into all the details of
Preserves.
For that, see the [Preserves specification](preserves.html).

<a id="org0a73f91"></a>

# Preserves basics

<a id="orge8d9054"></a>

## Starting with the familiar

If you're familiar with JSON, Preserves looks fairly similar:

``` javascript
{"name": "Missy Rose",
 "species": "Felis Catus",
 "age": 13,
 "foods": ["kibble", "cat treats", "tinned meat"]}
```

Preserves also has something we can use for debugging/development
information called "annotations"; they aren't actually read in as data
but we can use them for comments.
(They can also be used for other development tools and are not
restricted to strings; more on this later, but for now interpret them
as comments.)

``` javascript
@"I'm an annotation... basically a comment. Ignore me!"
"I'm data! Don't ignore me!"
```

Preserves supports some data types you're probably already familiar
with from JSON, and which look fairly similar in the textual format:

``` javascript
@"booleans"
#true
#false

@"various kinds of numbers:"
42
123556789012345678901234567890
-10
13.5

@"strings"
"I'm feeling stringy!"

@"sequences (lists)"
["cat", "dog", "mouse", "goldfish"]

@"dictionaries (hashmaps)"
{"cat": "meow",
 "dog": "woof",
 "goldfish": "glub glub",
 "mouse": "squeak"}
```

<a id="org270f7f4"></a>

## Going beyond JSON

We can observe a few differences from JSON already; it's possible to
express integers of arbitrary size in Preserves, and booleans look a little
bit different.
A few more interesting differences:

``` javascript
@"Preserves treats commas as whitespace, so these are the same"
["cat", "dog", "mouse", "goldfish"]
["cat" "dog" "mouse" "goldfish"]

@"We can use anything as keys in dictionaries, not just strings"
{1: "the loneliest number",
 ["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
 {"dictionaries": "as keys???"}: "well, why not?"}
```

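To see why "commas as whitespace" makes the two lists above identical, here is a toy tokenizer in Python.
It is only a sketch under our own assumptions: it is not the real Preserves grammar, and the `TOKEN` pattern and `tokenize` function are names we made up for illustration.
It knows just enough (strings, brackets, colons, bare atoms) to show that commas never reach the parser.

```python
import re

# Toy tokenizer for a small subset of the textual syntax shown above.
# Assumption: this is NOT the real Preserves grammar. It only recognises
# strings, brackets, colons and bare atoms, and skips commas like whitespace.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[\[\]{}:]|[^\s,\[\]{}:"]+')

def tokenize(text):
    """Return the tokens of `text`, skipping whitespace and commas."""
    return TOKEN.findall(text)

with_commas = tokenize('["cat", "dog", "mouse", "goldfish"]')
without_commas = tokenize('["cat" "dog" "mouse" "goldfish"]')
print(with_commas == without_commas)  # True

# A comma inside a string is part of the string, not a separator.
print(tokenize('["a, b" "c"]'))
```
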
Preserves technically provides a few types of numbers:

``` javascript
@"Signed Integers"
42
-42
5907212309572059846509324862304968273468909473609826340
-5907212309572059846509324862304968273468909473609826340

@"Floats (Single-precision IEEE floats) (notice the trailing f)"
3.1415927f

@"Doubles (Double-precision IEEE floats)"
3.141592653589793
```

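If the difference between Floats and Doubles feels abstract, the following Python sketch may help.
It uses only the standard `struct` module to squeeze π through 4-byte and 8-byte IEEE 754 representations; it says nothing about how Preserves itself lays out bytes, and `round_trip` is a helper name of our own.

```python
import math
import struct

def round_trip(fmt, x):
    """Pack x with the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

as_double = round_trip(">d", math.pi)  # 8 bytes, IEEE 754 double precision
as_single = round_trip(">f", math.pi)  # 4 bytes, IEEE 754 single precision

print(as_double == math.pi)      # True: nothing is lost
print(as_single == math.pi)      # False: some precision is lost
print(abs(as_single - math.pi))  # a small, nonzero error
print(struct.calcsize(">f"), struct.calcsize(">d"))  # 4 8
```
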
Preserves also provides some types that don't come in JSON.
`Symbols` are fairly interesting; they look a lot like strings but
really aren't meant to represent text as much as they are, well… a
symbolic name.
Often they're meant to be used for something that has symbolic importance
to the program, but not textual importance (other than to guide the
programmer… not unlike variable names).

``` javascript
@"A symbol (NOT a string!)"
JustASymbol

@"You can do mixedCase or CamelCase too of course, pick your poison"
@"(but be consistent, for the sake of your collaborators!)"
iAmASymbol
i-am-a-symbol

@"A list of symbols"
[GET, PUT, POST, DELETE]

@"A symbol with spaces in it"
|this is just one symbol believe it or not|
```

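One way to picture the symbol/string distinction in a host language is a tiny wrapper type.
The `Symbol` class below is a hypothetical stand-in written for this tutorial, not the API of any particular Preserves library; it simply shows that a symbol and a string with the same spelling are different values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a Preserves symbol: a name, not text."""
    name: str

get = Symbol("GET")

print(get == Symbol("GET"))  # True: same symbolic name
print(get == "GET")          # False: a symbol is not a string

# Being frozen (and so hashable), symbols can be set members or dict keys.
methods = {Symbol("GET"), Symbol("PUT"), Symbol("POST"), Symbol("DELETE")}
print(Symbol("PUT") in methods, "PUT" in methods)  # True False
```
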
We can also add binary data, aka ByteStrings:

``` javascript
@"Some binary data, base64 encoded"
#base64{cGljdHVyZSBvZiBhIGNhdA==}

@"Some other binary data, hexadecimal encoded"
#hex{616263}

@"Same binary data as above, base64 encoded"
#base64{YWJj}
```

What's neat about this is that we don't have to "pay the cost" of
base64 or hexadecimal encoding when we serialize this data to binary;
the length of the binary data is the length of the binary data.

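We can check the claims in the comments above with nothing but Python's standard library.
This sketch decodes the payloads by hand; it does not parse the `#hex{…}` or `#base64{…}` wrappers themselves, and the variable names are our own.

```python
import base64

from_hex = bytes.fromhex("616263")
from_b64 = base64.b64decode("YWJj")

print(from_hex == from_b64)  # True: both spell the bytes b"abc"
print(len("616263"), len("YWJj"), len(from_hex))  # 6 4 3

# The first example above decodes to a short ASCII caption.
print(base64.b64decode("cGljdHVyZSBvZiBhIGNhdA=="))  # b'picture of a cat'
```

Six characters of hexadecimal or four of base64 both stand for the same three bytes, which is all a binary serialization needs to carry.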
Conveniently, Preserves also includes Sets, which are collections of
unique elements where ordering of items is unimportant.

``` javascript
#set{flour, salt, water}
```

<a id="orgefafe56"></a>

## Total ordering and canonicalization

This is a good time to mention that even though from a semantic
perspective sets and dictionaries do not carry information about the
ordering of their elements (and Preserves doesn't care what order we
enter them in for our hand-written-as-text Preserves documents),
Preserves has a well-defined "total ordering".

**FULL WARNING:** the following claim is not implemented yet. :)
Coming soon!

Based on this total ordering, Preserves provides support for canonical
ordering when serializing; in this mode, Preserves will always write
out the elements in the same order, every time.
When combined with binary serialization, this is Preserves' "canonical
form".
This is important and useful in many contexts, but especially for
cryptographic signatures and hashing.

``` javascript
@"This hand-typed Preserves document..."
{monkey: {"noise": "ooh-ooh",
          "eats": #set{"bananas", "berries"}}
 cat: {"noise": "meow",
       "eats": #set{"kibble", "cat treats", "tinned meat"}}}

@"Will always, always be written out in this order when canonicalized:"
{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"},
       "noise": "meow"}
 monkey: {"eats": #set{"bananas", "berries"},
          "noise": "ooh-ooh"}}
```

Clever implementations can get canonicalized output for free by
carefully ordering set elements and dictionary entries at construction
time, but even in simple implementations, canonical serialization is
almost as cheap as normal serialization.

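To get a feel for why a fixed output order matters for hashing, here is a rough Python sketch.
Big caveat: it sorts by the rendered text of each element, which is merely a stand-in for the ordering that the Preserves specification defines, and it writes text rather than the binary canonical form.
The names `canonical_text` and `digest` are ours, and symbols are approximated by plain strings.

```python
import hashlib

def canonical_text(value):
    """Render value as text, sorting set elements and dictionary entries.

    Assumption: sorting the rendered text stands in for Preserves' real
    ordering, which is defined by the specification, not by this sketch.
    """
    if isinstance(value, dict):
        entries = sorted((canonical_text(k), canonical_text(v))
                         for k, v in value.items())
        return "{" + " ".join(f"{k}: {v}" for k, v in entries) + "}"
    if isinstance(value, (set, frozenset)):
        return "#set{" + " ".join(sorted(canonical_text(v) for v in value)) + "}"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return repr(value)

def digest(value):
    """Hash the canonical text, as one might before signing a document."""
    return hashlib.sha256(canonical_text(value).encode("utf-8")).hexdigest()

hand_typed = {
    "monkey": {"noise": "ooh-ooh", "eats": {"bananas", "berries"}},
    "cat": {"noise": "meow", "eats": {"kibble", "cat treats", "tinned meat"}},
}
reordered = {
    "cat": {"eats": {"tinned meat", "cat treats", "kibble"}, "noise": "meow"},
    "monkey": {"eats": {"berries", "bananas"}, "noise": "ooh-ooh"},
}

print(canonical_text(hand_typed))
print(digest(hand_typed) == digest(reordered))  # True
```
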
<a id="org0366627"></a>

## Defining our own types using Records

Finally, there is one more type that Preserves provides… but in a
sense, it's a meta-type.
`Record` objects have a label and a series of arguments (or "fields").
For example, we can make a `Date` record:

``` javascript
<Date 2019 8 15>
```

In this example, the `Date` label is a symbol; 2019, 8, and 15 are the
year, month, and day fields respectively.

Why do we care about this?
We could instead just decide to encode our date data in a string,
like "2019-08-15".
A document using such a date structure might look like so:

``` javascript
{"name": "Gregor Samsa",
 "description": "humanoid trapped in an insect body",
 "born": "1915-10-04"}
```

Unfortunately, say our boss comes along and tells us that the people
doing data entry have complained that it isn't always possible to get
an exact date.
They would like to be able to type in what they know if they don't
know the date exactly.

This causes a problem.
Now we might have two kinds of entries:

``` javascript
@"Exact date known"
{"name": "Gregor Samsa",
 "description": "humanoid trapped in an insect body",
 "born": "1915-10-04"}

@"Not sure about exact date..."
{"name": "Gregor Samsa",
 "description": "humanoid trapped in an insect body",
 "born": "Sometime in October 1915? Or was that when he became an insect?"}
```

This is a mess.
We *could* just try matching the string against a regular expression to
see if it "looks like a date", but doing this kind of thing is prone to
errors and weird edge cases.
No, it's better to be able to have a separate type:

``` javascript
@"Exact date known"
{"name": "Gregor Samsa",
 "description": "humanoid trapped in an insect body",
 "born": <Date 1915 10 4>}

@"Not sure about exact date..."
{"name": "Gregor Samsa",
 "description": "humanoid trapped in an insect body",
 "born": <Unknown "Sometime in October 1915? Or was that when he became an insect?">}
```

Now we can distinguish the two.

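On the reading side, the payoff is that a program can dispatch on the record label instead of guessing from a string's shape.
Here is a minimal Python sketch of that idea; the `Record` tuple and `describe_born` function are hypothetical names for illustration, not the API of an actual Preserves library, and labels are approximated by plain strings.

```python
from typing import NamedTuple

class Record(NamedTuple):
    """Hypothetical in-memory record: a label plus a tuple of fields."""
    label: str
    fields: tuple

def describe_born(born):
    """Dispatch on the record label instead of guessing from a string."""
    if born.label == "Date":
        year, month, day = born.fields
        return f"born on {year:04d}-{month:02d}-{day:02d}"
    if born.label == "Unknown":
        (note,) = born.fields
        return f"birth date unknown ({note})"
    raise ValueError(f"unexpected record label: {born.label}")

exact = Record("Date", (1915, 10, 4))
vague = Record("Unknown", ("Sometime in October 1915?",))

print(describe_born(exact))  # born on 1915-10-04
print(describe_born(vague))  # birth date unknown (Sometime in October 1915?)
```
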
We can make as many Record types as our program needs, though it is up
to our program to make sense of what these mean.
Since Preserves does not specify the `Date` itself, both the program
(or person) writing the Preserves document and the program reading it
need to have a mutual understanding of how many fields it has and what
the label signifies for it to be of use.

Still, there are plenty of interesting labels we can define.
Here is one for an "iri", a hyperlink:

``` javascript
<iri "https://dustycloud.org/blog/">
```

That's nice enough, but here's another interesting detail… labels on
Records are usually symbols but aren't necessarily so.
They can also be strings or numbers or even dictionaries.
And very interestingly, they can also be other records:

``` javascript
<<iri "https://www.w3.org/ns/activitystreams#Note">
  {"to": [<iri "https://chatty.example/ben/">],
   "attributedTo": <iri "https://social.example/alyssa/">,
   "content": "Say, did you finish reading that book I lent you?"}>
```

Do you see it? This Record's label is… an `iri` Record!
The link here points to a more precise term saying that "this is a
note meant to be sent around in social networks".
It is considerably more precise than just using the string or symbol
"Note", which could be ambiguous.
(A social networking note? A footnote? A music note?)
While not all systems need this, this (partial) example hints at how
Preserves can also be used to coordinate meaning in larger, more
decentralized systems.

Likewise, it is also possible to label records with integers.
Languages like OCaml use integers instead of symbolic record labels
because their type systems ensure that it is never ambiguous what,
say, the label `23` means in any given context.
Allowing integer record labels lets Preserves directly express OCaml
data.

<a id="org1b72b96"></a>

## Annotations

Annotations are not strictly a necessary feature, but they are useful
in some circumstances.
We have previously shown them used as comments:

``` javascript
@"I'm a comment!"
"I am not a comment, I am data!"
```

Annotations annotate the values they precede.
It is possible to have multiple annotations on a value.

``` javascript
@"I am annotating this number"
@"And so am I!"
42
```

As said, annotations are not really data.
They are merely meant for development tooling or debugging.
You have to explicitly ask for them when reading; when you do, every
value read comes wrapped with its annotations.
Many implementations will, in the same mode, also supply line number
and column information attached to each read value.

So what's the point of them then?
If annotations were just for comments, there would indeed be hardly
any point at all… it would be simpler to just provide a comment syntax.

However, annotations can be used for more than just comments.
They can also be used for debugging or other development-tool-oriented
data.

For instance, here's a reply from an HTTP API service running in
"debug" mode annotated with the time it took to produce the reply and
the internal name of the server that produced the response:

``` javascript
@<ResponseTime <Milliseconds 64.4>>
@<BackendServer "humpty-dumpty.example.com">
<Success
  <Employees [
    <Employee "Alyssa P. Hacker" #set{<Role Programmer>, <Role Manager>}, <Date 2018, 1, 24>>
    <Employee "Ben Bitdiddle" #set{<Role Programmer>}, <Date 2019, 2, 13>> ]>>
```

The annotations aren't related to the data requested, which is all
about "employees"; instead, they're about the systems that produced
the response.
You could say they're in the domain of "debugging" instead of the
domain of "employees".

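How might a program keep these two domains apart once a document has been read?
One possibility, sketched here in Python, is a wrapper that carries the annotations alongside the value, plus a function that throws them away.
The `Annotated` class, `strip_annotations`, and the `"checked"` annotation are all hypothetical; real Preserves libraries may expose annotations quite differently.

```python
from dataclasses import dataclass, field

@dataclass
class Annotated:
    """Hypothetical wrapper pairing a value with its annotations."""
    value: object
    annotations: list = field(default_factory=list)

def strip_annotations(item):
    """Return the bare data, recursing into lists and dictionary values."""
    if isinstance(item, Annotated):
        return strip_annotations(item.value)
    if isinstance(item, list):
        return [strip_annotations(x) for x in item]
    if isinstance(item, dict):
        return {k: strip_annotations(v) for k, v in item.items()}
    return item

reply = Annotated(
    value=["Alyssa P. Hacker", Annotated("Ben Bitdiddle", ["checked"])],
    annotations=[("ResponseTime", 64.4),
                 ("BackendServer", "humpty-dumpty.example.com")],
)

print(strip_annotations(reply))  # ['Alyssa P. Hacker', 'Ben Bitdiddle']
print(reply.annotations[0])      # ('ResponseTime', 64.4)
```
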
<a id="org1924a0a"></a>

# Conclusions

We've covered the broad strokes of Preserves, but not everything that
is possible with it.
We leave it as an exercise to the reader to try reading these examples
into their language of choice (several libraries exist already) and
writing them out as binary objects.

But as we've seen, Preserves is a flexible system which comes with
well-defined, carefully specified built-in types, as well as a
meta-type which can be used as an extension point.

Happy preserving!