---
no_site_title: true
title: "Preserves: a tutorial"
---
By Christopher Lemmer Webber and Tony Garnock-Jones

August 2019.

*This document, like Preserves itself, is released under*
*[version 2.0 of the Apache license](./LICENSE).*

<a id="org74bd8ba"></a>
# Overview
Preserves is a serialization system that supplies both a
human-readable textual syntax and an efficient binary syntax;
converting between the two is straightforward.
Preserves' human-readable syntax is easy to read and should be mostly
familiar if you already know systems like JSON.
However, Preserves is more precisely specified than JSON, and also
has a clean extension mechanism.
This document is a tutorial; it does not get into all the details of
Preserves.
For that, see the [Preserves specification](preserves.html).
<a id="org0a73f91"></a>
# Preserves basics
<a id="orge8d9054"></a>
## Starting with the familiar
If you're familiar with JSON, Preserves looks fairly similar:
```
{"name": "Missy Rose",
"species": "Felis Catus",
"age": 13,
"foods": ["kibble", "cat treats", "tinned meat"]}
```
Preserves also has a feature called "annotations" that we can use for
debugging and development information; they aren't actually read in as
data, but we can use them for comments.
(They can also be used for other development tools and are not
restricted to strings; more on this later, but for now, we will stick
to the special comment annotation syntax.)
```
;I'm an annotation... basically a comment. Ignore me!
"I'm data! Don't ignore me!"
```
Preserves supports some data types you're probably already familiar
with from JSON, and which look fairly similar in the textual format:
```
;booleans
#t
#f
;various kinds of numbers:
42
123556789012345678901234567890
-10
13.5
;strings
"I'm feeling stringy!"
;sequences (lists)
["cat", "dog", "mouse", "goldfish"]
;dictionaries (hashmaps)
{"cat": "meow",
"dog": "woof",
"goldfish": "glub glub",
"mouse": "squeak"}
```
<a id="org270f7f4"></a>
## Going beyond JSON
We can observe a few differences from JSON already; it's possible to
*reliably* express integers of arbitrary length in Preserves, and booleans look a little
bit different.
A few more interesting differences:
```
;Preserves treats commas as whitespace, so these are the same
["cat", "dog", "mouse", "goldfish"]
["cat" "dog" "mouse" "goldfish"]
;We can use anything as keys in dictionaries, not just strings
{1: "the loneliest number",
["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
{"dictionaries": "as keys???"}: "well, why not?"}
```
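
To see why *reliably* matters for large integers: JSON's grammar permits arbitrarily long numbers, but many JSON parsers store every number as an IEEE double, which cannot hold every large integer exactly. The Python sketch below simulates such a parser by routing integers through `float`; it only illustrates the failure mode and does not use any Preserves library.

```python
import json

# The large integer from the number examples earlier in this tutorial.
big = 123556789012345678901234567890

# Python's json module happens to keep integers exact...
exact = json.loads(json.dumps(big))

# ...but a parser that maps every number to an IEEE double (simulated
# here with parse_int=float) silently rounds the value.
as_double = json.loads(json.dumps(big), parse_int=float)

print("exact round trip:", exact == big)
print("double round trip:", int(as_double) == big)
```

Whether a given JSON document survives depends on which parser reads it; a format that specifies arbitrary-size integers removes that dependency.
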
Preserves technically provides a few types of numbers:
```
;Signed Integers
42
-42
5907212309572059846509324862304968273468909473609826340
-5907212309572059846509324862304968273468909473609826340
;Floats (Single-precision IEEE floats) (notice the trailing f)
3.1415927f
;Doubles (Double-precision IEEE floats)
3.141592653589793
```
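
The difference between the two floating-point types can be felt with a small Python sketch. It uses the standard `struct` module to squeeze a double into four bytes and back; the byte order chosen here is arbitrary and is not meant to depict the Preserves binary encoding.

```python
import struct

def through_single(x: float) -> float:
    """Round-trip a Python float (a double) through 4-byte IEEE single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

double_pi = 3.141592653589793
single_pi = through_single(double_pi)

single_size = len(struct.pack(">f", double_pi))  # bytes used by a single
double_size = len(struct.pack(">d", double_pi))  # bytes used by a double

print(single_size, double_size)
print(single_pi == double_pi)  # the extra digits do not survive
```

A single takes half the space but keeps only about seven significant decimal digits, which is why the textual syntax marks it explicitly with a trailing `f`.
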
Preserves also provides some types that don't come in JSON.
`Symbols` are fairly interesting; they look a lot like strings but
really aren't meant to represent text as much as they are, well&#x2026; a
symbolic name.
Often they're meant to be used for something that has symbolic importance
to the program, but not textual importance (other than to guide the
programmer&#x2026; not unlike variable names).
```
;A symbol (NOT a string!)
JustASymbol
;You can do mixedCase or CamelCase too of course, pick your poison
;(but be consistent, for the sake of your collaborators!)
iAmASymbol
i-am-a-symbol
;A list of symbols
[GET, PUT, POST, DELETE]
;A symbol with spaces in it
|this is just one symbol believe it or not|
```
We can also add binary data, aka ByteStrings:
```
;Some binary data, base64 encoded
#[cGljdHVyZSBvZiBhIGNhdA==]
;Some other binary data, hexadecimal encoded
#x"616263"
;Same binary data as above, base64 encoded
#[YWJj]
```
What's neat about this is that we don't have to "pay the cost" of
base64 or hexadecimal encoding when we serialize this data to binary;
the length of the binary data is the length of the binary data.
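
As a quick check of the examples above, this Python sketch decodes the base64 and hexadecimal literals with the standard library and compares the length of each textual form with the length of the raw bytes. It says nothing about the framing a binary encoding adds around the payload; it only compares the payload itself.

```python
import base64

from_base64 = base64.b64decode("YWJj")
from_hex = bytes.fromhex("616263")
picture = base64.b64decode("cGljdHVyZSBvZiBhIGNhdA==")

# Both textual spellings denote the same three bytes.
print(from_base64 == from_hex, from_base64)

# Textual length versus raw length for each spelling.
print(len("YWJj"), len("616263"), len(from_hex))
print(len("cGljdHVyZSBvZiBhIGNhdA=="), len(picture))
```
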
Conveniently, Preserves also includes Sets, which are collections of
unique elements where ordering of items is unimportant.
```
#{flour, salt, water}
```
<a id="orgefafe56"></a>
## Canonicalization
This is a good time to mention that even though from a semantic
perspective sets and dictionaries do not carry information about the
ordering of their elements (and Preserves doesn't care what order we
enter them in for our hand-written-as-text Preserves documents),
[Preserves provides support for canonical ordering](canonical-binary.html)
when serializing.
In canonicalizing output mode, Preserves will always write out a given
value using exactly the same bytes, every time. This is important and
useful for many contexts, but especially for cryptographic signatures
and hashing.
```
;This hand-typed Preserves document...
{monkey: {"noise": "ooh-ooh",
"eats": #{"bananas", "berries"}}
cat: {"noise": "meow",
"eats": #{"kibble", "cat treats", "tinned meat"}}}
;Will always, always be written out in this order (except in
;binary, of course) when canonicalized:
{cat: {"eats": #{"cat treats", "kibble", "tinned meat"},
"noise": "meow"}
monkey: {"eats": #{"bananas", "berries"},
"noise": "ooh-ooh"}}
```
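
The value of "same bytes every time" is easiest to see with a hash. The Python sketch below is an illustrative stand-in, not the Preserves canonicalization algorithm: it uses JSON with sorted keys as the output format, models symbols as plain strings, and flattens sets into sorted lists. The function name `canonical_bytes` is hypothetical.

```python
import hashlib
import json

def canonical_bytes(value):
    """Illustrative stand-in for canonical output (NOT the Preserves algorithm)."""
    def normalize(v):
        if isinstance(v, dict):
            return {k: normalize(x) for k, x in v.items()}
        if isinstance(v, (set, frozenset)):
            # Sets carry no order, so impose one before writing.
            return sorted(normalize(x) for x in v)
        if isinstance(v, list):
            return [normalize(x) for x in v]
        return v
    text = json.dumps(normalize(value), sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")

hand_typed = {
    "monkey": {"noise": "ooh-ooh", "eats": {"bananas", "berries"}},
    "cat": {"noise": "meow", "eats": {"kibble", "cat treats", "tinned meat"}},
}
reordered = {
    "cat": {"eats": {"tinned meat", "cat treats", "kibble"}, "noise": "meow"},
    "monkey": {"eats": {"berries", "bananas"}, "noise": "ooh-ooh"},
}

digest_a = hashlib.sha256(canonical_bytes(hand_typed)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(reordered)).hexdigest()
print(digest_a == digest_b)
```

Because both spellings produce identical bytes, a signature or hash computed by one party can be checked by another regardless of the order in which the document was originally typed.
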
<a id="org0366627"></a>
## Defining our own types using Records
Finally, there is one more type that Preserves provides&#x2026; but in a
sense, it's a meta-type.
`Record` objects have a label and a series of arguments (or "fields").
For example, we can make a `Date` record:
```
<Date 2019 8 15>
```
In this example, the `Date` label is a symbol; 2019, 8, and 15 are the
year, month, and day fields respectively.
Why do we care about this?
We could instead just decide to encode our date data in a string,
like "2019-08-15".
A document using such a date structure might look like so:
```
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "1915-10-04"}
```
Unfortunately, say our boss comes along and tells us that the people
doing data entry have complained that it isn't always possible to get
an exact date.
They would like to be able to type in what they know if they don't
know the date exactly.
This causes a problem.
Now we might have two kinds of entries:
```
;Exact date known
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "1915-10-04"}
;Not sure about exact date...
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "Sometime in October 1915? Or was that when he became an insect?"}
```
This is a mess.
We *could* just try matching the string against a regular expression
to see if it "looks like a date", but doing this kind of thing is
prone to errors and weird edge cases.
No, it's better to be able to have a separate type:
```
;Exact date known
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": <Date 1915 10 4>}
;Not sure about exact date...
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": <Unknown "Sometime in October 1915? Or was that when he became an insect?">}
```
Now we can distinguish the two.
We can make as many Record types as our program needs, though it is up
to our program to make sense of what these mean.
Since Preserves does not specify the `Date` itself, both the program
(or person) writing the Preserves document and the program reading it
need a mutual understanding of how many fields it has and what the
label signifies for it to be of use.
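
On the reading side, that mutual understanding usually turns into a check of the label and the number of fields. Here is a minimal Python sketch under stated assumptions: the `Record` class and `describe_born` function are hypothetical, symbols are modeled as plain strings, and no real Preserves library is involved.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Record:
    """Hypothetical in-memory form of a record: a label plus its fields."""
    label: Any
    fields: Tuple[Any, ...]

def describe_born(value: Any) -> str:
    """Interpret a "born" entry according to the convention this program expects."""
    if not isinstance(value, Record):
        raise ValueError(f"expected a record, got {value!r}")
    if (value.label == "Date" and len(value.fields) == 3
            and all(isinstance(f, int) for f in value.fields)):
        year, month, day = value.fields
        return f"{year:04d}-{month:02d}-{day:02d}"
    if value.label == "Unknown" and len(value.fields) == 1:
        return f"unknown ({value.fields[0]})"
    raise ValueError(f"unrecognized record: {value!r}")

exact = Record("Date", (1915, 10, 4))
vague = Record("Unknown", ("Sometime in October 1915?",))

print(describe_born(exact))
print(describe_born(vague))
```

No guessing from the shape of a string is needed: the label says which case applies, and anything the program does not recognize is rejected outright.
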
Still, there are plenty of interesting labels we can define.
Here is one for an "iri", a hyperlink:
```
<iri "https://dustycloud.org/blog/">
```
That's nice enough, but here's another interesting detail&#x2026; labels on
Records are usually symbols but aren't necessarily so.
They can also be strings or numbers or even dictionaries.
And very interestingly, they can also be other records:
```
< <iri "https://www.w3.org/ns/activitystreams#Note">
{"to": [<iri "https://chatty.example/ben/">],
"attributedTo": <iri "https://social.example/alyssa/">,
"content": "Say, did you finish reading that book I lent you?"} >
```
Do you see it? This Record's label is&#x2026; an `iri` Record!
The link here points to a more precise term saying that "this is a
note meant to be sent around in social networks".
It is considerably more precise than just using the string or symbol
"Note", which could be ambiguous.
(A social networking note? A footnote? A music note?)
While not all systems need this, this (partial) example hints at how
Preserves can also be used to coordinate meaning in larger, more
decentralized systems.
Likewise, it is also possible to label records with integers.
Languages like OCaml use integers instead of symbolic record labels
because their type systems ensure that it is never ambiguous what,
say, the label `23` means in any given context.
Allowing integer record labels lets Preserves directly express OCaml
data.
<a id="org1b72b96"></a>
## Annotations
Annotations are not strictly a necessary feature, but they are useful
in some circumstances.
We have previously shown them used as comments:
```
;I'm a comment!
"I am not a comment, I am data!"
```
Annotations annotate the values they precede.
It is possible to have multiple annotations on a value.
The `;`-based comment syntax is syntactic sugar for the general
`@`-prefixed string annotation syntax.
```
;I am annotating this number
@"And so am I!"
42
```
As mentioned, annotations are not really data.
They are merely meant for development tooling or debugging.
You have to explicitly ask for them when reading, and when you do,
every value that is read comes back wrapped together with its annotations.
Many implementations will, in the same mode, also supply line number
and column information attached to each read value.
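
One way to picture this wrapping is the Python sketch below. The `Annotated` class and `strip_annotations` function are hypothetical names chosen for illustration and are not the API of any particular Preserves implementation; dictionary keys are left untouched to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Annotated:
    """Hypothetical wrapper a reader might return when annotations are requested."""
    item: Any
    annotations: List[Any] = field(default_factory=list)

def strip_annotations(value: Any) -> Any:
    """Recursively drop annotation wrappers, leaving only the data."""
    if isinstance(value, Annotated):
        return strip_annotations(value.item)
    if isinstance(value, list):
        return [strip_annotations(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_annotations(v) for k, v in value.items()}
    return value

# The number 42 carrying the two annotations from the example above.
annotated_42 = Annotated(42, ["I am annotating this number", "And so am I!"])

# In annotation-reading mode every nested value is wrapped, annotated or not.
wrapped_list = Annotated([Annotated(1), Annotated(2, ["two"])])

print(strip_annotations(annotated_42))
print(strip_annotations(wrapped_list))
```

A program that only cares about the data reads without annotations and never sees the wrappers; tooling that wants them opts in and inspects the `annotations` of each value.
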
So what's the point of them then?
If annotations were just for comments, there would indeed be hardly any
point at all&#x2026; it would be simpler to just provide a comment syntax.
However, annotations can be used for more than just comments.
They can also be used for debugging or other development-tool-oriented
data.
For instance, here's a reply from an HTTP API service running in
"debug" mode annotated with the time it took to produce the reply and
the internal name of the server that produced the response:
```
@<ResponseTime <Milliseconds 64.4>>
@<BackendServer "humpty-dumpty.example.com">
<Success
<Employees [
<Employee "Alyssa P. Hacker"
#{<Role Programmer>, <Role Manager>}
<Date 2018, 1, 24>>
<Employee "Ben Bitdiddle"
#{<Role Programmer>}
<Date 2019, 2, 13>> ]>>
```
The annotations aren't related to the data requested, which is all
about "employees"; instead, they're about the systems that produced
the response.
You could say they're in the domain of "debugging" instead of the
domain of "employees".
<a id="org1924a0a"></a>
# Conclusions
We've covered the broad strokes of Preserves, but not everything that
is possible with it.
We leave it as an exercise for the reader to try reading these examples
into their language of choice (several libraries exist already) and
writing them out as binary objects.
But as we've seen, Preserves is a flexible system which comes with
well-defined, carefully specified built-in types, as well as a
meta-type which can be used as an extension point.
Happy preserving!