Add Preserves tutorial

Christopher Lemmer Webber 2019-08-15 19:01:43 -04:00 committed by Tony Garnock-Jones
parent 0365fd8c36
commit dca049ce46
1 changed files with 394 additions and 0 deletions

TUTORIAL.org
#+TITLE: Preserves: a tutorial
#+AUTHOR: Christopher Lemmer Webber
/This document, like Preserves itself, is released under/
/[[file:./LICENSE][version 2.0 of the Apache license]]./
* Overview
Preserves is a serialization system which supplies both a
human-readable textual syntax and an efficient binary syntax;
converting between the two is straightforward.
Preserves' human-readable syntax is easy to read and should be mostly
familiar if you already know systems like JSON.
However, Preserves is less ambiguously specified than JSON, and also
has a clean extension mechanism.
This document is a tutorial; it does not get into all the details of
Preserves.
For that, see the [[file:./preserves.md][Preserves specification]].
* Preserves basics
** Starting with the familiar
If you're familiar with JSON, Preserves looks fairly similar:
#+BEGIN_SRC preserves
{"name": "Missy Rose",
"species": "Felis Catus",
"age": 13,
"foods": ["kibble", "cat treats", "tinned meat"]}
#+END_SRC
Preserves also has something we can use for debugging/development
information called "annotations"; they aren't actually read in as data
but we can use them for comments.
(They can also be used for other development tools and are not
restricted to strings; more on this later, but for now interpret them
as comments.)
#+BEGIN_SRC preserves
@"I'm an annotation... basically a comment. Ignore me!"
"I'm data! Don't ignore me!"
#+END_SRC
Preserves supports some data types you're probably already familiar
with from JSON, and which look fairly similar in the textual format:
#+BEGIN_SRC preserves
@"booleans"
#true
#false
@"various kinds of numbers:"
42
123556789012345678901234567890
-10
13.5
@"strings"
"I'm feeling stringy!"
@"sequences (lists)"
["cat", "dog", "mouse", "goldfish"]
@"dictionaries (hashmaps)"
{"cat": "meow",
"dog": "woof",
"goldfish": "glub glub",
"mouse": "squeak"}
#+END_SRC
** Going beyond JSON
We can observe a few differences from JSON already: it's possible to
express integers of arbitrary size in Preserves, and booleans look a little
bit different.
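Arbitrary-size integers matter because many JSON implementations decode every number as an IEEE double. JavaScript's =JSON.parse= is one example.
The following standard-library Python snippet is only an illustration.
It shows that a double cannot hold the large integer from the example above exactly, while an arbitrary-precision integer can.
#+BEGIN_SRC python
# Illustration only: compare an arbitrary-precision integer with its
# nearest IEEE double, which is how many JSON parsers store numbers.
n = 123556789012345678901234567890

as_double = float(n)        # round to the nearest double (53-bit mantissa)
print(n.bit_length())       # how many bits the exact value needs
print(int(as_double) == n)  # False: the double lost the low-order digits
#+END_SRC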
A few more interesting differences:
#+BEGIN_SRC preserves
@"Preserves treats commas as whitespace, so these are the same"
["cat", "dog", "mouse", "goldfish"]
["cat" "dog" "mouse" "goldfish"]
@"We can use anything as keys in dictionaries, not just strings"
{1: "the loneliest number",
["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
{"dictionaries": "as keys???"}: "well, why not?"}
#+END_SRC
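To use compound values as dictionary keys, a program must be able to compare and hash those values.
Here is a rough sketch that assumes a hypothetical Python mapping in which sequences become tuples and dictionaries become frozensets of key/value pairs. This is not the API of any particular Preserves library. Under that mapping, the dictionary above could be modeled like this:
#+BEGIN_SRC python
# Hypothetical mapping onto hashable Python types:
#   sequence   -> tuple
#   dictionary -> frozenset of (key, value) pairs
def freeze_dict(d: dict) -> frozenset:
    return frozenset(d.items())

table = {
    1: "the loneliest number",
    ("why", "was", 6, "afraid", "of", 7): "because 7 8 9",
    freeze_dict({"dictionaries": "as keys???"}): "well, why not?",
}

print(table[("why", "was", 6, "afraid", "of", 7)])         # because 7 8 9
print(table[freeze_dict({"dictionaries": "as keys???"})])  # well, why not?
#+END_SRC
One caveat applies to anyone building such a mapping.
In Python, =True == 1= and =1 == 1.0=, so a faithful binding would need its own types to keep =#true=, =1=, and =1.0= distinct, as Preserves does.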
Preserves distinguishes between a few kinds of numbers:
#+BEGIN_SRC preserves
@"Signed Integers"
42
-42
5907212309572059846509324862304968273468909473609826340
-5907212309572059846509324862304968273468909473609826340
@"Floats (Single-precision IEEE floats) (notice the trailing f)"
3.1415927f
@"Doubles (Double-precision IEEE floats)"
3.141592653589793
#+END_SRC
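The difference between the two is storage width: a single-precision float takes 4 bytes and a double takes 8.
The following Python snippet uses only the standard library's =struct= module to round pi to single precision and show what is lost.
It illustrates IEEE 754 behavior in general, not Preserves' own binary encoding.
#+BEGIN_SRC python
import struct

def to_single(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

pi = 3.141592653589793
single_pi = to_single(pi)

print(len(struct.pack(">f", pi)), len(struct.pack(">d", pi)))  # 4 8
print(single_pi)        # 3.1415927410125732
print(single_pi == pi)  # False: the extra precision is gone
#+END_SRC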
Preserves also provides some types that don't come in JSON.
=Symbols= are fairly interesting; they look a lot like strings but
really aren't meant to represent text as much as they are, well... a
symbolic name.
Often they're meant to be used for something that has symbolic importance
to the program, but not textual importance (other than to guide the
programmer... not unlike variable names).
#+BEGIN_SRC preserves
@"A symbol (NOT a string!)"
JustASymbol
@"You can do mixedCase or CamelCase too of course, pick your poison"
@"(but be consistent, for the sake of your collaborators!)"
iAmASymbol
i-am-a-symbol
@"A list of symbols"
[GET, PUT, POST, DELETE]
@"A symbol with a space in it"
|this is just one symbol believe it or not|
#+END_SRC
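In a programming language, this usually means a symbol needs its own type, so that it never compares equal to a string with the same text.
Here is a minimal Python sketch in which =Symbol= is a hypothetical class of our own, not any library's type:
#+BEGIN_SRC python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    """Hypothetical symbol type: wraps a name but is deliberately not a str."""
    name: str

methods = [Symbol("GET"), Symbol("PUT"), Symbol("POST"), Symbol("DELETE")]

print(Symbol("GET") in methods)  # True
print("GET" in methods)          # False: a string is never equal to a symbol

# |this is just one symbol believe it or not| is still one Symbol;
# the vertical bars are only how the text syntax quotes it.
spaced = Symbol("this is just one symbol believe it or not")
#+END_SRC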
We can also add binary data, aka ByteStrings:
#+BEGIN_SRC preserves
@"Some binary data, base64 encoded"
#base64{cGljdHVyZSBvZiBhIGNhdA==}
@"Some other binary data, hexadecimal encoded"
#hex{616263}
@"Same binary data as above, base64 encoded"
#base64{YWJj}
#+END_SRC
What's neat about this is that we don't have to "pay the cost" of
base64 or hexadecimal encoding when we serialize this data to binary.
The binary form stores the raw bytes, so its size is just the length of
the data (plus a small header), not the length of its textual spelling.
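Whichever spelling we use in text, the value itself is just a sequence of bytes.
A quick check with Python's standard =base64= module confirms that the last two examples above denote the same three bytes.
It also shows how much longer the base64 text is than the raw data:
#+BEGIN_SRC python
import base64

# The textual spellings decode to plain bytes; the encoding is not part of the value.
picture = base64.b64decode("cGljdHVyZSBvZiBhIGNhdA==")
from_hex = bytes.fromhex("616263")
from_b64 = base64.b64decode("YWJj")

print(picture)               # b'picture of a cat'
print(from_hex == from_b64)  # True: both are b'abc'
print(len("cGljdHVyZSBvZiBhIGNhdA=="), len(picture))  # 24 16
#+END_SRC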
Conveniently, Preserves also includes Sets, which are collections of
unique elements where ordering of items is unimportant.
#+BEGIN_SRC preserves
#set{flour, salt, water}
#+END_SRC
** Total ordering
This is a good time to mention that, semantically, sets and
dictionaries carry no information about the ordering of their elements,
and Preserves doesn't care what order we enter them in when we write a
Preserves document by hand.
However, when serializing to binary (or when a Preserves library itself
serializes to text), Preserves always writes out the elements in the
same order, every time, because it defines a total ordering over all values.
This is important and useful in many contexts, but especially for
cryptographic signatures, where the same value must always produce the
same bytes.
#+BEGIN_SRC preserves
@"This hand-typed Preserves document..."
{monkey: {"noise": "ooh-ooh",
"eats": #set{"bananas", "berries"}}
cat: {"noise": "meow",
"eats": #set{"kibble", "cat treats", "tinned meat"}}}
@"Will always, always be written out in this order:"
{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"},
"noise": "meow"}
monkey: {"eats": #set{"bananas", "berries"},
"noise": "ooh-ooh"}}
#+END_SRC
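One way to get this property is to sort the elements of every set and dictionary before writing them.
This matters even in Python, whose dicts remember insertion order. That insertion order is exactly the kind of incidental ordering that sorting removes.
The sketch below assumes a simplified, hypothetical ordering: strings before symbols, then by their text. It handles only strings, symbols, sets, and dictionaries.
The actual Preserves total order is defined in the specification and covers every type.
#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Symbol:
    name: str

def sort_key(v: Any):
    # Simplified, hypothetical ordering: strings before symbols, then by text.
    if isinstance(v, str):
        return (0, v)
    if isinstance(v, Symbol):
        return (1, v.name)
    raise TypeError(f"no ordering defined in this sketch for {v!r}")

def write(v: Any) -> str:
    """Render a value as text, always listing set and dictionary elements in sorted order."""
    if isinstance(v, str):
        return '"' + v + '"'  # no escaping: sketch only
    if isinstance(v, Symbol):
        return v.name
    if isinstance(v, frozenset):
        return "#set{" + ", ".join(write(x) for x in sorted(v, key=sort_key)) + "}"
    if isinstance(v, dict):
        items = sorted(v.items(), key=lambda kv: sort_key(kv[0]))
        return "{" + ", ".join(write(k) + ": " + write(x) for k, x in items) + "}"
    raise TypeError(f"unsupported value in this sketch: {v!r}")

hand_typed = {
    Symbol("monkey"): {"noise": "ooh-ooh", "eats": frozenset({"bananas", "berries"})},
    Symbol("cat"): {"noise": "meow", "eats": frozenset({"kibble", "cat treats", "tinned meat"})},
}
shuffled = {
    Symbol("cat"): {"eats": frozenset({"tinned meat", "kibble", "cat treats"}), "noise": "meow"},
    Symbol("monkey"): {"eats": frozenset({"berries", "bananas"}), "noise": "ooh-ooh"},
}

print(write(hand_typed) == write(shuffled))  # True
print(write(hand_typed))
#+END_SRC
Both inputs produce the same text, with =cat= before =monkey= and ="eats"= before ="noise"=, matching the order shown above.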
** Defining our own types using Records
Finally, there is one more type that Preserves provides... but in a
sense, it's a meta-type.
=Record= objects have a tag and a series of arguments (or "slots").
For example, we can make a =Date= record:
#+BEGIN_SRC preserves
<Date 2019 8 15>
#+END_SRC
In this example, the =Date= tag is a symbol; 2019, 8, and 15 are the
year, month, and day slots respectively.
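In a program, a record can be modeled as a pair of a tag (often called the label) and an ordered tuple of slots (fields).
Here is a minimal Python sketch in which =Symbol= and =Record= are hypothetical classes of our own, not a library's types:
#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    """Hypothetical model of a record: a tag (label) plus ordered slots (fields)."""
    label: Any
    fields: Tuple[Any, ...]

# <Date 2019 8 15>
date_record = Record(Symbol("Date"), (2019, 8, 15))

year, month, day = date_record.fields
print(date_record.label.name)  # Date
print(year, month, day)        # 2019 8 15
#+END_SRC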
Why do we care about this?
We could instead just decide to encode our date data in a string,
like "2019-08-15".
A document using such a date structure might look like so:
#+BEGIN_SRC preserves
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "1915-10-04"}
#+END_SRC
Unfortunately, suppose our boss comes along and tells us that the people
doing data entry have complained that it isn't always possible to get
an exact date.
They would like to be able to type in whatever they do know when they
don't know the exact date.
This causes a problem.
Now we might have two kinds of entries:
#+BEGIN_SRC preserves
@"Exact date known"
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "1915-10-04"}
@"Not sure about exact date..."
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "Sometime in October 1915? Or was that when he became an insect?"}
#+END_SRC
This is a mess.
We /could/ try matching each string against a regular expression to see
if it "looks like a date", but this kind of guessing is prone to errors
and weird edge cases.
No, it's better to have a separate type:
#+BEGIN_SRC preserves
@"Exact date known"
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": <Date 1915 10 04>}
@"Not sure about exact date..."
{"name": "Gregor Samsa",
"description": "humanoid trapped in an insect body",
"born": "Sometime in October 1915? Or was that when he became an insect?"}
#+END_SRC
Now we can distinguish the two.
We can make as many Record types as our program needs, though it is up
to our program to make sense of what they mean.
Since Preserves itself does not define =Date=, both the program
(or person) writing the Preserves document and the program reading it
need a mutual understanding of how many slots the record has and what
its tag signifies for it to be of use.
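Here is what that mutual understanding might look like on the reading side.
The Python sketch below assumes our own hypothetical convention: =<Date year month day>= is an exact date, and a plain string is free-form text.
It dispatches on the value's type and checks the record's tag and slot count before trusting it.
=Symbol=, =Record=, and =read_born= are illustrative names, not a library API.
#+BEGIN_SRC python
from dataclasses import dataclass
from datetime import date
from typing import Any, Tuple, Union

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

DATE = Symbol("Date")

def read_born(value: Any) -> Union[date, str]:
    """Interpret a "born" entry under our hypothetical convention."""
    if isinstance(value, Record) and value.label == DATE:
        # Our convention: exactly three integer slots (year, month, day).
        if len(value.fields) != 3 or not all(isinstance(x, int) for x in value.fields):
            raise ValueError(f"malformed Date record: {value!r}")
        return date(*value.fields)
    if isinstance(value, str):
        return value  # free-form, possibly uncertain, description
    raise ValueError(f"unexpected born value: {value!r}")

print(read_born(Record(DATE, (1915, 10, 4))))  # 1915-10-04
print(read_born("Sometime in October 1915?"))  # Sometime in October 1915?
#+END_SRC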
Still, there are plenty of interesting tags we can define.
Here is one for an "iri" (an Internationalized Resource Identifier, such as a web link):
#+BEGIN_SRC preserves
<iri "https://dustycloud.org/blog/">
#+END_SRC
That's nice enough, but here's another interesting detail... tags on
Records are usually symbols, but they don't have to be.
They can also be strings or numbers or even dictionaries.
And very interestingly, they can also be other records:
#+BEGIN_SRC preserves
<<iri "https://www.w3.org/ns/activitystreams#Note">
{"to": [<iri "https://chatty.example/ben/">],
"attributedTo": <iri "https://social.example/alyssa/">,
"content": "Say, did you finish reading that book I lent you?"}>
#+END_SRC
Do you see it? This object's type is... an iri Record!
The link here points to a more precise term saying that "this is a
note meant to be sent around in social networks".
It is considerably more precise than just using the string or symbol
"Note", which could be ambiguous.
(A social networking note? A footnote? A music note?)
Not every system needs this, but this (partial) example hints at how
Preserves can also be used to coordinate meaning in larger, more
decentralized systems.
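Because a tag can be any value, code that recognizes such a record compares the whole tag structurally, not just a short name.
The Python sketch below reuses hypothetical =Symbol= and =Record= classes like those in earlier sketches. Its =iri= helper is likewise made up for illustration.
With these, a record tagged by the ActivityStreams IRI is told apart from one tagged with a bare =Note= symbol:
#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any               # any value, including another Record
    fields: Tuple[Any, ...]

def iri(url: str) -> Record:
    """Hypothetical helper building <iri "...">."""
    return Record(Symbol("iri"), (url,))

AS_NOTE = iri("https://www.w3.org/ns/activitystreams#Note")

note = Record(AS_NOTE, ({
    "to": [iri("https://chatty.example/ben/")],
    "attributedTo": iri("https://social.example/alyssa/"),
    "content": "Say, did you finish reading that book I lent you?",
},))

def is_activitystreams_note(value: Any) -> bool:
    # Structural equality on the whole label, not just a short name.
    return isinstance(value, Record) and value.label == AS_NOTE

print(is_activitystreams_note(note))                                     # True
print(is_activitystreams_note(Record(Symbol("Note"), ("a footnote",))))  # False
#+END_SRC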
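Likewise, it is also possible to tag records with integers.
A system could use this to reduce the redundancy cost of tags sent
over the wire. Both sides agree on an ordered index of tags, and the
sender replaces each tag with its position in that index.
The receiver then substitutes the original tags back after reading the
structures.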
#+BEGIN_SRC preserves
@"The ordered index of tags for this session"
[Employee, Role, Date]
@"We could then transform this structure..."
#set{<Employee @"employee name"
"Alyssa P. Hacker"
@"employee roles"
#set{<Role Programmer>,
<Role Manager>},
@"when hired"
<Date 2018, 1, 24>>,
<Employee @"employee name"
"Ben Bitdiddle"
@"employee roles"
#set{<Role Programmer>},
@"when hired"
<Date 2019, 2, 13>>}
@"... to this structure, which in binary is 91 as opposed to 127 bytes"
#set{<0 @"employee name"
"Alyssa P. Hacker"
@"employee roles"
#set{<1 Programmer>,
<1 Manager>},
@"when hired"
<2 2018, 1, 24>>,
<0 @"employee name"
"Ben Bitdiddle"
@"employee roles"
#set{<1 Programmer>},
@"when hired"
<2 2019, 2, 13>>}
#+END_SRC
Even in this trivial example, that is a reduction of roughly 28% (36 of
127 bytes) in the binary size.
Preserves does not provide tooling to do this out of the box, but
because Record tags can be anything, it is possible to build this
or any similar scheme on top of it.
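The transformation itself is a simple recursive rewrite of record tags.
Here is a Python sketch that assumes hypothetical =Symbol= and =Record= classes, with Python tuples and frozensets standing in for sequences and sets.
It ignores annotations for simplicity.
=compress= replaces each tag with its position in the index, and =decompress= reverses it:
#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple

@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]

def map_labels(value: Any, f: Callable[[Any], Any]) -> Any:
    """Rebuild value, replacing every record label with f(label)."""
    if isinstance(value, Record):
        return Record(f(value.label), tuple(map_labels(x, f) for x in value.fields))
    if isinstance(value, tuple):
        return tuple(map_labels(x, f) for x in value)
    if isinstance(value, frozenset):
        return frozenset(map_labels(x, f) for x in value)
    if isinstance(value, dict):
        return {map_labels(k, f): map_labels(v, f) for k, v in value.items()}
    return value  # atoms (strings, numbers, symbols) pass through unchanged

def compress(value: Any, index: Sequence[Any]) -> Any:
    position = {tag: i for i, tag in enumerate(index)}
    return map_labels(value, lambda tag: position[tag])  # KeyError if a tag is not indexed

def decompress(value: Any, index: Sequence[Any]) -> Any:
    return map_labels(value, lambda i: index[i])

INDEX = (Symbol("Employee"), Symbol("Role"), Symbol("Date"))
E, R, D = INDEX

staff = frozenset({
    Record(E, ("Alyssa P. Hacker",
               frozenset({Record(R, (Symbol("Programmer"),)),
                          Record(R, (Symbol("Manager"),))}),
               Record(D, (2018, 1, 24)))),
    Record(E, ("Ben Bitdiddle",
               frozenset({Record(R, (Symbol("Programmer"),))}),
               Record(D, (2019, 2, 13)))),
})

small = compress(staff, INDEX)
print(sorted(r.label for r in small))     # [0, 0]
print(decompress(small, INDEX) == staff)  # True
#+END_SRC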
** Annotations
Annotations are not strictly a necessary feature, but they are useful
in some circumstances.
We have previously shown them used as comments:
#+BEGIN_SRC preserves
@"I'm a comment!"
"I am not a comment, I am data!"
#+END_SRC
Annotations annotate the values they precede.
It is possible to have multiple annotations on a value.
#+BEGIN_SRC preserves
@"I am annotating this number"
@"And so am I!"
42
#+END_SRC
As mentioned, annotations are not really data.
They are merely meant for development tooling or debugging.
A reader only returns them if you explicitly ask for them, in which case
each value comes back wrapped together with its annotations.
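A tool that does ask for annotations still needs a way to get back to the plain data.
Here is a Python sketch assuming a hypothetical =Annotated= wrapper, which is not any specific library's type, plus a recursive function that discards annotations:
#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Annotated:
    """Hypothetical wrapper a reader could return when asked to keep annotations."""
    annotations: Tuple[Any, ...]
    item: Any

def strip_annotations(value: Any) -> Any:
    """Recursively discard annotations, keeping only the underlying data."""
    if isinstance(value, Annotated):
        return strip_annotations(value.item)
    if isinstance(value, tuple):
        return tuple(strip_annotations(x) for x in value)
    if isinstance(value, frozenset):
        return frozenset(strip_annotations(x) for x in value)
    if isinstance(value, dict):
        return {strip_annotations(k): strip_annotations(v) for k, v in value.items()}
    return value

answer = Annotated(("I am annotating this number", "And so am I!"), 42)
print(answer.annotations)         # ('I am annotating this number', 'And so am I!')
print(strip_annotations(answer))  # 42
#+END_SRC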
So what's the point of them then?
If annotations were just for comments, there would indeed be hardly any
point to them... it would be simpler to just provide a comment syntax.
However, annotations can be used for more than just comments.
They can also be used for debugging or other development-tool-oriented
data.
For instance, here is some game data annotated with the "project lead"
responsible for each object.
#+BEGIN_SRC preserves
<NpcCatalog
"Monsters"
#set{@<ProjectLead Alyssa>
{name: "Ogre",
spriteSheet: #base64{T2dyZSBzcHJpdGVzIGdvIGhlcmU=},
attributes: #set{biped, brute, rage, clumsy}},
@<ProjectLead Ben>
{name: "Jackal",
spriteSheet: #base64{SmFja2FsIHNwcml0ZXMgZ28gaGVyZQ==},
attributes: #set{quadruped, swift, pack-animal, weak}}}>
#+END_SRC
Each monster described in the set is annotated with a =ProjectLead=
record.
That information is useful to the game company's internal organization
tools, but it doesn't matter when the game itself simply reads in the
data.
* Conclusions
We've covered the broad strokes of Preserves, but not everything that
is possible with it.
We leave it as an exercise to the reader to try reading these examples
into their favorite programming language (several libraries exist
already) and writing them out as binary objects.
But as we've seen, Preserves is a flexible system which comes with
well-defined, carefully specified built-in types, as well as a
meta-type which can be used as an extension point.
Happy preserving!