Add Preserves tutorial
parent
0365fd8c36
commit
dca049ce46
|
@ -0,0 +1,394 @@
|
|||
#+TITLE: Preserves: a tutorial
|
||||
#+AUTHOR: Christopher Lemmer Webber
|
||||
|
||||
/This document, like Preserves itself, is released under/
|
||||
/[[file:./LICENSE][version 2.0 of the Apache license]]./
|
||||
|
||||
* Overview
|
||||
|
||||
Preserves is a serialization system which supplies both a
|
||||
human-readable textual and efficient binary syntax; converting between
|
||||
the two is straightforward.
|
||||
Preserves' human-readable syntax is easy to read and should be mostly
|
||||
familiar if you already know systems like JSON.
|
||||
However, Preserves is less ambiguously specified than JSON, and also
|
||||
has a clean extension mechanism.
|
||||
|
||||
This document is a tutorial; it does not get into all the details of
|
||||
Preserves.
|
||||
For that, see the [[file:./preserves.md][Preserves specification]].
|
||||
|
||||
* Preserves basics
|
||||
|
||||
** Starting with the familiar
|
||||
|
||||
If you're familiar with JSON, Preserves looks fairly similar:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
{"name": "Missy Rose",
|
||||
"species": "Felis Catus",
|
||||
"age": 13,
|
||||
"foods": ["kibble", "cat treats", "tinned meat"]}
|
||||
#+END_SRC
|
||||
|
||||
Preserves also has something we can use for debugging/development
|
||||
information called "annotations"; they aren't actually read in as data
|
||||
but we can use them for comments.
|
||||
(They can also be used for other development tools and are not
|
||||
restricted to strings; more on this later, but for now interpret them
|
||||
as comments.)
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"I'm an annotation... basically a comment. Ignore me!"
|
||||
"I'm data! Don't ignore me!"
|
||||
#+END_SRC
|
||||
|
||||
Preserves supports some data types you're probably already familiar
|
||||
with from JSON, and which look fairly similar in the textual format:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"booleans"
|
||||
#true
|
||||
#false
|
||||
|
||||
@"various kinds of numbers:"
|
||||
42
|
||||
123556789012345678901234567890
|
||||
-10
|
||||
13.5
|
||||
|
||||
@"strings"
|
||||
"I'm feeling stringy!"
|
||||
|
||||
@"sequences (lists)"
|
||||
["cat", "dog", "mouse", "goldfish"]
|
||||
|
||||
@"dictionaries (hashmaps)"
|
||||
{"cat": "meow",
|
||||
"dog": "woof",
|
||||
"goldfish": "glub glub",
|
||||
"mouse": "squeak"}
|
||||
#+END_SRC
|
||||
|
||||
** Going beyond JSON
|
||||
|
||||
We can observe a few differences from JSON already; it's possible to
|
||||
express integers of arbitrary length in Preserves, and booleans look a little
|
||||
bit different.
|
||||
A few more interesting differences:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Preserves treats commas as whitespace, so these are the same"
|
||||
["cat", "dog", "mouse", "goldfish"]
|
||||
["cat" "dog" "mouse" "goldfish"]
|
||||
|
||||
@"We can use anything as keys in dictionaries, not just strings"
|
||||
{1: "the loneliest number",
|
||||
["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
|
||||
{"dictionaries": "as keys???"}: "well, why not?"}
|
||||
#+END_SRC
|
||||
|
||||
Preserves technically provides a few types of numbers:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Signed Integers"
|
||||
42
|
||||
-42
|
||||
5907212309572059846509324862304968273468909473609826340
|
||||
-5907212309572059846509324862304968273468909473609826340
|
||||
|
||||
@"Floats (Single-precision IEEE floats) (notice the trailing f)"
|
||||
3.1415927f
|
||||
|
||||
@"Doubles (Double-precision IEEE floats)"
|
||||
3.141592653589793
|
||||
#+END_SRC
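
To get a feel for the difference between the single- and
double-precision forms, here is a small Python sketch.
It uses only Python's standard =struct= module, not a Preserves
library, so treat it as an illustration of the underlying IEEE widths
rather than of how any Preserves implementation stores numbers.

#+BEGIN_SRC python
import struct

pi = 3.141592653589793

# Pack as IEEE 754 single precision (4 bytes) and double precision (8 bytes).
single = struct.pack(">f", pi)
double = struct.pack(">d", pi)

# Unpacking shows how much of the value each width can hold.
pi_single = struct.unpack(">f", single)[0]
pi_double = struct.unpack(">d", double)[0]

print(len(single), pi_single)
print(len(double), pi_double)

# Python integers, like the signed integers above, have no fixed width.
big = 5907212309572059846509324862304968273468909473609826340
print(big + 1)
#+END_SRC

The double survives the round trip unchanged, while the single keeps
only about seven significant decimal digits.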
|
||||
|
||||
Preserves also provides some types that don't come in JSON.
|
||||
=Symbols= are fairly interesting; they look a lot like strings but
|
||||
really aren't meant to represent text as much as they are, well... a
|
||||
symbolic name.
|
||||
Often they're meant to be used for something that has symbolic importance
|
||||
to the program, but not textual importance (other than to guide the
|
||||
programmer... not unlike variable names).
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"A symbol (NOT a string!)"
|
||||
JustASymbol
|
||||
|
||||
@"You can do mixedCase or CamelCase too of course, pick your poison"
|
||||
@"(but be consistent, for the sake of your collaborators!)"
|
||||
iAmASymbol
|
||||
i-am-a-symbol
|
||||
|
||||
@"A list of symbols"
|
||||
[GET, PUT, POST, DELETE]
|
||||
|
||||
@"A symbol with a space in it"
|
||||
|this is just one symbol believe it or not|
|
||||
#+END_SRC
|
||||
|
||||
We can also add binary data, aka ByteStrings:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Some binary data, base64 encoded"
|
||||
#base64{cGljdHVyZSBvZiBhIGNhdA==}
|
||||
|
||||
@"Some other binary data, hexadecimal encoded"
|
||||
#hex{616263}
|
||||
|
||||
@"Same binary data as above, base64 encoded"
|
||||
#base64{YWJj}
|
||||
#+END_SRC
|
||||
|
||||
What's neat about this is that we don't have to "pay the cost" of
|
||||
base64 or hexadecimal encoding when we serialize this data to binary;
|
||||
the length of the binary data is the length of the binary data.
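
As a quick check of that claim, the following Python sketch decodes
the two textual spellings from the example above with the standard
library.
It only looks at the payload itself; whatever framing the Preserves
binary syntax adds around a ByteString is described in the
specification and is not modeled here.

#+BEGIN_SRC python
import base64

# The two textual spellings from the example above.
from_hex = bytes.fromhex("616263")
from_base64 = base64.b64decode("YWJj")

# Both spellings denote the same three bytes.
assert from_hex == from_base64 == b"abc"

# The textual encodings are longer than the data they describe...
print(len("616263"), len("YWJj"))
# ...but the decoded payload is only three bytes long.
print(len(from_hex))
#+END_SRC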
|
||||
|
||||
Conveniently, Preserves also includes Sets, which are collections of
|
||||
unique elements where ordering of items is unimportant.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
#set{flour, salt, water}
|
||||
#+END_SRC
|
||||
|
||||
** Total ordering
|
||||
|
||||
This is a good time to mention that even though from a semantic
|
||||
perspective sets and dictionaries do not carry information about the
|
||||
ordering of their elements (and Preserves doesn't care what order we
|
||||
enter them in for our hand-written-as-text Preserves documents), when
|
||||
serializing to binary data (or even when the Preserves library itself
|
||||
serializes to textual data), Preserves will always write out the
|
||||
elements in the same order, every time (aka, "total ordering").
|
||||
This is important and useful for many contexts, but especially for
|
||||
cryptographic signatures.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"This hand-typed Preserves document..."
|
||||
{monkey: {"noise": "ooh-ooh",
|
||||
"eats": #set{"bananas", "berries"}}
|
||||
cat: {"noise": "meow",
|
||||
"eats": #set{"kibble", "cat treats", "tinned meat"}}}
|
||||
|
||||
@"Will always, always be written out in this order:"
|
||||
{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"},
|
||||
"noise": "meow"}
|
||||
monkey: {"eats": #set{"bananas", "berries"},
|
||||
"noise": "ooh-ooh"}}
|
||||
#+END_SRC
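
To see why a fixed output order matters for signatures, here is a
rough Python analogy.
It uses JSON and =hashlib= from the standard library as stand-ins:
=sort_keys= plays the role of a canonical ordering, but it simply sorts
string keys and is /not/ the ordering the Preserves specification
defines.

#+BEGIN_SRC python
import hashlib
import json

# The same mapping, entered in two different orders.
first = {"monkey": {"noise": "ooh-ooh"}, "cat": {"noise": "meow"}}
second = {"cat": {"noise": "meow"}, "monkey": {"noise": "ooh-ooh"}}

def digest(value, canonical):
    # sort_keys=True stands in for a canonical ordering of dictionary keys.
    text = json.dumps(value, sort_keys=canonical)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Without a fixed order, equal data can hash differently...
print(digest(first, canonical=False) == digest(second, canonical=False))
# ...with one, equal data always hashes the same.
print(digest(first, canonical=True) == digest(second, canonical=True))
#+END_SRC

A signature is computed over bytes, so two parties can only agree on
it if they agree on the exact bytes a value serializes to.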
|
||||
|
||||
** Defining our own types using Records
|
||||
|
||||
Finally, there is one more type that Preserves provides... but in a
|
||||
sense, it's a meta-type.
|
||||
=Record= objects have a tag and a series of arguments (or "slots").
|
||||
For example, we can make a =Date= record:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<Date 2019 8 15>
|
||||
#+END_SRC
|
||||
|
||||
In this example, the =Date= tag is a symbol; 2019, 8, and 15 are the
|
||||
year, month, and day slots respectively.
|
||||
|
||||
Why do we care about this?
|
||||
We could instead just decide to encode our date data in a string,
|
||||
like "2019-08-15".
|
||||
A document using such a date structure might look like so:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "1915-10-04"}
|
||||
#+END_SRC
|
||||
|
||||
Unfortunately, say our boss comes along and tells us that the people
|
||||
doing data entry have complained that it isn't always possible to get
|
||||
an exact date.
|
||||
They would like to be able to type in what they know if they don't
|
||||
know the date exactly.
|
||||
|
||||
This causes a problem.
|
||||
Now we might have two kinds of entries:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Exact date known"
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "1915-10-04"}
|
||||
|
||||
@"Not sure about exact date..."
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "Sometime in October 1915? Or was that when he became an insect?"}
|
||||
#+END_SRC
|
||||
|
||||
This is a mess.
|
||||
We /could/ just try matching against a regular expression to see if it "looks
|
||||
like a date", but doing this kind of thing is prone to errors and weird
|
||||
edge cases.
|
||||
No, it's better to be able to have a separate type:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Exact date known"
{"name": "Gregor Samsa",
 "description": "humanoid trapped in an insect body",
 "born": <Date 1915 10 4>}
|
||||
|
||||
@"Not sure about exact date..."
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "Sometime in October 1915? Or was that when he became an insect?"}
|
||||
#+END_SRC
|
||||
|
||||
Now we can distinguish the two.
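
On the reading side, the benefit might look something like the
following Python sketch.
The =Date= class and =describe_born= function are hypothetical names
made up for this illustration; a real Preserves library will have its
own way of representing records.

#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Date:
    """Hypothetical in-memory stand-in for a <Date year month day> record."""
    year: int
    month: int
    day: int

def describe_born(born: Union[Date, str]) -> str:
    # The type alone tells us which case we have; no regular expression needed.
    if isinstance(born, Date):
        return f"exact: {born.year:04d}-{born.month:02d}-{born.day:02d}"
    return f"free text: {born}"

print(describe_born(Date(1915, 10, 4)))
print(describe_born("Sometime in October 1915?"))
#+END_SRC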
|
||||
|
||||
We can make as many Record types as our program needs, though it is up
|
||||
to our program to make sense of what these mean.
|
||||
Since Preserves does not specify the =Date= itself, both the program
|
||||
(or person) writing the Preserves document and the program reading it
|
||||
need to have a mutual understanding of how many slots it has and what
|
||||
the tag signifies for it to be of use.
|
||||
|
||||
Still, there are plenty of interesting tags we can define.
|
||||
Here is one for an "iri", a hyperlink:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<iri "https://dustycloud.org/blog/">
|
||||
#+END_SRC
|
||||
|
||||
That's nice enough, but here's another interesting detail... tags on
|
||||
Records are usually symbols but aren't necessarily so.
|
||||
They can also be strings or numbers or even dictionaries.
|
||||
And very interestingly, they can also be other records:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<<iri "https://www.w3.org/ns/activitystreams#Note">
|
||||
{"to": [<iri "https://chatty.example/ben/">],
|
||||
"attributedTo": <iri "https://social.example/alyssa/">,
|
||||
"content": "Say, did you finish reading that book I lent you?"}>
|
||||
#+END_SRC
|
||||
|
||||
Do you see it? This object's type is... an iri Record!
|
||||
The link here points to a more precise term saying that "this is a
|
||||
note meant to be sent around in social networks".
|
||||
It is considerably more precise than just using the string or symbol
|
||||
"Note", which could be ambiguous.
|
||||
(A social networking note? A footnote? A music note?)
|
||||
While not all systems need this, this (partial) example hints at how
|
||||
Preserves can also be used to coordinate meaning in larger, more
|
||||
decentralized systems.
|
||||
|
||||
Likewise, it is also possible to use integers as record tags.
|
||||
A system could use this to reduce the redundancy cost of tags sent
|
||||
over the wire by indexing tags and substituting them after reading
|
||||
the structures.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"The ordered index of tags for this session"
|
||||
[Employee, Role, Date]
|
||||
|
||||
@"We could then transform this structure..."
|
||||
#set{<Employee @"employee name"
|
||||
"Alyssa P. Hacker"
|
||||
@"employee roles"
|
||||
#set{<Role Programmer>,
|
||||
<Role Manager>},
|
||||
@"when hired"
|
||||
<Date 2018, 1, 24>>,
|
||||
<Employee @"employee name"
|
||||
"Ben Bitdiddle"
|
||||
@"employee roles"
|
||||
#set{<Role Programmer>},
|
||||
@"when hired"
|
||||
<Date 2019, 2, 13>>}
|
||||
|
||||
@"... to this structure, which in binary is 91 bytes as opposed to 127 bytes"
|
||||
#set{<0 @"employee name"
|
||||
"Alyssa P. Hacker"
|
||||
@"employee roles"
|
||||
#set{<1 Programmer>,
|
||||
<1 Manager>},
|
||||
@"when hired"
|
||||
<2 2018, 1, 24>>,
|
||||
<0 @"employee name"
|
||||
"Ben Bitdiddle"
|
||||
@"employee roles"
|
||||
#set{<1 Programmer>},
|
||||
@"when hired"
|
||||
<2 2019, 2, 13>>}
|
||||
#+END_SRC
|
||||
|
||||
Even in this trivial example, this is roughly a 28% reduction in the binary size.
|
||||
Even though tooling to do this does not come out of the box in Preserves,
|
||||
the fact that Record tags can be anything makes it possible to build this
|
||||
or any such appropriate structure.
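
As a rough idea of what such tooling could look like, here is a
Python sketch of the substitution step.
Everything in it is an assumption made for illustration: the =Record=
class, the =map_tags=, =compress= and =expand= helpers, and the choice
to model symbols as plain Python strings are not part of any Preserves
library.

#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Preserves record: a tag plus slots."""
    tag: Any
    slots: Tuple[Any, ...]

def map_tags(value, rename):
    """Rebuild value, passing every record tag through rename."""
    if isinstance(value, Record):
        slots = tuple(map_tags(slot, rename) for slot in value.slots)
        return Record(rename(value.tag), slots)
    if isinstance(value, (frozenset, tuple)):
        return type(value)(map_tags(item, rename) for item in value)
    return value

# The ordered index of tags for this session.
TAGS = ("Employee", "Role", "Date")

def compress(value):
    # Replace each tag with its position in the index.
    return map_tags(value, TAGS.index)

def expand(value):
    # Look each integer tag back up in the index.
    return map_tags(value, lambda i: TAGS[i])

ben = Record("Employee", (
    "Ben Bitdiddle",
    frozenset({Record("Role", ("Programmer",))}),
    Record("Date", (2019, 2, 13)),
))

print(compress(ben))
assert expand(compress(ben)) == ben
#+END_SRC

Both sides need to agree on the index ahead of time, just as they need
to agree on what the tags mean in the first place.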
|
||||
|
||||
** Annotations
|
||||
|
||||
Annotations are not strictly a necessary feature, but they are useful
|
||||
in some circumstances.
|
||||
We have previously shown them used as comments:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"I'm a comment!"
|
||||
"I am not a comment, I am data!"
|
||||
#+END_SRC
|
||||
|
||||
Annotations annotate the values they precede.
|
||||
It is possible to have multiple annotations on a value.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"I am annotating this number"
|
||||
@"And so am I!"
|
||||
42
|
||||
#+END_SRC
|
||||
|
||||
As said, annotations are not really data.
|
||||
They are merely meant for development tooling or debugging.
|
||||
You have to explicitly ask for them when reading, and they wrap all
|
||||
the values.
|
||||
|
||||
So what's the point of them then?
|
||||
If annotations were just for comments, there would indeed be hardly any
|
||||
point at all... it would be simpler to just provide a comment syntax.
|
||||
|
||||
However, annotations can be used for more than just comments.
|
||||
They can also be used for debugging or other development-tool-oriented
|
||||
data.
|
||||
For instance, here is some game data annotated with who the
|
||||
"project owner" is of each object.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<NpcCatalog
|
||||
"Monsters"
|
||||
#set{@<ProjectLead Alyssa>
|
||||
{name: "Ogre",
|
||||
spriteSheet: #base64{T2dyZSBzcHJpdGVzIGdvIGhlcmU=},
|
||||
attributes: #set{biped, brute, rage, clumsy}},
|
||||
@<ProjectLead Ben>
|
||||
{name: "Jackal",
|
||||
spriteSheet: #base64{SmFja2FsIHNwcml0ZXMgZ28gaGVyZQ==},
|
||||
attributes: #set{quadruped, swift, pack-animal, weak}}}>
|
||||
#+END_SRC
|
||||
|
||||
Each monster described in the set is annotated with a =ProjectLead=
|
||||
record.
|
||||
While this information is useful to the game company's organization
system, it doesn't particularly matter to code that is simply reading
in the data.
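
One way to picture the split between data and annotations is the
following Python sketch.
The =Annotated= wrapper and =strip_annotations= helper are
hypothetical, invented for this illustration; an actual Preserves
library decides for itself how annotated values are represented and
how you ask for them.
The monsters are kept in a list rather than a set only because Python
dictionaries cannot be set members.

#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Annotated:
    """Hypothetical wrapper: a value plus the annotations written before it."""
    annotations: Tuple[Any, ...]
    value: Any

def strip_annotations(node):
    """Return the plain data, discarding annotations at every level."""
    if isinstance(node, Annotated):
        return strip_annotations(node.value)
    if isinstance(node, dict):
        return {strip_annotations(k): strip_annotations(v)
                for k, v in node.items()}
    if isinstance(node, (list, tuple, frozenset)):
        return type(node)(strip_annotations(item) for item in node)
    return node

ogre = Annotated((("ProjectLead", "Alyssa"),), {"name": "Ogre"})
catalog = ["Monsters", [ogre]]

# What ordinary code works with: just the data.
print(strip_annotations(catalog))
# What a development tool might inspect instead.
print(ogre.annotations)
#+END_SRC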
|
||||
|
||||
* Conclusions
|
||||
|
||||
We've covered the broad strokes of Preserves, but not everything that
|
||||
is possible with it.
|
||||
We leave it as an exercise to the reader to try reading these examples
|
||||
into their languages (several libraries exist already) and writing them
|
||||
out as binary objects.
|
||||
|
||||
But as we've seen, Preserves is a flexible system which comes with
|
||||
well-defined, carefully specified built-in types, as well as a
|
||||
meta-type which can be used as an extension point.
|
||||
|
||||
Happy preserving!
|