forked from syndicate-lang/preserves
Move from TUTORIAL.org -> TUTORIAL.md
parent
40d474d456
commit
1a84c3f609
|
@ -0,0 +1,393 @@
|
|||
# Preserves: a tutorial
|
||||
|
||||
By Christopher Lemmer Webber and Tony Garnock-Jones
|
||||
|
||||
*This document, like Preserves itself, is released under*
|
||||
*[version 2.0 of the Apache license](./LICENSE).*
|
||||
|
||||
|
||||
<a id="org74bd8ba"></a>
|
||||
|
||||
# Overview
|
||||
|
||||
Preserves is a serialization system which supplies both a
|
||||
human-readable textual and efficient binary syntax; converting between
|
||||
the two is straightforward.
|
||||
Preserves' human-readable syntax is easy to read and should be mostly
|
||||
familiar if you already know systems like JSON.
|
||||
However, Preserves is more precisely specified than JSON, and also
|
||||
has a clean extension mechanism.
|
||||
|
||||
This document is a tutorial; it does not get into all the details of
|
||||
Preserves.
|
||||
For that, see the [Preserves specification](./preserves.md).
|
||||
|
||||
|
||||
<a id="org0a73f91"></a>
|
||||
|
||||
# Preserves basics
|
||||
|
||||
|
||||
<a id="orge8d9054"></a>
|
||||
|
||||
## Starting with the familiar
|
||||
|
||||
If you're familiar with JSON, Preserves looks fairly similar:
|
||||
|
||||
``` javascript
|
||||
{"name": "Missy Rose",
|
||||
"species": "Felis Catus",
|
||||
"age": 13,
|
||||
"foods": ["kibble", "cat treats", "tinned meat"]}
|
||||
```
|
||||
|
||||
Preserves also has something we can use for debugging/development
|
||||
information called "annotations"; they aren't actually read in as data
|
||||
but we can use them for comments.
|
||||
(They can also be used for other development tools and are not
|
||||
restricted to strings; more on this later, but for now interpret them
|
||||
as comments.)
|
||||
|
||||
``` javascript
|
||||
@"I'm an annotation... basically a comment. Ignore me!"
|
||||
"I'm data! Don't ignore me!"
|
||||
```
|
||||
|
||||
Preserves supports some data types you're probably already familiar
|
||||
with from JSON, and which look fairly similar in the textual format:
|
||||
|
||||
``` javascript
|
||||
@"booleans"
|
||||
#true
|
||||
#false
|
||||
|
||||
@"various kinds of numbers:"
|
||||
42
|
||||
123556789012345678901234567890
|
||||
-10
|
||||
13.5
|
||||
|
||||
@"strings"
|
||||
"I'm feeling stringy!"
|
||||
|
||||
@"sequences (lists)"
|
||||
["cat", "dog", "mouse", "goldfish"]
|
||||
|
||||
@"dictionaries (hashmaps)"
|
||||
{"cat": "meow",
|
||||
"dog": "woof",
|
||||
"goldfish": "glub glub",
|
||||
"mouse": "squeak"}
|
||||
```
|
||||
|
||||
|
||||
<a id="org270f7f4"></a>
|
||||
|
||||
## Going beyond JSON
|
||||
|
||||
We can observe a few differences from JSON already: it's possible to
express integers of arbitrary size in Preserves, and booleans look a little
bit different.
|
||||
A few more interesting differences:
|
||||
|
||||
``` javascript
|
||||
@"Preserves treats commas as whitespace, so these are the same"
|
||||
["cat", "dog", "mouse", "goldfish"]
|
||||
["cat" "dog" "mouse" "goldfish"]
|
||||
|
||||
@"We can use anything as keys in dictionaries, not just strings"
|
||||
{1: "the loneliest number",
|
||||
["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
|
||||
{"dictionaries": "as keys???"}: "well, why not?"}
|
||||
```
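
To see why the two sequences above read identically, here is a minimal Python sketch of a tokenizer that treats commas like whitespace. It is an illustration only: `tokenize` and its regular expression are hypothetical and cover just strings, brackets, colons, and bare words, not the full Preserves grammar.

```python
import re

# Hypothetical toy tokenizer: double-quoted strings, brackets/braces,
# colons, and bare words. Commas and whitespace only separate tokens
# and are dropped.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[\[\]{}:]|[^\s,\[\]{}:"]+')

def tokenize(text):
    """Return the list of tokens, ignoring commas and whitespace."""
    return TOKEN.findall(text)

with_commas = tokenize('["cat", "dog", "mouse", "goldfish"]')
without_commas = tokenize('["cat" "dog" "mouse" "goldfish"]')
print(with_commas == without_commas)
```

Commas inside a quoted string are untouched, because the string alternative matches first and consumes the whole literal.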
|
||||
|
||||
Preserves technically provides a few types of numbers:
|
||||
|
||||
``` javascript
|
||||
@"Signed Integers"
|
||||
42
|
||||
-42
|
||||
5907212309572059846509324862304968273468909473609826340
|
||||
-5907212309572059846509324862304968273468909473609826340
|
||||
|
||||
@"Floats (Single-precision IEEE floats) (notice the trailing f)"
|
||||
3.1415927f
|
||||
|
||||
@"Doubles (Double-precision IEEE floats)"
|
||||
3.141592653589793
|
||||
```
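
The difference between the three kinds of number can be sketched in Python using only the standard library. This is an analogy rather than Preserves code: Python's `int` is arbitrary-precision and its `float` is an IEEE double, while `struct` is used here to round a value through single precision.

```python
import struct

# Python integers are arbitrary-precision, like Preserves signed integers.
big = 5907212309572059846509324862304968273468909473609826340
assert big + 1 - 1 == big  # no overflow or rounding

# Python floats are IEEE doubles.
double_pi = 3.141592653589793

# Round-trip through 4 bytes to get the nearest single-precision value.
single_pi = struct.unpack(">f", struct.pack(">f", double_pi))[0]

print(double_pi)
print(single_pi)  # agrees with double_pi only to about 7 significant digits
```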
|
||||
|
||||
Preserves also provides some types that don't come in JSON.
|
||||
`Symbols` are fairly interesting; they look a lot like strings but
|
||||
really aren't meant to represent text as much as they are, well… a
|
||||
symbolic name.
|
||||
Often they're meant to be used for something that has symbolic importance
|
||||
to the program, but not textual importance (other than to guide the
|
||||
programmer… not unlike variable names).
|
||||
|
||||
``` javascript
|
||||
@"A symbol (NOT a string!)"
|
||||
JustASymbol
|
||||
|
||||
@"You can do mixedCase or CamelCase too of course, pick your poison"
|
||||
@"(but be consistent, for the sake of your collaborators!)"
|
||||
iAmASymbol
|
||||
i-am-a-symbol
|
||||
|
||||
@"A list of symbols"
|
||||
[GET, PUT, POST, DELETE]
|
||||
|
||||
@"A symbol with spaces in it"
|
||||
|this is just one symbol believe it or not|
|
||||
```
|
||||
|
||||
We can also add binary data, aka ByteStrings:
|
||||
|
||||
``` javascript
|
||||
@"Some binary data, base64 encoded"
|
||||
#base64{cGljdHVyZSBvZiBhIGNhdA==}
|
||||
|
||||
@"Some other binary data, hexadecimal encoded"
|
||||
#hex{616263}
|
||||
|
||||
@"Same binary data as above, base64 encoded"
|
||||
#base64{YWJj}
|
||||
```
|
||||
|
||||
What's neat about this is that we don't have to "pay the cost" of
|
||||
base64 or hexadecimal encoding when we serialize this data to binary;
|
||||
the length of the binary data is the length of the binary data.
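
A quick Python check of that claim, using only the standard library's `base64` module: the hexadecimal and base64 spellings above decode to the same bytes, and both text encodings are longer than the data they represent.

```python
import base64

from_hex = bytes.fromhex("616263")
from_b64 = base64.b64decode("YWJj")
print(from_hex == from_b64)  # both spell the same three bytes

picture = base64.b64decode("cGljdHVyZSBvZiBhIGNhdA==")
# Raw length versus the length of each textual encoding.
print(len(picture), len(base64.b64encode(picture)), len(picture.hex()))
```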
|
||||
|
||||
Conveniently, Preserves also includes Sets, which are collections of
|
||||
unique elements where ordering of items is unimportant.
|
||||
|
||||
``` javascript
|
||||
#set{flour, salt, water}
|
||||
```
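
Python's built-in `frozenset` behaves the same way with respect to ordering, so it makes a handy mental model (this is an analogy, not a Preserves API):

```python
pantry = frozenset(["flour", "salt", "water"])
same_pantry = frozenset(["water", "flour", "salt"])  # different order, same set

print(pantry == same_pantry)
print(len(same_pantry))
```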
|
||||
|
||||
<a id="orgefafe56"></a>
|
||||
|
||||
## Total ordering and canonicalization
|
||||
|
||||
This is a good time to mention that even though from a semantic
|
||||
perspective sets and dictionaries do not carry information about the
|
||||
ordering of their elements (and Preserves doesn't care what order we
|
||||
enter them in for our hand-written-as-text Preserves documents),
|
||||
Preserves has a well-defined "total ordering".
|
||||
|
||||
**FULL WARNING:** the following claim is not implemented yet. :)
|
||||
Coming soon!
|
||||
|
||||
Based on this total ordering, Preserves provides support for canonical
|
||||
ordering when serializing; in this mode, Preserves will always write
|
||||
out the elements in the same order, every time.
|
||||
When combined with binary serialization, this is Preserves' "canonical
|
||||
form".
|
||||
This is important and useful for many contexts, but especially for
|
||||
cryptographic signatures and hashing.
|
||||
|
||||
``` javascript
|
||||
@"This hand-typed Preserves document..."
|
||||
{monkey: {"noise": "ooh-ooh",
|
||||
"eats": #set{"bananas", "berries"}}
|
||||
cat: {"noise": "meow",
|
||||
"eats": #set{"kibble", "cat treats", "tinned meat"}}}
|
||||
|
||||
@"Will always, always be written out in this order when canonicalized:"
|
||||
{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"},
|
||||
"noise": "meow"}
|
||||
monkey: {"eats": #set{"bananas", "berries"},
|
||||
"noise": "ooh-ooh"}}
|
||||
```
|
||||
|
||||
Clever implementations can get canonicalized output for free by
|
||||
carefully ordering set elements and dictionary entries at construction
|
||||
time, but even in simple implementations, canonical serialization is
|
||||
almost as cheap as normal serialization.
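
The idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `canonical_text` and `digest` are hypothetical helpers, the outer keys are plain strings rather than symbols, and entries are sorted by their rendered text, which stands in for (and is not the same as) the total ordering Preserves defines. The point is only that a fixed ordering makes the serialized form, and therefore any hash of it, independent of the order in which the data was written.

```python
import hashlib

def canonical_text(value):
    """Render nested dicts, sets, lists, strings and ints deterministically.

    Assumption: entries are sorted by their rendered text, as a stand-in
    for a real total ordering over values.
    """
    if isinstance(value, dict):
        entries = sorted(
            (canonical_text(k), canonical_text(v)) for k, v in value.items()
        )
        return "{" + " ".join(f"{k}: {v}" for k, v in entries) + "}"
    if isinstance(value, (set, frozenset)):
        return "#set{" + " ".join(sorted(canonical_text(v) for v in value)) + "}"
    if isinstance(value, list):
        return "[" + " ".join(canonical_text(v) for v in value) + "]"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    raise TypeError(f"unsupported value: {value!r}")

def digest(value):
    """Hash the canonical rendering, so equal data gives equal digests."""
    return hashlib.sha256(canonical_text(value).encode("utf-8")).hexdigest()

hand_typed = {
    "monkey": {"noise": "ooh-ooh", "eats": {"bananas", "berries"}},
    "cat": {"noise": "meow", "eats": {"kibble", "cat treats", "tinned meat"}},
}
reordered = {
    "cat": {"eats": {"tinned meat", "cat treats", "kibble"}, "noise": "meow"},
    "monkey": {"eats": {"berries", "bananas"}, "noise": "ooh-ooh"},
}

print(canonical_text(hand_typed))
print(digest(hand_typed) == digest(reordered))
```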
|
||||
|
||||
|
||||
<a id="org0366627"></a>
|
||||
|
||||
## Defining our own types using Records
|
||||
|
||||
Finally, there is one more type that Preserves provides… but in a
|
||||
sense, it's a meta-type.
|
||||
`Record` objects have a label and a series of arguments (or "fields").
|
||||
For example, we can make a `Date` record:
|
||||
|
||||
``` javascript
|
||||
<Date 2019 8 15>
|
||||
```
|
||||
|
||||
In this example, the `Date` label is a symbol; 2019, 8, and 15 are the
|
||||
year, month, and day fields respectively.
|
||||
|
||||
Why do we care about this?
|
||||
We could instead just decide to encode our date data in a string,
|
||||
like "2019-08-15".
|
||||
A document using such a date structure might look like so:
|
||||
|
||||
``` javascript
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "1915-10-04"}
|
||||
```
|
||||
|
||||
Unfortunately, say our boss comes along and tells us that the people
|
||||
doing data entry have complained that it isn't always possible to get
|
||||
an exact date.
|
||||
They would like to be able to type in what they know if they don't
|
||||
know the date exactly.
|
||||
|
||||
This causes a problem.
|
||||
Now we might have two kinds of entries:
|
||||
|
||||
``` javascript
|
||||
@"Exact date known"
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "1915-10-04"}
|
||||
|
||||
@"Not sure about exact date..."
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "Sometime in October 1915? Or was that when he became an insect?"}
|
||||
```
|
||||
|
||||
This is a mess.
|
||||
We *could* just try matching the string against a regular expression to
see if it "looks like a date", but doing this kind of thing is prone to
errors and weird edge cases.
|
||||
No, it's better to be able to have a separate type:
|
||||
|
||||
``` javascript
|
||||
@"Exact date known"
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": <Date 1915 10 04>}
|
||||
|
||||
@"Not sure about exact date..."
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": <Unknown "Sometime in October 1915? Or was that when he became an insect?">}
|
||||
```
|
||||
|
||||
Now we can distinguish the two.
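
On the reading side, this lets a program dispatch on the record label rather than guess from the shape of a string. The sketch below is a hypothetical in-memory model in Python: the `Record` class, the use of plain strings for labels, and `describe_born` are all assumptions for illustration, not the API of any Preserves library.

```python
from typing import Any, NamedTuple, Tuple

class Record(NamedTuple):
    """Hypothetical in-memory record: a label plus positional fields."""
    label: Any
    fields: Tuple[Any, ...]

def describe_born(born):
    """Dispatch on the record label instead of guessing from a string."""
    if isinstance(born, Record) and born.label == "Date" and len(born.fields) == 3:
        year, month, day = born.fields
        return f"{year:04d}-{month:02d}-{day:02d}"
    if isinstance(born, Record) and born.label == "Unknown" and len(born.fields) == 1:
        return f"unknown ({born.fields[0]})"
    raise ValueError(f"unrecognised birth date: {born!r}")

exact = {"name": "Gregor Samsa", "born": Record("Date", (1915, 10, 4))}
vague = {"name": "Gregor Samsa", "born": Record("Unknown", ("Sometime in October 1915?",))}

print(describe_born(exact["born"]))
print(describe_born(vague["born"]))
```

A bare string such as `"1915-10-04"` is rejected outright, which is the behaviour we wanted: the type, not the text, says what kind of value it is.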
|
||||
|
||||
We can make as many Record types as our program needs, though it is up
|
||||
to our program to make sense of what these mean.
|
||||
Since Preserves does not specify the `Date` itself, both the program
|
||||
(or person) writing the Preserves document and the program reading it
|
||||
need to have a mutual understanding of how many fields it has and what
the label signifies for it to be of use.
|
||||
|
||||
Still, there are plenty of interesting labels we can define.
|
||||
Here is one for an "iri", a hyperlink:
|
||||
|
||||
``` javascript
|
||||
<iri "https://dustycloud.org/blog/">
|
||||
```
|
||||
|
||||
That's nice enough, but here's another interesting detail… labels on
|
||||
Records are usually symbols but aren't necessarily so.
|
||||
They can also be strings or numbers or even dictionaries.
|
||||
And very interestingly, they can also be other records:
|
||||
|
||||
``` javascript
|
||||
<<iri "https://www.w3.org/ns/activitystreams#Note">
|
||||
{"to": [<iri "https://chatty.example/ben/">],
|
||||
"attributedTo": <iri "https://social.example/alyssa/">,
|
||||
"content": "Say, did you finish reading that book I lent you?"}>
|
||||
```
|
||||
|
||||
Do you see it? This Record's label is… an `iri` Record!
|
||||
The link here points to a more precise term saying that "this is a
|
||||
note meant to be sent around in social networks".
|
||||
It is considerably more precise than just using the string or symbol
|
||||
"Note", which could be ambiguous.
|
||||
(A social networking note? A footnote? A music note?)
|
||||
While not all systems need this, this (partial) example hints at how
|
||||
Preserves can also be used to coordinate meaning in larger, more
|
||||
decentralized systems.
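
A record whose label is itself a record can be modelled directly. As before, this Python sketch is hypothetical: `Record` and the `iri` helper are illustrative names, and strings stand in for symbols.

```python
from typing import Any, NamedTuple, Tuple

class Record(NamedTuple):
    """Hypothetical in-memory record: a label plus positional fields."""
    label: Any
    fields: Tuple[Any, ...]

def iri(url):
    """Build an `iri` record with a single string field."""
    return Record("iri", (url,))

NOTE = iri("https://www.w3.org/ns/activitystreams#Note")

message = Record(
    NOTE,  # the label is itself a record
    ({
        "to": [iri("https://chatty.example/ben/")],
        "attributedTo": iri("https://social.example/alyssa/"),
        "content": "Say, did you finish reading that book I lent you?",
    },),
)

print(message.label == NOTE)    # matches the precise term
print(message.label == "Note")  # does not match the ambiguous string
```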
|
||||
|
||||
Likewise, it is also possible to label records with integers.
|
||||
Languages like OCaml use integers instead of symbolic record labels
|
||||
because their type systems ensure that it is never ambiguous what,
|
||||
say, the label `23` means in any given context.
|
||||
Allowing integer record labels lets Preserves directly express OCaml
|
||||
data.
|
||||
|
||||
|
||||
<a id="org1b72b96"></a>
|
||||
|
||||
## Annotations
|
||||
|
||||
Annotations are not strictly a necessary feature, but they are useful
|
||||
in some circumstances.
|
||||
We have previously shown them used as comments:
|
||||
|
||||
``` javascript
|
||||
@"I'm a comment!"
|
||||
"I am not a comment, I am data!"
|
||||
```
|
||||
|
||||
Annotations annotate the values they precede.
|
||||
It is possible to have multiple annotations on a value.
|
||||
|
||||
``` javascript
|
||||
@"I am annotating this number"
|
||||
@"And so am I!"
|
||||
42
|
||||
```
|
||||
|
||||
As said, annotations are not really data.
|
||||
They are merely meant for development tooling or debugging.
|
||||
You have to explicitly ask for them when reading, and they wrap all
|
||||
the values.
|
||||
Many implementations will, in the same mode, also supply line number
|
||||
and column information attached to each read value.
|
||||
|
||||
So what's the point of them then?
|
||||
If annotations were just for comments, there would indeed be hardly any
point at all… it would be simpler to just provide a comment syntax.
|
||||
|
||||
However, annotations can be used for more than just comments.
|
||||
They can also be used for debugging or other development-tool-oriented
|
||||
data.
|
||||
|
||||
For instance, here's a reply from an HTTP API service running in
|
||||
"debug" mode annotated with the time it took to produce the reply and
|
||||
the internal name of the server that produced the response:
|
||||
|
||||
``` javascript
|
||||
@<ResponseTime <Milliseconds 64.4>>
|
||||
@<BackendServer "humpty-dumpty.example.com">
|
||||
<Success
|
||||
<Employees [
|
||||
<Employee "Alyssa P. Hacker" #set{<Role Programmer>, <Role Manager>}, <Date 2018, 1, 24>>
|
||||
<Employee "Ben Bitdiddle" #set{<Role Programmer>}, <Date 2019, 2, 13>> ]>>
|
||||
```
|
||||
|
||||
The annotations aren't related to the data requested, which is all
|
||||
about "employees"; instead, they're about the systems that produced
|
||||
the response.
|
||||
You could say they're in the domain of "debugging" instead of the
|
||||
domain of "employees".
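
One way to picture this separation is a wrapper that carries annotations alongside a value and can be peeled off to leave the data alone. The Python below is a hypothetical model only: `Annotated` and `strip` are illustrative names, and the reply is simplified to nested tuples rather than real records.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Annotated:
    """Hypothetical wrapper: a value plus the annotations that precede it."""
    value: Any
    annotations: List[Any] = field(default_factory=list)

def strip(item):
    """Return the plain data, discarding any annotation wrapper."""
    return item.value if isinstance(item, Annotated) else item

reply = Annotated(
    value=("Success", ("Employees", ["Alyssa P. Hacker", "Ben Bitdiddle"])),
    annotations=[
        ("ResponseTime", ("Milliseconds", 64.4)),
        ("BackendServer", "humpty-dumpty.example.com"),
    ],
)
plain = ("Success", ("Employees", ["Alyssa P. Hacker", "Ben Bitdiddle"]))

print(strip(reply) == plain)   # the data is the same with or without annotations
print(len(reply.annotations))  # the debugging metadata travels separately
```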
|
||||
|
||||
|
||||
<a id="org1924a0a"></a>
|
||||
|
||||
# Conclusions
|
||||
|
||||
We've covered the broad strokes of Preserves, but not everything that
|
||||
is possible with it.
|
||||
We leave it as an exercise for the reader to try reading these examples
|
||||
into their languages (several libraries exist already) and writing them
|
||||
out as binary objects.
|
||||
|
||||
But as we've seen, Preserves is a flexible system which comes with
|
||||
well-defined, carefully specified built-in types, as well as a
|
||||
meta-type which can be used as an extension point.
|
||||
|
||||
Happy preserving!
|
||||
|
461
TUTORIAL.org
|
@ -1,461 +0,0 @@
|
|||
#+TITLE: Preserves: a tutorial
|
||||
#+AUTHOR: Christopher Lemmer Webber
|
||||
|
||||
/This document, like Preserves itself, is released under/
|
||||
/[[file:./LICENSE][version 2.0 of the Apache license]]./
|
||||
|
||||
* Overview
|
||||
|
||||
Preserves is a serialization system which supplies both a
|
||||
human-readable textual and efficient binary syntax; converting between
|
||||
the two is straightforward.
|
||||
Preserves' human-readable syntax is easy to read and should be mostly
|
||||
familiar if you already know systems like JSON.
|
||||
However, Preserves is more precisely specified than JSON, and also
|
||||
has a clean extension mechanism.
|
||||
|
||||
This document is a tutorial; it does not get into all the details of
|
||||
Preserves.
|
||||
For that, see the [[file:./preserves.md][Preserves specification]].
|
||||
|
||||
* Preserves basics
|
||||
|
||||
** Starting with the familiar
|
||||
|
||||
If you're familiar with JSON, Preserves looks fairly similar:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
{"name": "Missy Rose",
|
||||
"species": "Felis Catus",
|
||||
"age": 13,
|
||||
"foods": ["kibble", "cat treats", "tinned meat"]}
|
||||
#+END_SRC
|
||||
|
||||
Preserves also has something we can use for debugging/development
|
||||
information called "annotations"; they aren't actually read in as data
|
||||
but we can use them for comments.
|
||||
(They can also be used for other development tools and are not
|
||||
restricted to strings; more on this later, but for now interpret them
|
||||
as comments.)
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"I'm an annotation... basically a comment. Ignore me!"
|
||||
"I'm data! Don't ignore me!"
|
||||
#+END_SRC
|
||||
|
||||
Preserves supports some data types you're probably already familiar
|
||||
with from JSON, and which look fairly similar in the textual format:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"booleans"
|
||||
#true
|
||||
#false
|
||||
|
||||
@"various kinds of numbers:"
|
||||
42
|
||||
123556789012345678901234567890
|
||||
-10
|
||||
13.5
|
||||
|
||||
@"strings"
|
||||
"I'm feeling stringy!"
|
||||
|
||||
@"sequences (lists)"
|
||||
["cat", "dog", "mouse", "goldfish"]
|
||||
|
||||
@"dictionaries (hashmaps)"
|
||||
{"cat": "meow",
|
||||
"dog": "woof",
|
||||
"goldfish": "glub glub",
|
||||
"mouse": "squeak"}
|
||||
#+END_SRC
|
||||
|
||||
** Going beyond JSON
|
||||
|
||||
We can observe a few differences from JSON already: it's possible to
express integers of arbitrary size in Preserves, and booleans look a little
bit different.
|
||||
A few more interesting differences:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Preserves treats commas as whitespace, so these are the same"
|
||||
["cat", "dog", "mouse", "goldfish"]
|
||||
["cat" "dog" "mouse" "goldfish"]
|
||||
|
||||
@"We can use anything as keys in dictionaries, not just strings"
|
||||
{1: "the loneliest number",
|
||||
["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
|
||||
{"dictionaries": "as keys???"}: "well, why not?"}
|
||||
#+END_SRC
|
||||
|
||||
Preserves technically provides a few types of numbers:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Signed Integers"
|
||||
42
|
||||
-42
|
||||
5907212309572059846509324862304968273468909473609826340
|
||||
-5907212309572059846509324862304968273468909473609826340
|
||||
|
||||
@"Floats (Single-precision IEEE floats) (notice the trailing f)"
|
||||
3.1415927f
|
||||
|
||||
@"Doubles (Double-precision IEEE floats)"
|
||||
3.141592653589793
|
||||
#+END_SRC
|
||||
|
||||
Preserves also provides some types that don't come in JSON.
|
||||
=Symbols= are fairly interesting; they look a lot like strings but
|
||||
really aren't meant to represent text as much as they are, well... a
|
||||
symbolic name.
|
||||
Often they're meant to be used for something that has symbolic importance
|
||||
to the program, but not textual importance (other than to guide the
|
||||
programmer... not unlike variable names).
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"A symbol (NOT a string!)"
|
||||
JustASymbol
|
||||
|
||||
@"You can do mixedCase or CamelCase too of course, pick your poison"
|
||||
@"(but be consistent, for the sake of your collaborators!)"
|
||||
iAmASymbol
|
||||
i-am-a-symbol
|
||||
|
||||
@"A list of symbols"
|
||||
[GET, PUT, POST, DELETE]
|
||||
|
||||
@"A symbol with spaces in it"
|
||||
|this is just one symbol believe it or not|
|
||||
#+END_SRC
|
||||
|
||||
We can also add binary data, aka ByteStrings:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Some binary data, base64 encoded"
|
||||
#base64{cGljdHVyZSBvZiBhIGNhdA==}
|
||||
|
||||
@"Some other binary data, hexadecimal encoded"
|
||||
#hex{616263}
|
||||
|
||||
@"Same binary data as above, base64 encoded"
|
||||
#base64{YWJj}
|
||||
#+END_SRC
|
||||
|
||||
What's neat about this is that we don't have to "pay the cost" of
|
||||
base64 or hexadecimal encoding when we serialize this data to binary;
|
||||
the length of the binary data is the length of the binary data.
|
||||
|
||||
Conveniently, Preserves also includes Sets, which are collections of
|
||||
unique elements where ordering of items is unimportant.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
#set{flour, salt, water}
|
||||
#+END_SRC
|
||||
|
||||
** Total ordering and canonicalization
|
||||
|
||||
This is a good time to mention that even though from a semantic
|
||||
perspective sets and dictionaries do not carry information about the
|
||||
ordering of their elements (and Preserves doesn't care what order we
|
||||
enter them in for our hand-written-as-text Preserves documents),
|
||||
Preserves has a well-defined "total ordering".
|
||||
|
||||
*FULL WARNING:* the following claim is not implemented yet. :)
|
||||
Coming soon!
|
||||
|
||||
Based on this total ordering, Preserves provides support for canonical
|
||||
ordering when serializing; in this mode, Preserves will always write
|
||||
out the elements in the same order, every time.
|
||||
When combined with binary serialization, this is Preserves' "canonical
|
||||
form".
|
||||
This is important and useful for many contexts, but especially for
|
||||
cryptographic signatures and hashing.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"This hand-typed Preserves document..."
|
||||
{monkey: {"noise": "ooh-ooh",
|
||||
"eats": #set{"bananas", "berries"}}
|
||||
cat: {"noise": "meow",
|
||||
"eats": #set{"kibble", "cat treats", "tinned meat"}}}
|
||||
|
||||
@"Will always, always be written out in this order when canonicalized:"
|
||||
{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"},
|
||||
"noise": "meow"}
|
||||
monkey: {"eats": #set{"bananas", "berries"},
|
||||
"noise": "ooh-ooh"}}
|
||||
#+END_SRC
|
||||
|
||||
Clever implementations can get canonicalized output for free by
|
||||
carefully ordering set elements and dictionary entries at construction
|
||||
time, but even in simple implementations, canonical serialization is
|
||||
almost as cheap as normal serialization.
|
||||
|
||||
** Defining our own types using Records
|
||||
|
||||
Finally, there is one more type that Preserves provides... but in a
|
||||
sense, it's a meta-type.
|
||||
=Record= objects have a label and a series of arguments (or "fields").
|
||||
For example, we can make a =Date= record:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<Date 2019 8 15>
|
||||
#+END_SRC
|
||||
|
||||
In this example, the =Date= label is a symbol; 2019, 8, and 15 are the
|
||||
year, month, and day fields respectively.
|
||||
|
||||
Why do we care about this?
|
||||
We could instead just decide to encode our date data in a string,
|
||||
like "2019-08-15".
|
||||
A document using such a date structure might look like so:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "1915-10-04"}
|
||||
#+END_SRC
|
||||
|
||||
Unfortunately, say our boss comes along and tells us that the people
|
||||
doing data entry have complained that it isn't always possible to get
|
||||
an exact date.
|
||||
They would like to be able to type in what they know if they don't
|
||||
know the date exactly.
|
||||
|
||||
This causes a problem.
|
||||
Now we might have two kinds of entries:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Exact date known"
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "1915-10-04"}
|
||||
|
||||
@"Not sure about exact date..."
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": "Sometime in October 1915? Or was that when he became an insect?"}
|
||||
#+END_SRC
|
||||
|
||||
This is a mess.
|
||||
We /could/ just try matching the string against a regular expression to
see if it "looks like a date", but doing this kind of thing is prone to
errors and weird edge cases.
|
||||
No, it's better to be able to have a separate type:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"Exact date known"
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": <Date 1915 10 04>}
|
||||
|
||||
@"Not sure about exact date..."
|
||||
{"name": "Gregor Samsa",
|
||||
"description": "humanoid trapped in an insect body",
|
||||
"born": <Unknown "Sometime in October 1915? Or was that when he became an insect?">}
|
||||
#+END_SRC
|
||||
|
||||
Now we can distinguish the two.
|
||||
|
||||
We can make as many Record types as our program needs, though it is up
|
||||
to our program to make sense of what these mean.
|
||||
Since Preserves does not specify the =Date= itself, both the program
|
||||
(or person) writing the Preserves document and the program reading it
|
||||
need to have a mutual understanding of how many fields it has and what
the label signifies for it to be of use.
|
||||
|
||||
Still, there are plenty of interesting labels we can define.
|
||||
Here is one for an "iri", a hyperlink:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<iri "https://dustycloud.org/blog/">
|
||||
#+END_SRC
|
||||
|
||||
That's nice enough, but here's another interesting detail... labels on
|
||||
Records are usually symbols but aren't necessarily so.
|
||||
They can also be strings or numbers or even dictionaries.
|
||||
And very interestingly, they can also be other records:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
<<iri "https://www.w3.org/ns/activitystreams#Note">
|
||||
{"to": [<iri "https://chatty.example/ben/">],
|
||||
"attributedTo": <iri "https://social.example/alyssa/">,
|
||||
"content": "Say, did you finish reading that book I lent you?"}>
|
||||
#+END_SRC
|
||||
|
||||
Do you see it? This Record's label is... an =iri= Record!
|
||||
The link here points to a more precise term saying that "this is a
|
||||
note meant to be sent around in social networks".
|
||||
It is considerably more precise than just using the string or symbol
|
||||
"Note", which could be ambiguous.
|
||||
(A social networking note? A footnote? A music note?)
|
||||
While not all systems need this, this (partial) example hints at how
|
||||
Preserves can also be used to coordinate meaning in larger, more
|
||||
decentralized systems.
|
||||
|
||||
Likewise, it is also possible to label records with integers.
|
||||
Languages like OCaml use integers instead of symbolic record labels
|
||||
because their type systems ensure that it is never ambiguous what,
|
||||
say, the label =23= means in any given context.
|
||||
Allowing integer record labels lets Preserves directly express OCaml
|
||||
data.
|
||||
|
||||
# 2019-08-18 14:06:24 tonyg -- I like the following idea in principle,
|
||||
# but I don't think it belongs here yet. The *binary* syntax has
|
||||
# "placeholder" values but the *text* syntax doesn't. Preserves is all
|
||||
# about equivalences: so <1 #true> is different from <Employee #true>.
|
||||
# If you don't have the key -- the mapping between 1 and Employee --
|
||||
# you can't know to identify the two! It's an open design question
|
||||
# whether to keep placeholder values, and if so how to introduce them
|
||||
# (!) without causing confusion; it's probably better in general to
|
||||
# just gzip encoded values (!!)...
|
||||
#
|
||||
# -- original text from dustyweb follows --
|
||||
|
||||
# A system could use this to reduce the redundancy cost of labels sent
|
||||
# over the wire by indexing labels and substituting them after reading
|
||||
# the structures.
|
||||
|
||||
# #+BEGIN_SRC preserves
|
||||
# @"The ordered index of labels for this session"
|
||||
# [Employee, Role, Date]
|
||||
|
||||
# @"We could then transform this structure..."
|
||||
# #set{<Employee @"employee name"
|
||||
# "Alyssa P. Hacker"
|
||||
# @"employee roles"
|
||||
# #set{<Role Programmer>,
|
||||
# <Role Manager>},
|
||||
# @"when hired"
|
||||
# <Date 2018, 1, 24>>,
|
||||
# <Employee @"employee name"
|
||||
# "Ben Bitdiddle"
|
||||
# @"employee roles"
|
||||
# #set{<Role Programmer>},
|
||||
# @"when hired"
|
||||
# <Date 2019, 2, 13>>}
|
||||
|
||||
# @"... to this structure, which in binary is 91 as opposed to 127 bytes"
|
||||
# #set{<0 @"employee name"
|
||||
# "Alyssa P. Hacker"
|
||||
# @"employee roles"
|
||||
# #set{<1 Programmer>,
|
||||
# <1 Manager>},
|
||||
# @"when hired"
|
||||
# <2 2018, 1, 24>>,
|
||||
# <0 @"employee name"
|
||||
# "Ben Bitdiddle"
|
||||
# @"employee roles"
|
||||
# #set{<1 Programmer>},
|
||||
# @"when hired"
|
||||
# <2 2019, 2, 13>>}
|
||||
# #+END_SRC
|
||||
|
||||
# Even in this trivial example, this is a 25% reduction in the binary size.
|
||||
# Even though tooling to do this does not come out of the box in Preserves,
|
||||
# the fact that Record labels can be anything makes it possible to build this
|
||||
# or any such appropriate structure.
|
||||
|
||||
** Annotations
|
||||
|
||||
Annotations are not strictly a necessary feature, but they are useful
|
||||
in some circumstances.
|
||||
We have previously shown them used as comments:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"I'm a comment!"
|
||||
"I am not a comment, I am data!"
|
||||
#+END_SRC
|
||||
|
||||
Annotations annotate the values they precede.
|
||||
It is possible to have multiple annotations on a value.
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@"I am annotating this number"
|
||||
@"And so am I!"
|
||||
42
|
||||
#+END_SRC
|
||||
|
||||
As said, annotations are not really data.
|
||||
They are merely meant for development tooling or debugging.
|
||||
You have to explicitly ask for them when reading, and they wrap all
|
||||
the values.
|
||||
Many implementations will, in the same mode, also supply line number
|
||||
and column information attached to each read value.
|
||||
|
||||
So what's the point of them then?
|
||||
If annotations were just for comments, there would indeed be hardly any
point at all... it would be simpler to just provide a comment syntax.
|
||||
|
||||
However, annotations can be used for more than just comments.
|
||||
They can also be used for debugging or other development-tool-oriented
|
||||
data.
|
||||
|
||||
# 2019-08-18 14:10:15 tonyg -- Similarly, I am uncomfortable with this
|
||||
# example. It seems to me that the annotations are indeed domain data,
|
||||
# just in a different domain of "project management" rather than the
|
||||
# domain of "NPC data sheet"! Annotations are intended for the domain
|
||||
# of *programming and debugging software systems* -- they're intended
|
||||
# for reflective use. You use them when you're thinking about
|
||||
# preserves artifacts per se, rather than anything about the domain of
|
||||
# the data encoded within a given preserves artifact.
|
||||
#
|
||||
# Maybe a good example is something like an HTTP API? You could
|
||||
# annotate a response with the time it took to be produced in
|
||||
# milliseconds. I'll sketch something out.
|
||||
#
|
||||
# -- original text from dustyweb follows --
|
||||
|
||||
# For instance, here is some game data annotated with who the
# "project owner" of each object is.
|
||||
|
||||
# #+BEGIN_SRC preserves
|
||||
# <NpcCatalog
|
||||
# "Monsters"
|
||||
# #set{@<ProjectLead Alyssa>
|
||||
# {name: "Ogre",
|
||||
# spriteSheet: #base64{T2dyZSBzcHJpdGVzIGdvIGhlcmU=},
|
||||
# attributes: #set{biped, brute, rage, clumsy}},
|
||||
# @<ProjectLead Ben>
|
||||
# {name: "Jackal",
|
||||
# spriteSheet: #base64{V2l0Y2ggc3ByaXRlcyBnbyBoZXJl},
|
||||
# attributes: #set{quadruped, swift, pack-animal, weak}}}>
|
||||
# #+END_SRC
|
||||
|
||||
# Each monster described in the set is annotated with a =ProjectLead=
|
||||
# record.
|
||||
# While useful information used by the game company's organization
|
||||
# system, it doesn't particularly matter when reading in the data
|
||||
# just as code.
|
||||
|
||||
For instance, here's a reply from an HTTP API service running in
|
||||
"debug" mode annotated with the time it took to produce the reply and
|
||||
the internal name of the server that produced the response:
|
||||
|
||||
#+BEGIN_SRC preserves
|
||||
@<ResponseTime <Milliseconds 64.4>>
|
||||
@<BackendServer "humpty-dumpty.example.com">
|
||||
<Success
|
||||
<Employees [
|
||||
<Employee "Alyssa P. Hacker" #set{<Role Programmer>, <Role Manager>}, <Date 2018, 1, 24>>
|
||||
<Employee "Ben Bitdiddle" #set{<Role Programmer>}, <Date 2019, 2, 13>> ]>>
|
||||
#+END_SRC
|
||||
|
||||
The annotations aren't related to the data requested, which is all
|
||||
about "employees"; instead, they're about the systems that produced
|
||||
the response.
|
||||
You could say they're in the domain of "debugging" instead of the
|
||||
domain of "employees".
|
||||
|
||||
* Conclusions
|
||||
|
||||
We've covered the broad strokes of Preserves, but not everything that
|
||||
is possible with it.
|
||||
We leave it as an exercise for the reader to try reading these examples
|
||||
into their languages (several libraries exist already) and writing them
|
||||
out as binary objects.
|
||||
|
||||
But as we've seen, Preserves is a flexible system which comes with
|
||||
well-defined, carefully specified built-in types, as well as a
|
||||
meta-type which can be used as an extension point.
|
||||
|
||||
Happy preserving!
|