From dca049ce467005f86c9c6951209e40133e39a696 Mon Sep 17 00:00:00 2001
From: Christopher Lemmer Webber
Date: Thu, 15 Aug 2019 19:01:43 -0400
Subject: [PATCH] Add Preserves tutorial

---
 TUTORIAL.org | 394 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 394 insertions(+)
 create mode 100644 TUTORIAL.org

diff --git a/TUTORIAL.org b/TUTORIAL.org
new file mode 100644
index 0000000..9963380
--- /dev/null
+++ b/TUTORIAL.org
@@ -0,0 +1,394 @@
+#+TITLE: Preserves: a tutorial
+#+AUTHOR: Christopher Lemmer Webber
+
+/This document, like Preserves itself, is released under/
+/[[file:./LICENSE][version 2.0 of the Apache license]]./
+
+* Overview
+
+Preserves is a serialization system which supplies both a
+human-readable textual and efficient binary syntax; converting between
+the two is straightforward.
+Preserves' human readable syntax is easy to read and should be mostly
+familiar if you already know systems like JSON.
+However, Preserves is less ambiguously specified than JSON, and also
+has a clean extension mechanism.
+
+This document is a tutorial; it does not get into all the details of
+Preserves.
+For that, see the [[file:./preserves.md][Preserves specification]].
+
+* Preserves basics
+
+** Starting with the familiar
+
+If you're familiar with JSON, Preserves looks fairly similar:
+
+#+BEGIN_SRC preserves
+  {"name": "Missy Rose",
+   "species": "Felis Catus",
+   "age": 13,
+   "foods": ["kibble", "cat treats", "tinned meat"]}
+#+END_SRC
+
+Preserves also has something we can use for debugging/development
+information called "annotations"; they aren't actually read in as data
+but we can use them for comments.
+(They can also be used for other development tools and are not
+restricted to strings; more on this later, but for now interpret them
+as comments.)
+
+#+BEGIN_SRC preserves
+@"I'm an annotation... basically a comment. Ignore me!"
+"I'm data! Don't ignore me!"
+#+END_SRC
+
+Preserves supports some data types you're probably already familiar
+with from JSON, and which look fairly similar in the textual format:
+
+#+BEGIN_SRC preserves
+@"booleans"
+#true
+#false
+
+@"various kinds of numbers:"
+42
+123556789012345678901234567890
+-10
+13.5
+
+@"strings"
+"I'm feeling stringy!"
+
+@"sequences (lists)"
+["cat", "dog", "mouse", "goldfish"]
+
+@"dictionaries (hashmaps)"
+{"cat": "meow",
+ "dog": "woof",
+ "goldfish": "glub glub",
+ "mouse": "squeak"}
+#+END_SRC
+
+** Going beyond JSON
+
+We can observe a few differences from JSON already; it's possible to
+express numbers of arbitrary length in Preserves, and booleans look a little
+bit different.
+A few more interesting differences:
+
+#+BEGIN_SRC preserves
+@"Preserves treats commas as whitespace, so these are the same"
+["cat", "dog", "mouse", "goldfish"]
+["cat" "dog" "mouse" "goldfish"]
+
+@"We can use anything as keys in dictionaries, not just strings"
+{1: "the loneliest number",
+ ["why", "was", 6, "afraid", "of", 7]: "because 7 8 9",
+ {"dictionaries": "as keys???"}: "well, why not?"}
+#+END_SRC
+
+Preserves technically provides a few types of numbers:
+
+#+BEGIN_SRC preserves
+@"Signed Integers"
+42
+-42
+5907212309572059846509324862304968273468909473609826340
+-5907212309572059846509324862304968273468909473609826340
+
+@"Floats (Single-precision IEEE floats) (notice the trailing f)"
+3.1415927f
+
+@"Doubles (Double-precision IEEE floats)"
+3.141592653589793
+#+END_SRC
+
+Preserves also provides some types that don't come in JSON.
+=Symbols= are fairly interesting; they look a lot like strings but
+really aren't meant to represent text as much as they are, well... a
+symbolic name.
+Often they're meant to be used for something that has symbolic importance
+to the program, but not textual importance (other than to guide the
+programmer... not unlike variable names).
+
+#+BEGIN_SRC preserves
+@"A symbol (NOT a string!)"
+JustASymbol
+
+@"You can do mixedCase or CamelCase too of course, pick your poison"
+@"(but be consistent, for the sake of your collaborators!)"
+iAmASymbol
+i-am-a-symbol
+
+@"A list of symbols"
+[GET, PUT, POST, DELETE]
+
+@"A symbol with a space in it"
+|this is just one symbol believe it or not|
+#+END_SRC
+
+We can also add binary data, aka ByteStrings:
+
+#+BEGIN_SRC preserves
+@"Some binary data, base64 encoded"
+#base64{cGljdHVyZSBvZiBhIGNhdA==}
+
+@"Some other binary data, hexadecimal encoded"
+#hex{616263}
+
+@"Same binary data as above, base64 encoded"
+#base64{YWJj}
+#+END_SRC
+
+What's neat about this is that we don't have to "pay the cost" of
+base64 or hexadecimal encoding when we serialize this data to binary;
+the length of the binary data is the length of the binary data.
+
+Conveniently, Preserves also includes Sets, which are collections of
+unique elements where ordering of items is unimportant.
+
+#+BEGIN_SRC preserves
+#set{flour, salt, water}
+#+END_SRC
+
+** Total ordering
+
+This is a good time to mention that even though from a semantic
+perspective sets and dictionaries do not carry information about the
+ordering of their elements (and Preserves doesn't care what order we
+enter them in for our hand-written-as-text Preserves documents), when
+serializing to binary data (or even when the Preserves library itself
+serializes to textual data), Preserves will always write out the
+elements in the same order, every time (aka, "total ordering").
+This is important and useful for many contexts, but especially for
+cryptographic signatures.
+
+#+BEGIN_SRC preserves
+@"This hand-typed Preserves document..."
+{monkey: {"noise": "ooh-ooh",
+          "eats": #set{"bananas", "berries"}}
+ cat: {"noise": "meow",
+       "eats": #set{"kibble", "cat treats", "tinned meat"}}}
+
+@"Will always, always be written out in this order:"
+{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"},
+       "noise": "meow"}
+ monkey: {"eats": #set{"bananas", "berries"},
+          "noise": "ooh-ooh"}}
+#+END_SRC
+
+** Defining our own types using Records
+
+Finally, there is one more type that Preserves provides... but in a
+sense, it's a meta-type.
+=Record= objects have a tag and a series of arguments (or "slots").
+For example, we can make a =Date= record:
+
+#+BEGIN_SRC preserves
+<Date 2019 8 15>
+#+END_SRC
+
+In this example, the =Date= tag is a symbol; 2019, 8, and 15 are the
+year, month, and day slots respectively.
+
+Why do we care about this?
+We could instead just decide to encode our date data in a string,
+like "2019-08-15".
+A document using such a date structure might look like so:
+
+#+BEGIN_SRC preserves
+{"name": "Gregor Samsa",
+ "description": "humanoid trapped in an insect body",
+ "born": "1915-10-04"}
+#+END_SRC
+
+Unfortunately, say our boss comes along and tells us that the people
+doing data entry have complained that it isn't always possible to get
+an exact date.
+They would like to be able to type in what they know if they don't
+know the date exactly.
+
+This causes a problem.
+Now we might have two kinds of entries:
+
+#+BEGIN_SRC preserves
+@"Exact date known"
+{"name": "Gregor Samsa",
+ "description": "humanoid trapped in an insect body",
+ "born": "1915-10-04"}
+
+@"Not sure about exact date..."
+{"name": "Gregor Samsa",
+ "description": "humanoid trapped in an insect body",
+ "born": "Sometime in October 1915? Or was that when he became an insect?"}
+#+END_SRC
+
+This is a mess.
+We /could/ just try matching against a regular expression to see if it
+"looks like a date", but doing this kind of thing is prone to errors and
+weird edge cases.
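+
+To make those edge cases concrete, here is a small sketch.
+It assumes a naive check for "four digits, dash, two digits, dash, two
+digits"; that pattern is only an illustration, not something Preserves
+defines.
+
+#+BEGIN_SRC preserves
+@"Matches the naive pattern, but there is no month 13 or day 45"
+{"born": "1915-13-45"}
+
+@"A real date, but written day-first, so the naive pattern rejects it"
+{"born": "04-10-1915"}
+
+@"Free text with something date-shaped buried inside it"
+{"born": "Not 1915-10-04, I think, but close to it"}
+#+END_SRC
+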
+No, it's better to be able to have a separate type:
+
+#+BEGIN_SRC preserves
+@"Exact date known"
+{"name": "Gregor Samsa",
+ "description": "humanoid trapped in an insect body",
+ "born": <Date 1915 10 4>}
+
+@"Not sure about exact date..."
+{"name": "Gregor Samsa",
+ "description": "humanoid trapped in an insect body",
+ "born": "Sometime in October 1915? Or was that when he became an insect?"}
+#+END_SRC
+
+Now we can distinguish the two.
+
+We can make as many Record types as our program needs, though it is up
+to our program to make sense of what these mean.
+Since Preserves does not specify the =Date= itself, both the program
+(or person) writing the Preserves document and the program reading it
+need to have a mutual understanding of how many slots it has and what
+the tag signifies for it to be of use.
+
+Still, there are plenty of interesting tags we can define.
+Here is one for an "iri", a hyperlink:
+
+#+BEGIN_SRC preserves
+
+#+END_SRC
+
+That's nice enough, but here's another interesting detail... tags on
+Records are usually symbols but aren't necessarily so.
+They can also be strings or numbers or even dictionaries.
+And very interestingly, they can also be other records:
+
+#+BEGIN_SRC preserves
+<
+ {"to": [],
+  "attributedTo": ,
+  "content": "Say, did you finish reading that book I lent you?"}>
+#+END_SRC
+
+Do you see it? This object's type is... an iri Record!
+The link here points to a more precise term saying that "this is a
+note meant to be sent around in social networks".
+It is considerably more precise than just using the string or symbol
+"Note", which could be ambiguous.
+(A social networking note? A footnote? A music note?)
+While not all systems need this, this (partial) example hints at how
+Preserves can also be used to coordinate meaning in larger, more
+decentralized systems.
+
+Likewise, it is also possible to tag records with integers.
+A system could use this to reduce the redundancy cost of tags sent
+over the wire by indexing tags and substituting them after reading
+the structures.
+
+#+BEGIN_SRC preserves
+  @"The ordered index of tags for this session"
+  [Employee, Role, Date]
+
+  @"We could then transform this structure..."
+  #set{<Employee @"employee name"
+                 "Alyssa P. Hacker"
+                 @"employee roles"
+                 #set{<Role Programmer>,
+                      <Role Manager>},
+                 @"when hired"
+                 <Date 2018, 1, 24>>,
+       <Employee @"employee name"
+                 "Ben Bitdiddle"
+                 @"employee roles"
+                 #set{<Role Programmer>},
+                 @"when hired"
+                 <Date 2019, 2, 13>>}
+
+  @"... to this structure, which in binary is 91 as opposed to 127 bytes"
+  #set{<0 @"employee name"
+          "Alyssa P. Hacker"
+          @"employee roles"
+          #set{<1 Programmer>,
+               <1 Manager>},
+          @"when hired"
+          <2 2018, 1, 24>>,
+       <0 @"employee name"
+          "Ben Bitdiddle"
+          @"employee roles"
+          #set{<1 Programmer>},
+          @"when hired"
+          <2 2019, 2, 13>>}
+#+END_SRC
+
+Even in this trivial example, this is a reduction of roughly 28% in the
+binary size (36 of 127 bytes).
+Even though tooling to do this does not come out of the box in Preserves,
+the fact that Record tags can be anything makes it possible to build this
+or any such appropriate structure.
+
+** Annotations
+
+Annotations are not strictly a necessary feature, but they are useful
+in some circumstances.
+We have previously shown them used as comments:
+
+#+BEGIN_SRC preserves
+@"I'm a comment!"
+"I am not a comment, I am data!"
+#+END_SRC
+
+Annotations annotate the values they precede.
+It is possible to have multiple annotations on a value.
+
+#+BEGIN_SRC preserves
+@"I am annotating this number"
+@"And so am I!"
+42
+#+END_SRC
+
+As said, annotations are not really data.
+They are merely meant for development tooling or debugging.
+You have to explicitly ask for them when reading, and when you do, they
+wrap all the values.
+
+So what's the point of them then?
+If annotations were just for comments, there would indeed be hardly
+any point at all... it would be simpler to just provide a comment syntax.
+
+However, annotations can be used for more than just comments.
+They can also be used for debugging or other development-tool-oriented
+data.
+For instance, here is some game data annotated with who the
+"project lead" is for each object.
+
+#+BEGIN_SRC preserves
+
+ {name: "Ogre",
+  spriteSheet: #base64{T2dyZSBzcHJpdGVzIGdvIGhlcmU=},
+  attributes: #set{biped, brute, rage, clumsy}},
+ @
+ {name: "Jackal",
+  spriteSheet: #base64{V2l0Y2ggc3ByaXRlcyBnbyBoZXJl},
+  attributes: #set{quadruped, swift, pack-animal, weak}}}>
+#+END_SRC
+
+Each monster described in the set is annotated with a =ProjectLead=
+record.
+While this is useful information for the game company's organization
+system, it doesn't particularly matter to a program that is just
+reading in the data.
+
+* Conclusions
+
+We've covered the broad strokes of Preserves, but not everything that
+is possible with it.
+We leave it as an exercise to the reader to try reading these examples
+into their languages (several libraries exist already) and writing them
+out as binary objects.
+
+But as we've seen, Preserves is a flexible system which comes with
+well-defined, carefully specified built-in types, as well as a
+meta-type which can be used as an extension point.
+
+Happy preserving!