diff --git a/TUTORIAL.md b/TUTORIAL.md new file mode 100644 index 0000000..2f2b8f4 --- /dev/null +++ b/TUTORIAL.md @@ -0,0 +1,393 @@ +# Preserves: a tutorial + +By Christopher Lemmer Webber and Tony Garnock-Jones + +*This document, like Preserves itself, is released under* +*[version 2.0 of the Apache license](./LICENSE).* + + + + +# Overview + +Preserves is a serialization system which supplies both a +human-readable textual and efficient binary syntax; converting between +the two is straightforward. +Preserves' human readable syntax is easy to read and should be mostly +familiar if you already know systems like JSON. +However, Preserves is more precisely specified than JSON, and also +has a clean extension mechanism. + +This document is a tutorial; it does not get into all the details of +Preserves. +For that, see the [Preserves specification](./preserves.md). + + + + +# Preserves basics + + + + +## Starting with the familiar + +If you're familiar with JSON, Preserves looks fairly similar: + +``` javascript + {"name": "Missy Rose", + "species": "Felis Catus", + "age": 13, + "foods": ["kibble", "cat treats", "tinned meat"]} +``` + +Preserves also has something we can use for debugging/development +information called "annotations"; they aren't actually read in as data +but we can use them for comments. +(They can also be used for other development tools and are not +restricted to strings; more on this later, but for now interpret them +as comments.) + +``` javascript + @"I'm an annotation... basically a comment. Ignore me!" + "I'm data! Don't ignore me!" +``` + +Preserves supports some data types you're probably already familiar +with from JSON, and which look fairly similar in the textual format: + +``` javascript + @"booleans" + #true + #false + + @"various kinds of numbers:" + 42 + 123556789012345678901234567890 + -10 + 13.5 + + @"strings" + "I'm feeling stringy!" 
+ + @"sequences (lists)" + ["cat", "dog", "mouse", "goldfish"] + + @"dictionaries (hashmaps)" + {"cat": "meow", + "dog": "woof", + "goldfish": "glub glub", + "mouse": "squeak"} +``` + + + + +## Going beyond JSON + +We can observe a few differences from JSON already; it's possible to +express numbers of arbitrary length in Preserves, and booleans look a little +bit different. +A few more interesting differences: + +``` javascript + @"Preserves treats commas as whitespace, so these are the same" + ["cat", "dog", "mouse", "goldfish"] + ["cat" "dog" "mouse" "goldfish"] + + @"We can use anything as keys in dictionaries, not just strings" + {1: "the loneliest number", + ["why", "was", 6, "afraid", "of", 7]: "because 7 8 9", + {"dictionaries": "as keys???"}: "well, why not?"} +``` + +Preserves technically provides a few types of numbers: + +``` javascript + @"Signed Integers" + 42 + -42 + 5907212309572059846509324862304968273468909473609826340 + -5907212309572059846509324862304968273468909473609826340 + + @"Floats (Single-precision IEEE floats) (notice the trailing f)" + 3.1415927f + + @"Doubles (Double-precision IEEE floats)" + 3.141592653589793 +``` + +Preserves also provides some types that don't come in JSON. +`Symbols` are fairly interesting; they look a lot like strings but +really aren't meant to represent text as much as they are, well… a +symbolic name. +Often they're meant to be used for something that has symbolic importance +to the program, but not textual importance (other than to guide the +programmer… not unlike variable names). + +``` javascript + @"A symbol (NOT a string!)" + JustASymbol + + @"You can do mixedCase or CamelCase too of course, pick your poison" + @"(but be consistent, for the sake of your collaborators!" 
+ iAmASymbol + i-am-a-symbol + + @"A list of symbols" + [GET, PUT, POST, DELETE] + + @"A symbol with spaces in it" + |this is just one symbol believe it or not| +``` + +We can also add binary data, aka ByteStrings: + +``` javascript + @"Some binary data, base64 encoded" + #base64{cGljdHVyZSBvZiBhIGNhdA==} + + @"Some other binary data, hexadecimal encoded" + #hex{616263} + + @"Same binary data as above, base64 encoded" + #base64{YWJj} +``` + +What's neat about this is that we don't have to "pay the cost" of +base64 or hexadecimal encoding when we serialize this data to binary; +the length of the binary data is the length of the binary data. + +Conveniently, Preserves also includes Sets, which are collections of +unique elements where ordering of items is unimportant. + +``` javascript + #set{flour, salt, water} +``` + + + +## Total ordering and canonicalization + +This is a good time to mention that even though from a semantic +perspective sets and dictionaries do not carry information about the +ordering of their elements (and Preserves doesn't care what order we +enter them in for our hand-written-as-text Preserves documents), +Preserves has a well-defined "total ordering". + +**FULL WARNING:** the following claim is not implemented yet. :) +Coming soon! + +Based on this total ordering, Preserves provides support for canonical +ordering when serializing; in this mode, Preserves will always write +out the elements in the same order, every time. +When combined with binary serialization, this is Preserves' "canonical +form". +This is important and useful for many contexts, but especially for +cryptographic signatures and hashing. + +``` javascript + @"This hand-typed Preserves document..." 
+ {monkey: {"noise": "ooh-ooh", + "eats": #set{"bananas", "berries"}} + cat: {"noise": "meow", + "eats": #set{"kibble", "cat treats", "tinned meat"}}} + + @"Will always, always be written out in this order when canonicalized:" + {cat: {"eats": #set{"cat treats", "kibble", "tinned meat"}, + "noise": "meow"} + monkey: {"eats": #set{"bananas", "berries"}, + "noise": "ooh-ooh"}} +``` + +Clever implementations can get canonicalized output for free by +carefully ordering set elements and dictionary entries at construction +time, but even in simple implementations, canonical serialization is +almost as cheap as normal serialization. + + + + +## Defining our own types using Records + +Finally, there is one more type that Preserves provides… but in a +sense, it's a meta-type. +`Record` objects have a label and a series of arguments (or "fields"). +For example, we can make a `Date` record: + +``` javascript + <Date 2019 8 15> +``` + +In this example, the `Date` label is a symbol; 2019, 8, and 15 are the +year, month, and day fields respectively. + +Why do we care about this? +We could instead just decide to encode our date data in a string, +like "2019-08-15". +A document using such a date structure might look like so: + +``` javascript + {"name": "Gregor Samsa", + "description": "humanoid trapped in an insect body", + "born": "1915-10-04"} +``` + +Unfortunately, say our boss comes along and tells us that the people +doing data entry have complained that it isn't always possible to get +an exact date. +They would like to be able to type in what they know if they don't +know the date exactly. + +This causes a problem. +Now we might have two kinds of entries: + +``` javascript + @"Exact date known" + {"name": "Gregor Samsa", + "description": "humanoid trapped in an insect body", + "born": "1915-10-04"} + + @"Not sure about exact date..." + {"name": "Gregor Samsa", + "description": "humanoid trapped in an insect body", + "born": "Sometime in October 1915? 
Or was that when he became an insect?"} +``` + +This is a mess. +We *could* just try matching against a regular expression to see if it "looks +like a date", but doing this kind of thing is prone to errors and weird +edge cases. +No, it's better to be able to have a separate type: + +``` javascript + @"Exact date known" + {"name": "Gregor Samsa", + "description": "humanoid trapped in an insect body", + "born": } + + @"Not sure about exact date..." + {"name": "Gregor Samsa", + "description": "humanoid trapped in an insect body", + "born": } +``` + +Now we can distinguish the two. + +We can make as many Record types as our program needs, though it is up +to our program to make sense of what these mean. +Since Preserves does not specify the `Date` itself, both the program +(or person) writing the Preserves document and the program reading it +need to have a mutual understanding of how many fields it has and what +the label signifies for it to be of use. + +Still, there are plenty of interesting labels we can define. +Here is one for an "iri", a hyperlink: + +``` javascript + +``` + +That's nice enough, but here's another interesting detail… labels on +Records are usually symbols but aren't necessarily so. +They can also be strings or numbers or even dictionaries. +And very interestingly, they can also be other records: + +``` javascript + < + {"to": [], + "attributedTo": , + "content": "Say, did you finish reading that book I lent you?"}> +``` + +Do you see it? This Record's label is… an `iri` Record! +The link here points to a more precise term saying that "this is a +note meant to be sent around in social networks". +It is considerably more precise than just using the string or symbol +"Note", which could be ambiguous. +(A social networking note? A footnote? A music note?) +While not all systems need this, this (partial) example hints at how +Preserves can also be used to coordinate meaning in larger, more +decentralized systems. 
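To make the "label plus fields" shape concrete, here is a minimal Python sketch of how a program might model Records in memory, including a Record whose label is itself a Record. The `Symbol` and `Record` classes and the `record` helper are hypothetical names chosen for illustration, not the API of any Preserves library, and the IRI is a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a Preserves symbol (distinct from a string)."""
    name: str


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Preserves Record: a label plus ordered fields."""
    label: Any
    fields: Tuple[Any, ...] = ()


def record(label: Any, *fields: Any) -> Record:
    """Convenience constructor so call sites read like the text syntax."""
    return Record(label, tuple(fields))


# A Date record with a symbol label and three integer fields.
date = record(Symbol("Date"), 2019, 8, 15)

# A record whose label is another record; the IRI is a placeholder.
note_type = record(Symbol("iri"), "https://example.org/ns#Note")
note = record(note_type, "Say, did you finish reading that book I lent you?")
```

Because the label is just another value, the reading program can dispatch on `note.label` in the same way it dispatches on `date.label`; what each label means is still an agreement between writer and reader.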
+ +Likewise, it is also possible to label records with integers. +Languages like OCaml use integers instead of symbolic record labels +because their type systems ensure that it is never ambiguous what, +say, the label `23` means in any given context. +Allowing integer record labels lets Preserves directly express OCaml +data. + + + + +## Annotations + +Annotations are not strictly necessary, but they are useful +in some circumstances. +We have previously shown them used as comments: + +``` javascript + @"I'm a comment!" + "I am not a comment, I am data!" +``` + +Annotations annotate the values they precede. +It is possible to have multiple annotations on a value. + +``` javascript + @"I am annotating this number" + @"And so am I!" + 42 +``` + +As mentioned, annotations are not really data. +They are merely meant for development tooling or debugging. +You have to explicitly ask for them when reading, and they wrap all +the values. +Many implementations will, in the same mode, also supply line number +and column information attached to each read value. + +So what's the point of them then? +If annotations were just for comments, there would indeed be hardly +any point at all… it would be simpler to just provide a comment syntax. + +However, annotations can be used for more than just comments. +They can also be used for debugging or other development-tool-oriented +data. + +For instance, here's a reply from an HTTP API service running in +"debug" mode annotated with the time it took to produce the reply and +the internal name of the server that produced the response: + +``` javascript + @> + @ + , }, > + }, > ]>> +``` + +The annotations aren't related to the data requested, which is all +about "employees"; instead, they're about the systems that produced +the response. +You could say they're in the domain of "debugging" instead of the +domain of "employees". 
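The idea that annotations wrap values without being part of the data can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Annotated` wrapper and `strip_annotations` function are hypothetical names, not the API of any Preserves implementation, and the sketch strips annotations from list elements and dictionary values only (not from dictionary keys).

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Annotated:
    """Hypothetical wrapper: a value plus the annotations that precede it."""
    value: Any
    annotations: Tuple[Any, ...] = ()


def strip_annotations(v: Any) -> Any:
    """Recursively drop annotations, leaving only the underlying data."""
    if isinstance(v, Annotated):
        return strip_annotations(v.value)
    if isinstance(v, list):
        return [strip_annotations(x) for x in v]
    if isinstance(v, dict):
        return {k: strip_annotations(x) for k, x in v.items()}
    return v


# A reply read in "annotation-preserving" mode: debugging details ride along.
debug_reply = Annotated(
    value=[Annotated("Alyssa", ("first employee",)), "Ben"],
    annotations=({"elapsed-ms": 12}, {"server": "backend-7"}),
)

# The same reply as ordinary data, as a reader that ignores annotations sees it.
plain_reply = ["Alyssa", "Ben"]

same_data = strip_annotations(debug_reply) == plain_reply
```

Keeping annotations in a separate wrapper means that comparing or hashing the data never depends on debugging details, which matches the split between the "debugging" domain and the "employees" domain described above.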
+ + + + +# Conclusions + +We've covered the broad strokes of Preserves, but not everything that +is possible with it. +We leave it as an exercise to the reader to try reading these examples +into their languages (several libraries exist already) and writing them +out as binary objects. + +But as we've seen, Preserves is a flexible system which comes with +well-defined, carefully specified built-in types, as well as a +meta-type which can be used as an extension point. + +Happy preserving! + diff --git a/TUTORIAL.org b/TUTORIAL.org deleted file mode 100644 index 13de634..0000000 --- a/TUTORIAL.org +++ /dev/null @@ -1,461 +0,0 @@ -#+TITLE: Preserves: a tutorial -#+AUTHOR: Christopher Lemmer Webber - -/This document, like Preserves itself, is released under/ -/[[file:./LICENSE][version 2.0 of the Apache license]]./ - -* Overview - -Preserves is a serialization system which supplies both a -human-readable textual and efficient binary syntax; converting between -the two is straightforward. -Preserves' human readable syntax is easy to read and should be mostly -familiar if you already know systems like JSON. -However, Preserves is more precisely specified than JSON, and also -has a clean extension mechanism. - -This document is a tutorial; it does not get into all the details of -Preserves. -For that, see the [[file:./preserves.md][Preserves specification]]. - -* Preserves basics - -** Starting with the familiar - -If you're familiar with JSON, Preserves looks fairly similar: - -#+BEGIN_SRC preserves - {"name": "Missy Rose", - "species": "Felis Catus", - "age": 13, - "foods": ["kibble", "cat treats", "tinned meat"]} -#+END_SRC - -Preserves also has something we can use for debugging/development -information called "annotations"; they aren't actually read in as data -but we can use them for comments. -(They can also be used for other development tools and are not -restricted to strings; more on this later, but for now interpret them -as comments.) 
- -#+BEGIN_SRC preserves -@"I'm an annotation... basically a comment. Ignore me!" -"I'm data! Don't ignore me!" -#+END_SRC - -Preserves supports some data types you're probably already familiar -with from JSON, and which look fairly similar in the textual format: - -#+BEGIN_SRC preserves -@"booleans" -#true -#false - -@"various kinds of numbers:" -42 -123556789012345678901234567890 --10 -13.5 - -@"strings" -"I'm feeling stringy!" - -@"sequences (lists)" -["cat", "dog", "mouse", "goldfish"] - -@"dictionaries (hashmaps)" -{"cat": "meow", - "dog": "woof", - "goldfish": "glub glub", - "mouse": "squeak"} -#+END_SRC - -** Going beyond JSON - -We can observe a few differences from JSON already; it's possible to -express numbers of arbitrary length in Preserves, and booleans look a little -bit different. -A few more interesting differences: - -#+BEGIN_SRC preserves -@"Preserves treats commas as whitespace, so these are the same" -["cat", "dog", "mouse", "goldfish"] -["cat" "dog" "mouse" "goldfish"] - -@"We can use anything as keys in dictionaries, not just strings" -{1: "the loneliest number", - ["why", "was", 6, "afraid", "of", 7]: "because 7 8 9", - {"dictionaries": "as keys???"}: "well, why not?"} -#+END_SRC - -Preserves technically provides a few types of numbers: - -#+BEGIN_SRC preserves -@"Signed Integers" -42 --42 -5907212309572059846509324862304968273468909473609826340 --5907212309572059846509324862304968273468909473609826340 - -@"Floats (Single-precision IEEE floats) (notice the trailing f)" -3.1415927f - -@"Doubles (Double-precision IEEE floats)" -3.141592653589793 -#+END_SRC - -Preserves also provides some types that don't come in JSON. -=Symbols= are fairly interesting; they look a lot like strings but -really aren't meant to represent text as much as they are, well... a -symbolic name. -Often they're meant to be used for something that has symbolic importance -to the program, but not textual importance (other than to guide the -programmer... 
not unlike variable names). - -#+BEGIN_SRC preserves -@"A symbol (NOT a string!)" -JustASymbol - -@"You can do mixedCase or CamelCase too of course, pick your poison" -@"(but be consistent, for the sake of your collaborators!" -iAmASymbol -i-am-a-symbol - -@"A list of symbols" -[GET, PUT, POST, DELETE] - -@"A symbol with spaces in it" -|this is just one symbol believe it or not| -#+END_SRC - -We can also add binary data, aka ByteStrings: - -#+BEGIN_SRC preserves -@"Some binary data, base64 encoded" -#base64{cGljdHVyZSBvZiBhIGNhdA==} - -@"Some other binary data, hexadecimal encoded" -#hex{616263} - -@"Same binary data as above, base64 encoded" -#base64{YWJj} -#+END_SRC - -What's neat about this is that we don't have to "pay the cost" of -base64 or hexadecimal encoding when we serialize this data to binary; -the length of the binary data is the length of the binary data. - -Conveniently, Preserves also includes Sets, which are collections of -unique elements where ordering of items is unimportant. - -#+BEGIN_SRC preserves -#set{flour, salt, water} -#+END_SRC - -** Total ordering and canonicalization - -This is a good time to mention that even though from a semantic -perspective sets and dictionaries do not carry information about the -ordering of their elements (and Preserves doesn't care what order we -enter them in for our hand-written-as-text Preserves documents), -Preserves has a well-defined "total ordering". - -*FULL WARNING:* the following claim is not implemented yet. :) -Coming soon! - -Based on this total ordering, Preserves provides support for canonical -ordering when serializing; in this mode, Preserves will always write -out the elements in the same order, every time. -When combined with binary serialization, this is Preserves' "canonical -form". -This is important and useful for many contexts, but especially for -cryptographic signatures and hashing. - -#+BEGIN_SRC preserves -@"This hand-typed Preserves document..." 
-{monkey: {"noise": "ooh-ooh", - "eats": #set{"bananas", "berries"}} - cat: {"noise": "meow", - "eats": #set{"kibble", "cat treats", "tinned meat"}}} - -@"Will always, always be written out in this order when canonicalized:" -{cat: {"eats": #set{"cat treats", "kibble", "tinned meat"}, - "noise": "meow"} - monkey: {"eats": #set{"bananas", "berries"}, - "noise": "ooh-ooh"}} -#+END_SRC - -Clever implementations can get canonicalized output for free by -carefully ordering set elements and dictionary entries at construction -time, but even in simple implementations, canonical serialization is -almost as cheap as normal serialization. - -** Defining our own types using Records - -Finally, there is one more type that Preserves provides... but in a -sense, it's a meta-type. -=Record= objects have a label and a series of arguments (or "fields"). -For example, we can make a =Date= record: - -#+BEGIN_SRC preserves - -#+END_SRC - -In this example, the =Date= label is a symbol; 2019, 8, and 15 are the -year, month, and day fields respectively. - -Why do we care about this? -We could instead just decide to encode our date data in a string, -like "2019-08-15". -A document using such a date structure might look like so: - -#+BEGIN_SRC preserves -{"name": "Gregor Samsa", - "description": "humanoid trapped in an insect body", - "born": "1915-10-04"} -#+END_SRC - -Unfortunately, say our boss comes along and tells us that the people -doing data entry have complained that it isn't always possible to get -an exact date. -They would like to be able to type in what they know if they don't -know the date exactly. - -This causes a problem. -Now we might have two kinds of entries: - -#+BEGIN_SRC preserves -@"Exact date known" -{"name": "Gregor Samsa", - "description": "humanoid trapped in an insect body", - "born": "1915-10-04"} - -@"Not sure about exact date..." -{"name": "Gregor Samsa", - "description": "humanoid trapped in an insect body", - "born": "Sometime in October 1915? 
Or was that when he became an insect?"} -#+END_SRC - -This is a mess. -We /could/ just try parsing a regular expression to see if it "looks -like a date", but doing this kind of thing is prone to errors and weird -edge cases. -No, it's better to be able to have a separate type: - -#+BEGIN_SRC preserves -@"Exact date known" -{"name": "Gregor Samsa", - "description": "humanoid trapped in an insect body", - "born": } - -@"Not sure about exact date..." -{"name": "Gregor Samsa", - "description": "humanoid trapped in an insect body", - "born": } -#+END_SRC - -Now we can distinguish the two. - -We can make as many Record types as our program needs, though it is up -to our program to make sense of what these mean. -Since Preserves does not specify the =Date= itself, both the program -(or person) writing the Preserves document and the program reading it -need to have a mutual understanding of how many fields it has and what -the meaning the label signifies for it to be of use. - -Still, there are plenty of interesting labels we can define. -Here is one for an "iri", a hyperlink: - -#+BEGIN_SRC preserves - -#+END_SRC - -That's nice enough, but here's another interesting detail... labels on -Records are usually symbols but aren't necessarily so. -They can also be strings or numbers or even dictionaries. -And very interestingly, they can also be other records: - -#+BEGIN_SRC preserves -< - {"to": [], - "attributedTo": , - "content": "Say, did you finish reading that book I lent you?"}> -#+END_SRC - -Do you see it? This Record's label is... an =iri= Record! -The link here points to a more precise term saying that "this is a -note meant to be sent around in social networks". -It is considerably more precise than just using the string or symbol -"Note", which could be ambiguous. -(A social networking note? A footnote? A music note?) 
-While not all systems need this, this (partial) example hints at how -Preserves can also be used to coordinate meaning in larger, more -decentralized systems. - -Likewise, it is also possible to annotate records with integers. -Languages like OCaml use integers instead of symbolic record labels -because their type systems ensure that it is never ambiguous what, -say, the label =23= means in any given context. -Allowing integer record labels lets Preserves directly express OCaml -data. - -# 2019-08-18 14:06:24 tonyg -- I like the following idea in principle, -# but I don't think it belongs here yet. The *binary* syntax has -# "placeholder" values but the *text* syntax doesn't. Preserves is all -# about equivalences: so <1 #true> is different from . -# If you don't have the key -- the mapping between 1 and Employee -- -# you can't know to identify the two! It's an open design question -# whether to keep placeholder values, and if so how to introduce them -# (!) without causing confusion; it's probably better in general to -# just gzip encoded values (!!)... -# -# -- original text from dustyweb follows -- - -# A system could use this to reduce the redundancy cost of labels sent -# over the wire by indexing labels and substituting them after reading -# the structures. - -# #+BEGIN_SRC preserves -# @"The ordered index of labels for this session" -# [Employee, Role, Date] - -# @"We could then transform this structure..." -# #set{, -# }, -# @"when hired" -# >, -# }, -# @"when hired" -# >} - -# @"... to this structure, which in binary is 91 as opposed to 127 bytes" -# #set{<0 @"employee name" -# "Alyssa P. Hacker" -# @"employee roles" -# #set{<1 Programmer>, -# <1 Manager>}, -# @"when hired" -# <2 2018, 1, 24>>, -# <0 @"employee name" -# "Ben Bitdiddle" -# @"employee roles" -# #set{<1 Programmer>}, -# @"when hired" -# <2 2019, 2, 13>>} -# #+END_SRC - -# Even in this trivial example, this is a 25% reduction in the binary size. 
-# Even though tooling to do this does not come out of the box in Preserves, -# the fact that Record labels can be anything makes it possible to build this -# or any such appropriate structure. - -** Annotations - -Annotations are not strictly a necessary feature, but they are useful -in some circumstances. -We have previously shown them used as comments: - -#+BEGIN_SRC preserves -@"I'm a comment!" -"I am not a comment, I am data!" -#+END_SRC - -Annotations annotate the values they precede. -It is possible to have multiple annotations on a value. - -#+BEGIN_SRC preserves -@"I am annotating this number" -@"And so am I!" -42 -#+END_SRC - -As said, annotations are not really data. -They are merely meant for development tooling or debugging. -You have to explicitly ask for them when reading, and they wrap all -the values. -Many implementations will, in the same mode, also supply line number -and column information attached to each read value. - -So what's the point of them then? -If annotations were just for comments, there would be indeed hardly -point at all... it would be simpler to just provide a comment syntax. - -However, annotations can be used for more than just comments. -They can also be used for debugging or other development-tool-oriented -data. - -# 2019-08-18 14:10:15 tonyg -- Similarly, I am uncomfortable with this -# example. It seems to me that the annotations are indeed domain data, -# just in a different domain of "project management" rather than the -# domain of "NPC data sheet"! Annotations are intended for the domain -# of *programming and debugging software systems* -- they're intended -# for reflective use. You use them when you're thinking about -# preserves artifacts per se, rather than anything about the domain of -# the data encoded within a given preserves artifact. -# -# Maybe a good example is something like an HTTP API? You could -# annotate a response with the time it took to be produced in -# milliseconds. I'll sketch something out. 
-# -# -- original text from dustyweb follows -- - -# For instance, here is some data game data annotated with who the -# "project owner" is of each object. - -# #+BEGIN_SRC preserves -# -# {name: "Ogre", -# spriteSheet: #base64{T2dyZSBzcHJpdGVzIGdvIGhlcmU=}, -# attributes: #set{biped, brute, rage, clumsy}}, -# @ -# {name: "Jackal", -# spriteSheet: #base64{V2l0Y2ggc3ByaXRlcyBnbyBoZXJl}, -# attributes: #set{quadruped, swift, pack-animal, weak}}}> -# #+END_SRC - -# Each monster descrived in the set is annotated with a =ProjectLead= -# record. -# While useful information used by the game company's organization -# system, it doesn't particularly matter when reading in the data -# just as code. - -For instance, here's a reply from an HTTP API service running in -"debug" mode annotated with the time it took to produce the reply and -the internal name of the server that produced the response: - -#+BEGIN_SRC preserves -@> -@ -, }, > - }, > ]>> -#+END_SRC - -The annotations aren't related to the data requested, which is all -about "employees"; instead, they're about the systems that produced -the response. -You could say they're in the domain of "debugging" instead of the -domain of "employees". - -* Conclusions - -We've covered the broad strokes of Preserves, but not everything that -is possible with it. -We leave it as an exercise to the reader to try reading these examples -into their languages (several libraries exist already) and writing them -out as binary objects. - -But as we've seen, Preserves is a flexible system which comes with -well-defined, carefully specified built-in types, as well as a -meta-type which can be used as an extension point. - -Happy preserving!