diff --git a/preserves.md b/preserves.md
index 16367c9..c9f7f7d 100644
--- a/preserves.md
+++ b/preserves.md
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language
 
 Tony Garnock-Jones
-September 2018. Version 0.0.3.
+November 2018. Version 0.0.4.
 
 [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
 [spki]: http://world.std.com/~cme/html/spki.html
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
 This document proposes a data model and serialization format called
 *Preserves*.
 
-Preserves supports *records* with user-defined *labels*. This makes it
-more expressive[^macro-expressiveness] than most data languages in use
-on the web and allows it to easily represent the *labelled sums of
-products* as seen in many functional programming languages.
+Preserves supports *records* with user-defined *labels*. This relieves
+the confusion caused by encoding records as dictionaries, seen in most
+data languages in use on the web. It also allows Preserves to easily
+represent the *labelled sums of products* as seen in many functional
+programming languages.
 
 Preserves also supports the usual suite of atomic and compound data
 types, in particular including *binary* data as a distinct type from
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
 Comparison is based on the data model, not on syntax or on data
 structures of any particular implementation language.
 
-  [^macro-expressiveness]: By "expressive" I mean *macro-expressive*
-    in the sense of Felleisen's 1991 paper, "On the Expressive Power
-    of Programming Languages".
-
-    Roughly speaking, there's no way in a JSON document to introduce a
-    new kind of information (such as binary data, or a date-stamp, or
-    a "person" object) in an *unambiguous way* without *global
-    agreement* from every potential consumer of the document. With an
-    extensible labelled record type, there is.
-
-    Felleisen, Matthias. “On the Expressive Power of Programming
-    Languages.” Science of Computer Programming 17, no. 1--3 (1991):
-    35–75.
-
 ## Starting with Semantics
 
 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. When we write examples of values,
-we will do so using the [textual syntax](#textual-syntax) defined
-later in this document.
+meaning independent of their syntax.
 
 Our `Value`s fall into two broad categories: *atomic* and *compound*
 data.
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
 A `SignedInteger` is a signed integer of arbitrary width.
 `SignedInteger`s are compared as mathematical integers.
 
-**Examples.** 10; -6; 0.
-
-**Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
-(not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
-
 ### Unicode strings.
 
 A `String` is a sequence of Unicode
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
     gives the same result as a lexicographic byte-by-byte comparison
     of the UTF-8 encoding of a string!
 
-**Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
-the string containing the three Unicode code-points `z` (0x7A), `水`
-(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
-
 ### Binary data.
 
-A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
-`ByteString`s are compared lexicographically.
-
-**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
-`ByteString` containing the integers 65, 66 and 67 (corresponding to
-ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
-these are *binary* data.
+A `ByteString` is a sequence of octets. `ByteString`s are compared
+lexicographically.
 
 ### Symbols.
 
@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
 kind. `Symbol`s are also compared lexicographically by code-point.
 
-**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
-
 ### Booleans.
 
-There are exactly two `Boolean` values, “false” and “true”. The
-“false” value compares less-than the “true” value. We write `#false`
-for “false”, and `#true` for “true”.
+There are two `Boolean`s, “false” and “true”. The “false” value is
+less-than the “true” value.
 
 ### IEEE floating-point values.
 
-A `Float` is a single-precision IEEE 754 floating-point value; a
-`Double` is a double-precision IEEE 754 floating-point value.
-`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
-so by the rules [above](#total-order), every `Float` is less than
-every `Double`, and every `SignedInteger` is greater than both. Two
-`Float`s or two `Double`s are to be ordered by the `totalOrder`
-predicate defined in section 5.10 of
+`Float`s and `Double`s are single- and double-precision IEEE 754
+floating-point values, respectively. `Float`s, `Double`s and
+`SignedInteger`s are disjoint; by the rules [above](#total-order),
+every `Float` is less than every `Double`, and every `SignedInteger`
+is greater than both. Two `Float`s or two `Double`s are to be ordered
+by the `totalOrder` predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
-We write examples using a fractional part and/or an exponent to
-distinguish them from `SignedInteger`s. An additional suffix `f`
-distinguishes `Float`s from `Double`s.
-
-**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
-
-**Non-examples.** 10, -6, and 0, because writing them this way
-indicates `SignedInteger`s, not `Float`s or `Double`s.
 
 ### Records.
 
-A `Record` is a *labelled* tuple of zero or more `Value`s, called the
-record's *fields*. A record's label is itself a `Value`, though it
-will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
-are compared lexicographically as if they were just tuples; that is,
-first by their labels, and then by the remainder of their fields.
+A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
+label can be any `Value`, but is usually a `Symbol`.[^extensibility]
+[^iri-labels] `Record`s are compared lexicographically: first by
+label, then by field sequence.
 
   [^extensibility]: The [Racket](https://racket-lang.org/) programming
     language defines
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
     it cannot be read as an IRI at all, and so the label simply stands
     for itself—for its own `Value`.
 
-**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
-2 and 3; `void()`, a `Record` with label `void` and no fields.
-
-**Non-examples.** `()`, because it lacks a label; `void`, because it
-lacks even an empty tuple of fields.
-
 ### Sequences.
 
-A `Sequence` is a general-purpose, variable-length ordered sequence of
-zero or more `Value`s. `Sequence`s are compared lexicographically.
-
-**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
-`SignedInteger`s 1, 2 and 3.
+A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
+lexicographically.
 
 ### Sets.
 
@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
 and comparing the resulting `Sequence`s.
 
-**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
-containing only the empty set; `{4 "hello" (void) 9.0f}`, the set
-containing 4, the string `"hello"`, the record with label `void` and
-no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
-set containing a `SignedInteger` and a `Float`; `{mime(application/xml
-#"") mime(application/xml #"")}`, a set containing two
-different `mime` records.[^mime-xml-difference]
-
-  [^mime-xml-difference]: The two XML documents `` and ``
-    differ by bytewise comparison, and thus yield different record
-    values, even though under the semantics of XML they denote
-    identical XML infoset.
-
-**Non-examples.** `{1 1}`, because it contains multiple equivalent
-`Value`s; `{}`, because without the `#set` marker, it denotes the
-empty dictionary.
-
 ### Dictionaries.
 
 A `Dictionary` is an unordered finite collection of pairs of `Value`s.
-Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
-be pairwise distinct. Instances of `Dictionary` are compared by
+Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
+pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
 `Dictionary`'s pairs in ascending order by key.
 
-**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
-mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
-mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
-`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
-values.
-
-**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
-keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
-
 ## Textual Syntax
 
 Now we have discussed `Value`s and their meanings, we may turn to
@@ -282,7 +205,7 @@ or line feed.
 
 ### Grammar
 
-Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+Standalone documents may have trailing whitespace.
 
     Document = Value ws
 
@@ -301,9 +224,9 @@ the label and the open-parenthesis.
 
 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
-written either as a simple curly-brace-enclosed non-empty sequence of
-values, or as a possibly-empty sequence of values enclosed by the
-tokens `#set{` and `}`.[^printing-collections]
+written either as one or more values enclosed in curly braces, or zero
+or more values enclosed by the tokens `#set{` and
+`}`.[^printing-collections]
 
     Sequence = "[" *Value ws "]"
     Dictionary = "{" *(Value ws ":" Value) ws "}"
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
 represented as strings with an implicit internal structure.
 
 There is no convention for *labelling* a value as belonging to a
-particular category. This makes it difficult to extract, say, all
-email addresses, or all URLs, from an arbitrary JSON document.
-
-Instead, JSON-encoded data are often labelled in an ad-hoc way.
-Multiple incompatible approaches exist. For example, a "money"
-structure containing a `currency` field and an `amount` may be
+particular category. Instead, JSON-encoded data are often labelled in
+an ad-hoc way. Multiple incompatible approaches exist. For example, a
+"money" structure containing a `currency` field and an `amount` may be
 represented in any number of ways:
 
     { "_type": "money", "currency": "EUR", "amount": 10 }