Delete misleading, incorrect, or unnecessary text

Tony Garnock-Jones 2018-11-08 12:35:50 +00:00
parent cf250b9245
commit 10d8ce5c09
1 changed files with 32 additions and 112 deletions

@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>
-September 2018. Version 0.0.3.
+November 2018. Version 0.0.4.
 [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
 [spki]: http://world.std.com/~cme/html/spki.html
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
 This document proposes a data model and serialization format called
 *Preserves*.
-Preserves supports *records* with user-defined *labels*. This makes it
-more expressive[^macro-expressiveness] than most data languages in use
-on the web and allows it to easily represent the *labelled sums of
-products* as seen in many functional programming languages.
+Preserves supports *records* with user-defined *labels*. This relieves
+the confusion caused by encoding records as dictionaries, seen in most
+data languages in use on the web. It also allows Preserves to easily
+represent the *labelled sums of products* as seen in many functional
+programming languages.
 Preserves also supports the usual suite of atomic and compound data
 types, in particular including *binary* data as a distinct type from
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
 Comparison is based on the data model, not on syntax or on data
 structures of any particular implementation language.
-[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
-in the sense of Felleisen's 1991 paper, "On the Expressive Power
-of Programming Languages".
-Roughly speaking, there's no way in a JSON document to introduce a
-new kind of information (such as binary data, or a date-stamp, or
-a "person" object) in an *unambiguous way* without *global
-agreement* from every potential consumer of the document. With an
-extensible labelled record type, there is.
-Felleisen, Matthias. “On the Expressive Power of Programming
-Languages.” Science of Computer Programming 17, no. 1--3 (1991):
-35--75.
 ## Starting with Semantics
 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. When we write examples of values,
-we will do so using the [textual syntax](#textual-syntax) defined
-later in this document.
+meaning independent of their syntax.
 Our `Value`s fall into two broad categories: *atomic* and *compound*
 data.
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
 A `SignedInteger` is a signed integer of arbitrary width.
 `SignedInteger`s are compared as mathematical integers.
-**Examples.** 10; -6; 0.
-**Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
-(not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
 ### Unicode strings.
 A `String` is a sequence of Unicode
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
 gives the same result as a lexicographic byte-by-byte comparison
 of the UTF-8 encoding of a string!
-**Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
-the string containing the three Unicode code-points `z` (0x7A), `水`
-(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
 ### Binary data.
-A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
-`ByteString`s are compared lexicographically.
+A `ByteString` is a sequence of octets. `ByteString`s are compared
+lexicographically.
-**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
-`ByteString` containing the integers 65, 66 and 67 (corresponding to
-ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
-these are *binary* data.
 ### Symbols.
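
The `utf8-is-awesome` footnote kept as context in the hunk above says that comparing strings by code point gives the same result as comparing their UTF-8 encodings byte by byte. A minimal Python sketch, not taken from the Preserves sources, that checks this on a handful of strings; the helper name `utf8_key` and the sample list are illustrative assumptions:

```python
def utf8_key(s: str) -> bytes:
    """Sort key: the UTF-8 encoding of the string."""
    return s.encode("utf-8")


# Sample strings built from the code points z (0x7A), 水 (0x6C34) and 𝄞 (0x1D11E).
samples = ["z水𝄞", "", "Hello world", "z", "水", "𝄞", "zz"]

# Python compares str values lexicographically by code point.
by_code_point = sorted(samples)

# bytes values compare lexicographically, byte by byte.
by_utf8_bytes = sorted(samples, key=utf8_key)

print(by_code_point == by_utf8_bytes)
```

The two orderings coincide because UTF-8 was designed so that a larger code point never encodes to a lexicographically smaller byte sequence.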
@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
 kind. `Symbol`s are also compared lexicographically by code-point.
-**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
 ### Booleans.
-There are exactly two `Boolean` values, “false” and “true”. The
-“false” value compares less-than the “true” value. We write `#false`
-for “false”, and `#true` for “true”.
+There are two `Boolean`s, “false” and “true”. The “false” value is
+less-than the “true” value.
 ### IEEE floating-point values.
-A `Float` is a single-precision IEEE 754 floating-point value; a
-`Double` is a double-precision IEEE 754 floating-point value.
-`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
-so by the rules [above](#total-order), every `Float` is less than
-every `Double`, and every `SignedInteger` is greater than both. Two
-`Float`s or two `Double`s are to be ordered by the `totalOrder`
-predicate defined in section 5.10 of
+`Float`s and `Double`s are single- and double-precision IEEE 754
+floating-point values, respectively. `Float`s, `Double`s and
+`SignedInteger`s are disjoint; by the rules [above](#total-order),
+every `Float` is less than every `Double`, and every `SignedInteger`
+is greater than both. Two `Float`s or two `Double`s are to be ordered
+by the `totalOrder` predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
-We write examples using a fractional part and/or an exponent to
-distinguish them from `SignedInteger`s. An additional suffix `f`
-distinguishes `Float`s from `Double`s.
-**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
-**Non-examples.** 10, -6, and 0, because writing them this way
-indicates `SignedInteger`s, not `Float`s or `Double`s.
 ### Records.
-A `Record` is a *labelled* tuple of zero or more `Value`s, called the
-record's *fields*. A record's label is itself a `Value`, though it
-will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
-are compared lexicographically as if they were just tuples; that is,
-first by their labels, and then by the remainder of their fields.
+A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
+label can be any `Value`, but is usually a `Symbol`.[^extensibility]
+[^iri-labels] `Record`s are compared lexicographically: first by
+label, then by field sequence.
 [^extensibility]: The [Racket](https://racket-lang.org/) programming
 language defines
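
The floating-point paragraph in the hunk above orders every `Float` below every `Double`, every `SignedInteger` above both, and values of the same floating-point kind by the IEEE 754 `totalOrder` predicate. A rough Python sketch of one way to build such a sort key, assuming the usual bit-pattern trick for `totalOrder` on binary formats; the names `total_order_key` and `sort_key` and the string kind tags are hypothetical and not part of Preserves:

```python
import struct


def total_order_key(x: float, single: bool = False) -> int:
    """Map a float to an integer whose natural order follows IEEE 754 totalOrder."""
    fmt_float, fmt_int, width = (">f", ">I", 32) if single else (">d", ">Q", 64)
    (bits,) = struct.unpack(fmt_int, struct.pack(fmt_float, x))
    sign_mask = 1 << (width - 1)
    if bits & sign_mask:
        # Sign bit set: flip every bit so larger magnitudes sort lower.
        return bits ^ ((1 << width) - 1)
    # Sign bit clear: set the top bit so these sort above all negatives.
    return bits | sign_mask


def sort_key(kind: str, value):
    """Assumed kind ranks: Float (0) < Double (1) < SignedInteger (2)."""
    if kind == "Float":
        return (0, total_order_key(value, single=True))
    if kind == "Double":
        return (1, total_order_key(value))
    if kind == "SignedInteger":
        return (2, value)
    raise ValueError(f"unsupported kind: {kind}")


values = [("SignedInteger", -5), ("Double", -1e300), ("Float", 1e30), ("Double", 0.0)]
ordered = sorted(values, key=lambda kv: sort_key(*kv))
print(ordered)
```

Unlike the ordinary `<` operator on floats, this key distinguishes `-0.0` from `0.0` and gives NaNs a definite position, which is what a total order over `Value`s needs.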
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
 it cannot be read as an IRI at all, and so the label simply stands
 for itself—for its own `Value`.
-**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
-2 and 3; `void()`, a `Record` with label `void` and no fields.
-**Non-examples.** `()`, because it lacks a label; `void`, because it
-lacks even an empty tuple of fields.
 ### Sequences.
-A `Sequence` is a general-purpose, variable-length ordered sequence of
-zero or more `Value`s. `Sequence`s are compared lexicographically.
+A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
+lexicographically.
-**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
-`SignedInteger`s 1, 2 and 3.
 ### Sets.
@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
 and comparing the resulting `Sequence`s.
-**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
-containing only the empty set; `{4 "hello" (void) 9.0f}`, the set
-containing 4, the string `"hello"`, the record with label `void` and
-no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
-set containing a `SignedInteger` and a `Float`; `{mime(application/xml
-#"<x/>") mime(application/xml #"<x />")}`, a set containing two
-different `mime` records.[^mime-xml-difference]
-[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
-differ by bytewise comparison, and thus yield different record
-values, even though under the semantics of XML they denote
-identical XML infoset.
-**Non-examples.** `{1 1}`, because it contains multiple equivalent
-`Value`s; `{}`, because without the `#set` marker, it denotes the
-empty dictionary.
 ### Dictionaries.
 A `Dictionary` is an unordered finite collection of pairs of `Value`s.
-Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
-be pairwise distinct. Instances of `Dictionary` are compared by
+Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
+pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
 `Dictionary`'s pairs in ascending order by key.
-**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
-mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
-mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
-`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
-values.
-**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
-keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
 ## Textual Syntax
 Now we have discussed `Value`s and their meanings, we may turn to
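
The `Set` and `Dictionary` rules in the hunk above reduce comparison of unordered collections to comparison of sorted sequences. A small Python sketch of that reduction, assuming every element and key is an integer so that Python's built-in ordering can stand in for the Preserves total order; the function names are hypothetical:

```python
def compare_sequences(a, b) -> int:
    """Lexicographic comparison; returns -1, 0 or 1."""
    a, b = list(a), list(b)
    return (a > b) - (a < b)


def compare_sets(a: set, b: set) -> int:
    # Sort the elements ascending, then compare the resulting sequences.
    return compare_sequences(sorted(a), sorted(b))


def compare_dictionaries(a: dict, b: dict) -> int:
    # Order each dictionary's (key, value) pairs ascending by key,
    # then compare the two pair sequences lexicographically.
    pairs_a = sorted(a.items(), key=lambda kv: kv[0])
    pairs_b = sorted(b.items(), key=lambda kv: kv[0])
    return compare_sequences(pairs_a, pairs_b)


print(compare_sets({3, 1}, {1, 4}))
print(compare_dictionaries({1: 10, 2: 20}, {2: 20, 1: 10}))
```

A full implementation would replace `sorted` and the list comparison with the total order over all `Value` kinds; the integer-only assumption just keeps the sketch short.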
@@ -282,7 +205,7 @@ or line feed.
 ### Grammar
-Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+Standalone documents may have trailing whitespace.
 Document = Value ws
@@ -301,9 +224,9 @@ the label and the open-parenthesis.
 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
-written either as a simple curly-brace-enclosed non-empty sequence of
-values, or as a possibly-empty sequence of values enclosed by the
-tokens `#set{` and `}`.[^printing-collections]
+written either as one or more values enclosed in curly braces, or zero
+or more values enclosed by the tokens `#set{` and
+`}`.[^printing-collections]
 Sequence = "[" *Value ws "]"
 Dictionary = "{" *(Value ws ":" Value) ws "}"
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
 represented as strings with an implicit internal structure.
 There is no convention for *labelling* a value as belonging to a
-particular category. This makes it difficult to extract, say, all
-email addresses, or all URLs, from an arbitrary JSON document.
-Instead, JSON-encoded data are often labelled in an ad-hoc way.
-Multiple incompatible approaches exist. For example, a "money"
-structure containing a `currency` field and an `amount` may be
+particular category. Instead, JSON-encoded data are often labelled in
+an ad-hoc way. Multiple incompatible approaches exist. For example, a
+"money" structure containing a `currency` field and an `amount` may be
 represented in any number of ways:
 { "_type": "money", "currency": "EUR", "amount": 10 }