Delete misleading, incorrect, or unnecessary text

Tony Garnock-Jones 2018-11-08 12:35:50 +00:00
parent cf250b9245
commit 10d8ce5c09
1 changed files with 32 additions and 112 deletions


@@ -6,7 +6,7 @@
# Preserves: an Expressive Data Language
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
September 2018. Version 0.0.3.
November 2018. Version 0.0.4.
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
[spki]: http://world.std.com/~cme/html/spki.html
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
This document proposes a data model and serialization format called
*Preserves*.
Preserves supports *records* with user-defined *labels*. This makes it
more expressive[^macro-expressiveness] than most data languages in use
on the web and allows it to easily represent the *labelled sums of
products* as seen in many functional programming languages.
Preserves supports *records* with user-defined *labels*. This relieves
the confusion caused by encoding records as dictionaries, seen in most
data languages in use on the web. It also allows Preserves to easily
represent the *labelled sums of products* as seen in many functional
programming languages.
Preserves also supports the usual suite of atomic and compound data
types, in particular including *binary* data as a distinct type from
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
Comparison is based on the data model, not on syntax or on data
structures of any particular implementation language.
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
in the sense of Felleisen's 1991 paper, "On the Expressive Power
of Programming Languages".
Roughly speaking, there's no way in a JSON document to introduce a
new kind of information (such as binary data, or a date-stamp, or
a "person" object) in an *unambiguous way* without *global
agreement* from every potential consumer of the document. With an
extensible labelled record type, there is.
Felleisen, Matthias. “On the Expressive Power of Programming
Languages.” Science of Computer Programming 17, no. 1--3 (1991):
35--75.
## Starting with Semantics
Taking inspiration from functional programming, we start with a
definition of the *values* that we want to work with and give them
meaning independent of their syntax. When we write examples of values,
we will do so using the [textual syntax](#textual-syntax) defined
later in this document.
meaning independent of their syntax.
Our `Value`s fall into two broad categories: *atomic* and *compound*
data.
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
A `SignedInteger` is a signed integer of arbitrary width.
`SignedInteger`s are compared as mathematical integers.
**Examples.** 10; -6; 0.
**Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
(not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
### Unicode strings.
A `String` is a sequence of Unicode
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
gives the same result as a lexicographic byte-by-byte comparison
of the UTF-8 encoding of a string!
**Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
the string containing the three Unicode code-points `z` (0x7A), `水`
(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
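The footnote's claim, that comparing strings by code point gives the same answer as comparing their UTF-8 encodings byte by byte, can be checked directly. The Python sketch below compares a few sample strings both ways; the helper names `compare_code_points` and `compare_utf8` are illustrative only and not part of any Preserves implementation. It also shows that UTF-16 does not share the property, because code points above U+FFFF are encoded with surrogates (0xD800 to 0xDFFF) that sort below U+FFFF.

```python
def compare_code_points(a: str, b: str) -> int:
    """Three-way lexicographic comparison by Unicode code point (-1, 0 or 1)."""
    xs, ys = [ord(c) for c in a], [ord(c) for c in b]
    return (xs > ys) - (xs < ys)


def compare_utf8(a: str, b: str) -> int:
    """Three-way byte-by-byte comparison of the UTF-8 encodings."""
    xs, ys = a.encode("utf-8"), b.encode("utf-8")
    return (xs > ys) - (xs < ys)


samples = ["", "z", "z水𝄞", "水", "𝄞", "Hello world", "\uffff", "\U00010000"]

# Every pair of samples orders the same way under both comparisons.
for a in samples:
    for b in samples:
        assert compare_code_points(a, b) == compare_utf8(a, b)

# UTF-16 lacks the property: U+FFFF precedes U+10000 by code point, but
# U+10000 is encoded as the surrogate pair D800 DC00, whose bytes sort
# below FF FF.
assert compare_code_points("\uffff", "\U00010000") == -1
assert "\uffff".encode("utf-16-be") > "\U00010000".encode("utf-16-be")
```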
### Binary data.
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
`ByteString`s are compared lexicographically.
**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
`ByteString` containing the integers 65, 66 and 67 (corresponding to
ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
these are *binary* data.
A `ByteString` is a sequence of octets. `ByteString`s are compared
lexicographically.
### Symbols.
@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
sequence of Unicode code-points representing an identifier of some
kind. `Symbol`s are also compared lexicographically by code-point.
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
### Booleans.
There are exactly two `Boolean` values, “false” and “true”. The
“false” value compares less-than the “true” value. We write `#false`
for “false”, and `#true` for “true”.
There are two `Boolean`s, “false” and “true”. The “false” value is
less-than the “true” value.
### IEEE floating-point values.
A `Float` is a single-precision IEEE 754 floating-point value; a
`Double` is a double-precision IEEE 754 floating-point value.
`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
so by the rules [above](#total-order), every `Float` is less than
every `Double`, and every `SignedInteger` is greater than both. Two
`Float`s or two `Double`s are to be ordered by the `totalOrder`
predicate defined in section 5.10 of
`Float`s and `Double`s are single- and double-precision IEEE 754
floating-point values, respectively. `Float`s, `Double`s and
`SignedInteger`s are disjoint; by the rules [above](#total-order),
every `Float` is less than every `Double`, and every `SignedInteger`
is greater than both. Two `Float`s or two `Double`s are to be ordered
by the `totalOrder` predicate defined in section 5.10 of
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
We write examples using a fractional part and/or an exponent to
distinguish them from `SignedInteger`s. An additional suffix `f`
distinguishes `Float`s from `Double`s.
**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
**Non-examples.** 10, -6, and 0, because writing them this way
indicates `SignedInteger`s, not `Float`s or `Double`s.
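The ordering rules in this section can be expressed as a sort key. The Python sketch below is one way to do it, under stated assumptions: Python's `float` stands in for `Double`, a hypothetical wrapper class `F32` marks a single-precision `Float`, and Python `int` stands in for `SignedInteger`. Within each floating-point type it uses a common bit-pattern trick for binary IEEE 754 formats: reinterpret the bits as a signed integer and, for negative values, flip the magnitude bits. This reproduces `totalOrder` for ordinary numbers, signed zeros, infinities and NaNs of either sign; the standard leaves the relative order of some NaN payloads to the implementation, so treat this as a sketch rather than a reference implementation.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class F32:
    """Hypothetical marker for a single-precision `Float` (Python floats are doubles)."""
    value: float


def _total_order_key(x: float, float_fmt: str, int_fmt: str, magnitude_mask: int) -> int:
    # Reinterpret the IEEE 754 bit pattern as a signed two's-complement integer.
    (bits,) = struct.unpack(int_fmt, struct.pack(float_fmt, x))
    # Non-negative values already sort correctly. For negative values, flipping
    # the magnitude bits makes larger magnitudes sort lower and puts -0.0 just
    # below +0.0.
    return bits ^ magnitude_mask if bits < 0 else bits


def double_key(x: float) -> int:
    return _total_order_key(x, ">d", ">q", 0x7FFF_FFFF_FFFF_FFFF)


def float_key(x: float) -> int:
    # Rounds x to single precision first; struct raises OverflowError if x
    # is finite but too large for a 32-bit float.
    return _total_order_key(x, ">f", ">i", 0x7FFF_FFFF)


def numeric_key(v):
    """Sort key: every Float < every Double < every SignedInteger."""
    if isinstance(v, F32):
        return (0, float_key(v.value))
    if isinstance(v, bool):
        raise TypeError("Booleans are not numbers in this data model")
    if isinstance(v, float):
        return (1, double_key(v))
    if isinstance(v, int):
        return (2, v)
    raise TypeError(f"not a numeric value: {v!r}")


values = [10, F32(10.0), -6.0, 0.5, F32(-6.0), -1.202e300, 0]
print(sorted(values, key=numeric_key))
# [F32(value=-6.0), F32(value=10.0), -1.202e+300, -6.0, 0.5, 0, 10]
```

Note that `10.0f`, `10.0` and `10` land in three different places in the result, which is exactly what treating the three types as disjoint implies.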
### Records.
A `Record` is a *labelled* tuple of zero or more `Value`s, called the
record's *fields*. A record's label is itself a `Value`, though it
will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
are compared lexicographically as if they were just tuples; that is,
first by their labels, and then by the remainder of their fields.
A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
label can be any `Value`, but is usually a `Symbol`.[^extensibility]
[^iri-labels] `Record`s are compared lexicographically: first by
label, then by field sequence.
[^extensibility]: The [Racket](https://racket-lang.org/) programming
language defines
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
it cannot be read as an IRI at all, and so the label simply stands
for itself—for its own `Value`.
**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
2 and 3; `void()`, a `Record` with label `void` and no fields.
**Non-examples.** `()`, because it lacks a label; `void`, because it
lacks even an empty tuple of fields.
### Sequences.
A `Sequence` is a general-purpose, variable-length ordered sequence of
zero or more `Value`s. `Sequence`s are compared lexicographically.
**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
`SignedInteger`s 1, 2 and 3.
A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
lexicographically.
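Both `Record`s and `Sequence`s use lexicographic comparison. The Python sketch below assumes some three-way comparison `cmp` on `Value`s that implements the total order defined earlier; for the demonstration it uses a toy stand-in that only compares Python values of the same type. The `Record` class and the use of Python strings for `Symbol` labels are illustrative assumptions. Comparing a record as the tuple of its label followed by its fields gives the same answer as comparing labels first and then field sequences, because the label always occupies exactly one position.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any, Callable

Cmp = Callable[[Any, Any], int]  # three-way comparison returning -1, 0 or 1


def compare_sequences(xs: Sequence, ys: Sequence, cmp: Cmp) -> int:
    """Lexicographic comparison: the first differing element decides;
    otherwise the shorter sequence (a proper prefix) comes first."""
    for x, y in zip(xs, ys):
        c = cmp(x, y)
        if c != 0:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


@dataclass(frozen=True)
class Record:
    label: Any
    fields: tuple


def compare_records(a: Record, b: Record, cmp: Cmp) -> int:
    """Compare as the tuple (label, field1, field2, ...)."""
    return compare_sequences((a.label,) + a.fields, (b.label,) + b.fields, cmp)


def toy_cmp(x: Any, y: Any) -> int:
    # Stand-in for the full total order: only handles same-typed values.
    return (x > y) - (x < y)


assert compare_sequences([], [1, 2, 3], toy_cmp) == -1
assert compare_records(Record("foo", (1, 2, 3)), Record("foo", (1, 2)), toy_cmp) == 1
assert compare_records(Record("void", ()), Record("foo", (1,)), toy_cmp) == 1
```

Passing `cmp` as a parameter keeps the sketch independent of how the cross-type part of the total order is implemented.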
### Sets.
@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
sorting their elements ascending using the [total order](#total-order)
and comparing the resulting `Sequence`s.
**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
containing only the empty set; `{4 "hello" void() 9.0f}`, the set
containing 4, the string `"hello"`, the record with label `void` and
no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
set containing a `SignedInteger` and a `Float`; `{mime(application/xml
#"<x/>") mime(application/xml #"<x />")}`, a set containing two
different `mime` records.[^mime-xml-difference]
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
differ by bytewise comparison, and thus yield different record
values, even though under the semantics of XML they denote
the same XML infoset.
**Non-examples.** `{1 1}`, because it contains multiple equivalent
`Value`s; `{}`, because without the `#set` marker, it denotes the
empty dictionary.
### Dictionaries.
A `Dictionary` is an unordered finite collection of pairs of `Value`s.
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
be pairwise distinct. Instances of `Dictionary` are compared by
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
pairwise distinct. Instances of `Dictionary` are compared by
lexicographic comparison of the sequences resulting from ordering each
`Dictionary`'s pairs in ascending order by key.
**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
values.
**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
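`Set` and `Dictionary` comparison both reduce to sequence comparison after sorting. In the Python sketch below, `cmp` again stands for an assumed three-way comparison implementing the total order on `Value`s, replaced by a toy same-type comparison for the demonstration, and Python `frozenset` and `dict` stand in for `Set` and `Dictionary`. How individual key/value pairs compare is not spelled out here, so the sketch assumes key first, then value; it gets that effect by flattening the sorted pairs into one sequence `k1 v1 k2 v2 ...`, which orders exactly like comparing the pairs one by one.

```python
from collections.abc import Sequence
from functools import cmp_to_key
from typing import Any, Callable

Cmp = Callable[[Any, Any], int]  # three-way comparison returning -1, 0 or 1


def compare_sequences(xs: Sequence, ys: Sequence, cmp: Cmp) -> int:
    """Lexicographic comparison; a proper prefix sorts first."""
    for x, y in zip(xs, ys):
        c = cmp(x, y)
        if c != 0:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


def compare_sets(a: frozenset, b: frozenset, cmp: Cmp) -> int:
    """Sort each set ascending, then compare the resulting sequences."""
    key = cmp_to_key(cmp)
    return compare_sequences(sorted(a, key=key), sorted(b, key=key), cmp)


def compare_dicts(a: dict, b: dict, cmp: Cmp) -> int:
    """Sort pairs ascending by key, then compare the flattened sequences
    k1 v1 k2 v2 ... (assumption: pairs compare by key, then by value)."""
    def flatten(d: dict) -> list:
        out = []
        for k in sorted(d, key=cmp_to_key(cmp)):
            out.extend((k, d[k]))
        return out

    return compare_sequences(flatten(a), flatten(b), cmp)


def toy_cmp(x: Any, y: Any) -> int:
    # Stand-in for the full total order: only handles same-typed values.
    return (x > y) - (x < y)
```

For example, `compare_sets(frozenset({1, 5}), frozenset({2}), toy_cmp)` is -1, since the sorted sequences `[1 5]` and `[2]` differ at their first element. Python's built-in containers are only a rough stand-in: `frozenset({1, 1.0})` has a single element because `1 == 1.0` in Python, so a faithful implementation needs its own notion of equality consistent with the total order.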
## Textual Syntax
Now we have discussed `Value`s and their meanings, we may turn to
@@ -282,7 +205,7 @@ or line feed.
### Grammar
Standalone documents containing textual representations of `Value`s may have trailing whitespace.
Standalone documents may have trailing whitespace.
Document = Value ws
@@ -301,9 +224,9 @@ the label and the open-parenthesis.
`Sequence`s are enclosed in square brackets. `Dictionary` values are
curly-brace-enclosed colon-separated pairs of values. `Set`s are
written either as a simple curly-brace-enclosed non-empty sequence of
values, or as a possibly-empty sequence of values enclosed by the
tokens `#set{` and `}`.[^printing-collections]
written either as one or more values enclosed in curly braces, or zero
or more values enclosed by the tokens `#set{` and
`}`.[^printing-collections]
Sequence = "[" *Value ws "]"
Dictionary = "{" *(Value ws ":" Value) ws "}"
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
represented as strings with an implicit internal structure.
There is no convention for *labelling* a value as belonging to a
particular category. This makes it difficult to extract, say, all
email addresses, or all URLs, from an arbitrary JSON document.
Instead, JSON-encoded data are often labelled in an ad-hoc way.
Multiple incompatible approaches exist. For example, a "money"
structure containing a `currency` field and an `amount` may be
particular category. Instead, JSON-encoded data are often labelled in
an ad-hoc way. Multiple incompatible approaches exist. For example, a
"money" structure containing a `currency` field and an `amount` may be
represented in any number of ways:
{ "_type": "money", "currency": "EUR", "amount": 10 }