Delete misleading, incorrect, or unnecessary text
parent cf250b9245
commit 10d8ce5c09
preserves.md: 144 lines changed
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language

 Tony Garnock-Jones <tonyg@leastfixedpoint.com>
-September 2018. Version 0.0.3.
+November 2018. Version 0.0.4.

 [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
 [spki]: http://world.std.com/~cme/html/spki.html
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
 This document proposes a data model and serialization format called
 *Preserves*.

-Preserves supports *records* with user-defined *labels*. This makes it
-more expressive[^macro-expressiveness] than most data languages in use
-on the web and allows it to easily represent the *labelled sums of
-products* as seen in many functional programming languages.
+Preserves supports *records* with user-defined *labels*. This relieves
+the confusion caused by encoding records as dictionaries, seen in most
+data languages in use on the web. It also allows Preserves to easily
+represent the *labelled sums of products* as seen in many functional
+programming languages.

 Preserves also supports the usual suite of atomic and compound data
 types, in particular including *binary* data as a distinct type from
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
 Comparison is based on the data model, not on syntax or on data
 structures of any particular implementation language.

-[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
-in the sense of Felleisen's 1991 paper, "On the Expressive Power
-of Programming Languages".
-
-Roughly speaking, there's no way in a JSON document to introduce a
-new kind of information (such as binary data, or a date-stamp, or
-a "person" object) in an *unambiguous way* without *global
-agreement* from every potential consumer of the document. With an
-extensible labelled record type, there is.
-
-Felleisen, Matthias. “On the Expressive Power of Programming
-Languages.” Science of Computer Programming 17, no. 1--3 (1991):
-35–75.
-
 ## Starting with Semantics

 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. When we write examples of values,
-we will do so using the [textual syntax](#textual-syntax) defined
-later in this document.
+meaning independent of their syntax.

 Our `Value`s fall into two broad categories: *atomic* and *compound*
 data.
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
 A `SignedInteger` is a signed integer of arbitrary width.
 `SignedInteger`s are compared as mathematical integers.

-**Examples.** 10; -6; 0.
-
-**Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
-(not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
-
 ### Unicode strings.

 A `String` is a sequence of Unicode
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
 gives the same result as a lexicographic byte-by-byte comparison
 of the UTF-8 encoding of a string!

-**Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
-the string containing the three Unicode code-points `z` (0x7A), `水`
-(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
-
 ### Binary data.

-A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
-`ByteString`s are compared lexicographically.
-
-**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
-`ByteString` containing the integers 65, 66 and 67 (corresponding to
-ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
-these are *binary* data.
+A `ByteString` is a sequence of octets. `ByteString`s are compared
+lexicographically.

 ### Symbols.

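The footnote retained as context in the hunk above claims that comparing strings by code-point gives the same result as a byte-by-byte comparison of their UTF-8 encodings. A minimal Python sketch of that claim follows; the function names and the sample strings are illustrative assumptions, not part of Preserves.

```python
def compare_by_code_point(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing lexicographically by Unicode code-point."""
    ka = [ord(c) for c in a]
    kb = [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


def compare_by_utf8_bytes(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing the UTF-8 encodings byte by byte."""
    ea = a.encode("utf-8")
    eb = b.encode("utf-8")
    return (ea > eb) - (ea < eb)


# Hypothetical sample strings, including code-points outside the BMP.
samples = ["", "Hello world", "z", "z水", "z水𝄞", "水", "𝄞", "\uffff"]

# True when both orderings agree on every pair of samples.
agree = all(
    compare_by_code_point(a, b) == compare_by_utf8_bytes(a, b)
    for a in samples
    for b in samples
)
```

The pair `"\uffff"` and `"𝄞"` is the interesting one: U+FFFF sorts below U+1D11E both by code-point and by UTF-8 bytes, whereas a comparison of UTF-16 code units would order them the other way round.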
@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
 kind. `Symbol`s are also compared lexicographically by code-point.

-**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
-
 ### Booleans.

-There are exactly two `Boolean` values, “false” and “true”. The
-“false” value compares less-than the “true” value. We write `#false`
-for “false”, and `#true` for “true”.
+There are two `Boolean`s, “false” and “true”. The “false” value is
+less-than the “true” value.

 ### IEEE floating-point values.

-A `Float` is a single-precision IEEE 754 floating-point value; a
-`Double` is a double-precision IEEE 754 floating-point value.
-`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
-so by the rules [above](#total-order), every `Float` is less than
-every `Double`, and every `SignedInteger` is greater than both. Two
-`Float`s or two `Double`s are to be ordered by the `totalOrder`
-predicate defined in section 5.10 of
+`Float`s and `Double`s are single- and double-precision IEEE 754
+floating-point values, respectively. `Float`s, `Double`s and
+`SignedInteger`s are disjoint; by the rules [above](#total-order),
+every `Float` is less than every `Double`, and every `SignedInteger`
+is greater than both. Two `Float`s or two `Double`s are to be ordered
+by the `totalOrder` predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
-We write examples using a fractional part and/or an exponent to
-distinguish them from `SignedInteger`s. An additional suffix `f`
-distinguishes `Float`s from `Double`s.
-
-**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
-
-**Non-examples.** 10, -6, and 0, because writing them this way
-indicates `SignedInteger`s, not `Float`s or `Double`s.

 ### Records.

-A `Record` is a *labelled* tuple of zero or more `Value`s, called the
-record's *fields*. A record's label is itself a `Value`, though it
-will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
-are compared lexicographically as if they were just tuples; that is,
-first by their labels, and then by the remainder of their fields.
+A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
+label can be any `Value`, but is usually a `Symbol`.[^extensibility]
+[^iri-labels] `Record`s are compared lexicographically: first by
+label, then by field sequence.

 [^extensibility]: The [Racket](https://racket-lang.org/) programming
 language defines
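The floating-point paragraph in the hunk above combines two rules: values of different numeric types are ordered by type, and two `Double`s are ordered by the IEEE 754 `totalOrder` predicate. The Python sketch below illustrates one way this could look. It is not a reference implementation: it assumes a Python `float` stands in for a `Double` and a Python `int` for a `SignedInteger`, it leaves single-precision `Float`s out, and both function names are hypothetical. The bit manipulation is a common technique for obtaining a sort key consistent with `totalOrder`, not something the document prescribes.

```python
import struct


def double_total_order_key(x: float) -> int:
    """Map a double to an unsigned integer whose order follows totalOrder."""
    # Reinterpret the 64-bit IEEE 754 pattern as an unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits >> 63:
        # Sign bit set: invert all bits so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set it so these values sort above all negative ones.
    return bits | (1 << 63)


def numeric_sort_key(v):
    """Hypothetical key: every Double sorts below every SignedInteger."""
    if isinstance(v, float):
        return (0, double_total_order_key(v))
    return (1, v)


values = [3, 0.0, -2, float("inf"), -0.0, 1.5, float("-inf")]
ordered = sorted(values, key=numeric_sort_key)
```

Unlike Python's built-in `<` on floats, this key distinguishes `-0.0` from `0.0` and gives NaNs a definite position, which a total order requires.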
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
 it cannot be read as an IRI at all, and so the label simply stands
 for itself—for its own `Value`.

-**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
-2 and 3; `void()`, a `Record` with label `void` and no fields.
-
-**Non-examples.** `()`, because it lacks a label; `void`, because it
-lacks even an empty tuple of fields.
-
 ### Sequences.

-A `Sequence` is a general-purpose, variable-length ordered sequence of
-zero or more `Value`s. `Sequence`s are compared lexicographically.
-
-**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
-`SignedInteger`s 1, 2 and 3.
+A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
+lexicographically.

 ### Sets.

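Both `Sequence`s and `Record`s are compared lexicographically, a `Record` first by label and then by its fields. As a rough illustration, Python tuples already compare this way, so a sketch needs very little code. The `Record` class and `compare` helper below are hypothetical, and the sketch assumes labels are strings standing in for `Symbol`s and that fields are integers only; the full total order across all `Value` types is not modelled.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical record model: a label plus a tuple of integer fields."""
    label: str
    fields: Tuple[int, ...]


def compare(a, b) -> int:
    """Return -1, 0 or 1 using Python's built-in lexicographic tuple order."""
    return (a > b) - (a < b)


# Sequences modelled as plain tuples: a proper prefix sorts first.
seq_result = compare((1, 2), (1, 2, 0))

# Records: the label decides first, the fields only break ties.
label_result = compare(Record("bar", (9,)), Record("foo", ()))
field_result = compare(Record("foo", (1, 2, 3)), Record("foo", (1, 2)))
```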
@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
 and comparing the resulting `Sequence`s.

-**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
-containing only the empty set; `{4 "hello" (void) 9.0f}`, the set
-containing 4, the string `"hello"`, the record with label `void` and
-no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
-set containing a `SignedInteger` and a `Float`; `{mime(application/xml
-#"<x/>") mime(application/xml #"<x />")}`, a set containing two
-different `mime` records.[^mime-xml-difference]
-
-[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
-differ by bytewise comparison, and thus yield different record
-values, even though under the semantics of XML they denote
-identical XML infoset.
-
-**Non-examples.** `{1 1}`, because it contains multiple equivalent
-`Value`s; `{}`, because without the `#set` marker, it denotes the
-empty dictionary.
-
 ### Dictionaries.

 A `Dictionary` is an unordered finite collection of pairs of `Value`s.
-Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
-be pairwise distinct. Instances of `Dictionary` are compared by
+Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
+pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
 `Dictionary`'s pairs in ascending order by key.

-**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
-mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
-mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
-`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
-values.
-
-**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
-keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
-
 ## Textual Syntax

 Now we have discussed `Value`s and their meanings, we may turn to
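The comparison rules for `Set`s and `Dictionary`s in the hunk above both reduce to sorting the contents and then comparing the resulting sequences. A minimal Python sketch of that reduction follows. It assumes all elements, keys and values are integers, so that Python's own ordering can stand in for the total order on `Value`s, and it assumes each pair is compared key first, then value. The function names are hypothetical.

```python
def compare_sets(a: set, b: set) -> int:
    """Sort each set ascending, then compare the sequences lexicographically."""
    sa, sb = sorted(a), sorted(b)
    return (sa > sb) - (sa < sb)


def compare_dictionaries(a: dict, b: dict) -> int:
    """Order each dictionary's pairs by key, then compare pair sequences."""
    # Keys are pairwise distinct, so sorting by key alone is unambiguous.
    pa = sorted(a.items(), key=lambda kv: kv[0])
    pb = sorted(b.items(), key=lambda kv: kv[0])
    # Assumption: each (key, value) pair compares by key, then by value.
    return (pa > pb) - (pa < pb)


set_result = compare_sets({1, 5}, {2, 3})                # [1, 5] vs [2, 3]
dict_result = compare_dictionaries({1: 10}, {1: 20})     # same key, smaller value
```

Because the comparison works on the sorted form, the order in which elements or pairs were inserted has no effect on the result.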
@@ -282,7 +205,7 @@ or line feed.

 ### Grammar

-Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+Standalone documents may have trailing whitespace.

 Document = Value ws

@@ -301,9 +224,9 @@ the label and the open-parenthesis.

 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
-written either as a simple curly-brace-enclosed non-empty sequence of
-values, or as a possibly-empty sequence of values enclosed by the
-tokens `#set{` and `}`.[^printing-collections]
+written either as one or more values enclosed in curly braces, or zero
+or more values enclosed by the tokens `#set{` and
+`}`.[^printing-collections]

 Sequence = "[" *Value ws "]"
 Dictionary = "{" *(Value ws ":" Value) ws "}"
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
 represented as strings with an implicit internal structure.

 There is no convention for *labelling* a value as belonging to a
-particular category. This makes it difficult to extract, say, all
-email addresses, or all URLs, from an arbitrary JSON document.
-
-Instead, JSON-encoded data are often labelled in an ad-hoc way.
-Multiple incompatible approaches exist. For example, a "money"
-structure containing a `currency` field and an `amount` may be
+particular category. Instead, JSON-encoded data are often labelled in
+an ad-hoc way. Multiple incompatible approaches exist. For example, a
+"money" structure containing a `currency` field and an `amount` may be
 represented in any number of ways:

 { "_type": "money", "currency": "EUR", "amount": 10 }