forked from syndicate-lang/preserves
Delete misleading, incorrect, or unnecessary text
parent
cf250b9245
commit
10d8ce5c09
144
preserves.md
|
@@ -6,7 +6,7 @@
|
||||||
# Preserves: an Expressive Data Language
|
# Preserves: an Expressive Data Language
|
||||||
|
|
||||||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||||
September 2018. Version 0.0.3.
|
November 2018. Version 0.0.4.
|
||||||
|
|
||||||
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
||||||
[spki]: http://world.std.com/~cme/html/spki.html
|
[spki]: http://world.std.com/~cme/html/spki.html
|
||||||
|
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
|
||||||
This document proposes a data model and serialization format called
|
This document proposes a data model and serialization format called
|
||||||
*Preserves*.
|
*Preserves*.
|
||||||
|
|
||||||
Preserves supports *records* with user-defined *labels*. This makes it
|
Preserves supports *records* with user-defined *labels*. This relieves
|
||||||
more expressive[^macro-expressiveness] than most data languages in use
|
the confusion caused by encoding records as dictionaries, seen in most
|
||||||
on the web and allows it to easily represent the *labelled sums of
|
data languages in use on the web. It also allows Preserves to easily
|
||||||
products* as seen in many functional programming languages.
|
represent the *labelled sums of products* as seen in many functional
|
||||||
|
programming languages.
|
||||||
|
|
||||||
Preserves also supports the usual suite of atomic and compound data
|
Preserves also supports the usual suite of atomic and compound data
|
||||||
types, in particular including *binary* data as a distinct type from
|
types, in particular including *binary* data as a distinct type from
|
||||||
|
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
|
||||||
Comparison is based on the data model, not on syntax or on data
|
Comparison is based on the data model, not on syntax or on data
|
||||||
structures of any particular implementation language.
|
structures of any particular implementation language.
|
||||||
|
|
||||||
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
|
|
||||||
in the sense of Felleisen's 1991 paper, "On the Expressive Power
|
|
||||||
of Programming Languages".
|
|
||||||
|
|
||||||
Roughly speaking, there's no way in a JSON document to introduce a
|
|
||||||
new kind of information (such as binary data, or a date-stamp, or
|
|
||||||
a "person" object) in an *unambiguous way* without *global
|
|
||||||
agreement* from every potential consumer of the document. With an
|
|
||||||
extensible labelled record type, there is.
|
|
||||||
|
|
||||||
Felleisen, Matthias. “On the Expressive Power of Programming
|
|
||||||
Languages.” Science of Computer Programming 17, no. 1--3 (1991):
|
|
||||||
35–75.
|
|
||||||
|
|
||||||
## Starting with Semantics
|
## Starting with Semantics
|
||||||
|
|
||||||
Taking inspiration from functional programming, we start with a
|
Taking inspiration from functional programming, we start with a
|
||||||
definition of the *values* that we want to work with and give them
|
definition of the *values* that we want to work with and give them
|
||||||
meaning independent of their syntax. When we write examples of values,
|
meaning independent of their syntax.
|
||||||
we will do so using the [textual syntax](#textual-syntax) defined
|
|
||||||
later in this document.
|
|
||||||
|
|
||||||
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
||||||
data.
|
data.
|
||||||
|
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
|
||||||
A `SignedInteger` is a signed integer of arbitrary width.
|
A `SignedInteger` is a signed integer of arbitrary width.
|
||||||
`SignedInteger`s are compared as mathematical integers.
|
`SignedInteger`s are compared as mathematical integers.
|
||||||
|
|
||||||
**Examples.** 10; -6; 0.
|
|
||||||
|
|
||||||
**Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
|
|
||||||
(not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
|
|
||||||
|
|
||||||
### Unicode strings.
|
### Unicode strings.
|
||||||
|
|
||||||
A `String` is a sequence of Unicode
|
A `String` is a sequence of Unicode
|
||||||
|
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
|
||||||
gives the same result as a lexicographic byte-by-byte comparison
|
gives the same result as a lexicographic byte-by-byte comparison
|
||||||
of the UTF-8 encoding of a string!
|
of the UTF-8 encoding of a string!
|
||||||
|
|
||||||
**Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
|
|
||||||
the string containing the three Unicode code-points `z` (0x7A), `水`
|
|
||||||
(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
|
|
||||||
|
|
||||||
### Binary data.
|
### Binary data.
|
||||||
|
|
||||||
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
|
A `ByteString` is a sequence of octets. `ByteString`s are compared
|
||||||
`ByteString`s are compared lexicographically.
|
lexicographically.
|
||||||
|
|
||||||
**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
|
|
||||||
`ByteString` containing the integers 65, 66 and 67 (corresponding to
|
|
||||||
ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
|
|
||||||
these are *binary* data.
|
|
||||||
|
|
||||||
### Symbols.
|
### Symbols.
|
||||||
|
|
||||||
|
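Note on the hunk above: the footnote context line claims that comparing strings lexicographically by code-point gives the same result as a byte-by-byte comparison of their UTF-8 encodings. The following is a minimal Python sketch of that claim, not part of the commit. The helper names `by_code_point` and `by_utf8_bytes` are hypothetical, and the sample strings are borrowed from the examples this commit removes.

```python
def by_code_point(strings):
    # Python compares str values one code point at a time,
    # so plain sorted() is a lexicographic code-point sort.
    return sorted(strings)


def by_utf8_bytes(strings):
    # Sort by the raw bytes of each string's UTF-8 encoding instead.
    return sorted(strings, key=lambda s: s.encode("utf-8"))


samples = ["z水𝄞", "Hello world", "", "z", "水", "𝄞", "zz"]

# Both orderings agree on these samples, as the footnote says they should.
assert by_code_point(samples) == by_utf8_bytes(samples)
```

The agreement is a property of UTF-8 itself, so an implementation that stores strings as UTF-8 can likely compare them with a plain byte comparison.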
@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
|
||||||
sequence of Unicode code-points representing an identifier of some
|
sequence of Unicode code-points representing an identifier of some
|
||||||
kind. `Symbol`s are also compared lexicographically by code-point.
|
kind. `Symbol`s are also compared lexicographically by code-point.
|
||||||
|
|
||||||
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
|
|
||||||
|
|
||||||
### Booleans.
|
### Booleans.
|
||||||
|
|
||||||
There are exactly two `Boolean` values, “false” and “true”. The
|
There are two `Boolean`s, “false” and “true”. The “false” value is
|
||||||
“false” value compares less-than the “true” value. We write `#false`
|
less-than the “true” value.
|
||||||
for “false”, and `#true` for “true”.
|
|
||||||
|
|
||||||
### IEEE floating-point values.
|
### IEEE floating-point values.
|
||||||
|
|
||||||
A `Float` is a single-precision IEEE 754 floating-point value; a
|
`Float`s and `Double`s are single- and double-precision IEEE 754
|
||||||
`Double` is a double-precision IEEE 754 floating-point value.
|
floating-point values, respectively. `Float`s, `Double`s and
|
||||||
`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
|
`SignedInteger`s are disjoint; by the rules [above](#total-order),
|
||||||
so by the rules [above](#total-order), every `Float` is less than
|
every `Float` is less than every `Double`, and every `SignedInteger`
|
||||||
every `Double`, and every `SignedInteger` is greater than both. Two
|
is greater than both. Two `Float`s or two `Double`s are to be ordered
|
||||||
`Float`s or two `Double`s are to be ordered by the `totalOrder`
|
by the `totalOrder` predicate defined in section 5.10 of
|
||||||
predicate defined in section 5.10 of
|
|
||||||
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
|
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
|
||||||
We write examples using a fractional part and/or an exponent to
|
|
||||||
distinguish them from `SignedInteger`s. An additional suffix `f`
|
|
||||||
distinguishes `Float`s from `Double`s.
|
|
||||||
|
|
||||||
**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
|
|
||||||
|
|
||||||
**Non-examples.** 10, -6, and 0, because writing them this way
|
|
||||||
indicates `SignedInteger`s, not `Float`s or `Double`s.
|
|
||||||
|
|
||||||
### Records.
|
### Records.
|
||||||
|
|
||||||
A `Record` is a *labelled* tuple of zero or more `Value`s, called the
|
A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
|
||||||
record's *fields*. A record's label is itself a `Value`, though it
|
label can be any `Value`, but is usually a `Symbol`.[^extensibility]
|
||||||
will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
|
[^iri-labels] `Record`s are compared lexicographically: first by
|
||||||
are compared lexicographically as if they were just tuples; that is,
|
label, then by field sequence.
|
||||||
first by their labels, and then by the remainder of their fields.
|
|
||||||
|
|
||||||
[^extensibility]: The [Racket](https://racket-lang.org/) programming
|
[^extensibility]: The [Racket](https://racket-lang.org/) programming
|
||||||
language defines
|
language defines
|
||||||
|
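Note on the hunk above: two `Double`s are ordered by the IEEE 754-2008 `totalOrder` predicate, which differs from the usual `<` on floats (for example, it places -0.0 below +0.0 and gives NaNs a position). The sketch below shows one common bit-level way to get a sort key with that ordering for double-precision values. It is an illustration under stated assumptions, not the specification's or any Preserves implementation's method; `total_order_key` is a hypothetical name, and a `Float` version would need the 32-bit analogue.

```python
import struct

SIGN_BIT = 1 << 63
MASK = (1 << 64) - 1


def total_order_key(x: float) -> int:
    # Reinterpret the double's big-endian IEEE 754 bits as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & SIGN_BIT:
        # Negative values: invert all bits so larger magnitudes sort lower.
        return ~bits & MASK
    # Non-negative values: set the top bit so they sort above every negative.
    return bits | SIGN_BIT


values = [1.0, -1.0, 0.0, -0.0, float("inf"), float("-inf")]
ordered = sorted(values, key=total_order_key)
```

Sorting with this key places -0.0 before 0.0, which an ordinary numeric comparison cannot do because it treats the two as equal.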
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
|
||||||
it cannot be read as an IRI at all, and so the label simply stands
|
it cannot be read as an IRI at all, and so the label simply stands
|
||||||
for itself—for its own `Value`.
|
for itself—for its own `Value`.
|
||||||
|
|
||||||
**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
|
|
||||||
2 and 3; `void()`, a `Record` with label `void` and no fields.
|
|
||||||
|
|
||||||
**Non-examples.** `()`, because it lacks a label; `void`, because it
|
|
||||||
lacks even an empty tuple of fields.
|
|
||||||
|
|
||||||
### Sequences.
|
### Sequences.
|
||||||
|
|
||||||
A `Sequence` is a general-purpose, variable-length ordered sequence of
|
A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
|
||||||
zero or more `Value`s. `Sequence`s are compared lexicographically.
|
lexicographically.
|
||||||
|
|
||||||
**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
|
|
||||||
`SignedInteger`s 1, 2 and 3.
|
|
||||||
|
|
||||||
### Sets.
|
### Sets.
|
||||||
|
|
||||||
|
@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
|
||||||
sorting their elements ascending using the [total order](#total-order)
|
sorting their elements ascending using the [total order](#total-order)
|
||||||
and comparing the resulting `Sequence`s.
|
and comparing the resulting `Sequence`s.
|
||||||
|
|
||||||
**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
|
|
||||||
containing only the empty set; `{4 "hello" (void) 9.0f}`, the set
|
|
||||||
containing 4, the string `"hello"`, the record with label `void` and
|
|
||||||
no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
|
|
||||||
set containing a `SignedInteger` and a `Float`; `{mime(application/xml
|
|
||||||
#"<x/>") mime(application/xml #"<x />")}`, a set containing two
|
|
||||||
different `mime` records.[^mime-xml-difference]
|
|
||||||
|
|
||||||
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
|
|
||||||
differ by bytewise comparison, and thus yield different record
|
|
||||||
values, even though under the semantics of XML they denote
|
|
||||||
identical XML infoset.
|
|
||||||
|
|
||||||
**Non-examples.** `{1 1}`, because it contains multiple equivalent
|
|
||||||
`Value`s; `{}`, because without the `#set` marker, it denotes the
|
|
||||||
empty dictionary.
|
|
||||||
|
|
||||||
### Dictionaries.
|
### Dictionaries.
|
||||||
|
|
||||||
A `Dictionary` is an unordered finite collection of pairs of `Value`s.
|
A `Dictionary` is an unordered finite collection of pairs of `Value`s.
|
||||||
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
|
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
|
||||||
be pairwise distinct. Instances of `Dictionary` are compared by
|
pairwise distinct. Instances of `Dictionary` are compared by
|
||||||
lexicographic comparison of the sequences resulting from ordering each
|
lexicographic comparison of the sequences resulting from ordering each
|
||||||
`Dictionary`'s pairs in ascending order by key.
|
`Dictionary`'s pairs in ascending order by key.
|
||||||
|
|
||||||
**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
|
|
||||||
mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
|
|
||||||
mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
|
|
||||||
`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
|
|
||||||
values.
|
|
||||||
|
|
||||||
**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
|
|
||||||
keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
|
|
||||||
|
|
||||||
## Textual Syntax
|
## Textual Syntax
|
||||||
|
|
||||||
Now we have discussed `Value`s and their meanings, we may turn to
|
Now we have discussed `Value`s and their meanings, we may turn to
|
||||||
|
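Note on the hunk above: `Set`s are compared by sorting their elements in ascending order and comparing the resulting sequences, and `Dictionary` values by comparing their pairs in ascending key order. The sketch below illustrates both rules in Python. It assumes all elements, keys, and values are integers, for which Python's ordering matches "compared as mathematical integers"; the full total order across different kinds of `Value` is defined elsewhere in the specification and is not modelled here. The function names are hypothetical.

```python
def compare_sets(a, b):
    # Sort each set ascending, then compare the sorted lists lexicographically.
    xs, ys = sorted(a), sorted(b)
    return (xs > ys) - (xs < ys)  # -1, 0, or 1


def compare_dicts(a, b):
    # Keys are pairwise distinct, so sorting (key, value) pairs orders by key.
    xs, ys = sorted(a.items()), sorted(b.items())
    return (xs > ys) - (xs < ys)  # -1, 0, or 1


# {1, 3} sorts to [1, 3], which precedes [2] because 1 < 2.
result = compare_sets({1, 3}, {2})
```

Because the comparison runs on the sorted form, the order in which elements were inserted has no effect on the outcome.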
@@ -282,7 +205,7 @@ or line feed.
|
||||||
|
|
||||||
### Grammar
|
### Grammar
|
||||||
|
|
||||||
Standalone documents containing textual representations of `Value`s may have trailing whitespace.
|
Standalone documents may have trailing whitespace.
|
||||||
|
|
||||||
Document = Value ws
|
Document = Value ws
|
||||||
|
|
||||||
|
@@ -301,9 +224,9 @@ the label and the open-parenthesis.
|
||||||
|
|
||||||
`Sequence`s are enclosed in square brackets. `Dictionary` values are
|
`Sequence`s are enclosed in square brackets. `Dictionary` values are
|
||||||
curly-brace-enclosed colon-separated pairs of values. `Set`s are
|
curly-brace-enclosed colon-separated pairs of values. `Set`s are
|
||||||
written either as a simple curly-brace-enclosed non-empty sequence of
|
written either as one or more values enclosed in curly braces, or zero
|
||||||
values, or as a possibly-empty sequence of values enclosed by the
|
or more values enclosed by the tokens `#set{` and
|
||||||
tokens `#set{` and `}`.[^printing-collections]
|
`}`.[^printing-collections]
|
||||||
|
|
||||||
Sequence = "[" *Value ws "]"
|
Sequence = "[" *Value ws "]"
|
||||||
Dictionary = "{" *(Value ws ":" Value) ws "}"
|
Dictionary = "{" *(Value ws ":" Value) ws "}"
|
||||||
|
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
|
||||||
represented as strings with an implicit internal structure.
|
represented as strings with an implicit internal structure.
|
||||||
|
|
||||||
There is no convention for *labelling* a value as belonging to a
|
There is no convention for *labelling* a value as belonging to a
|
||||||
particular category. This makes it difficult to extract, say, all
|
particular category. Instead, JSON-encoded data are often labelled in
|
||||||
email addresses, or all URLs, from an arbitrary JSON document.
|
an ad-hoc way. Multiple incompatible approaches exist. For example, a
|
||||||
|
"money" structure containing a `currency` field and an `amount` may be
|
||||||
Instead, JSON-encoded data are often labelled in an ad-hoc way.
|
|
||||||
Multiple incompatible approaches exist. For example, a "money"
|
|
||||||
structure containing a `currency` field and an `amount` may be
|
|
||||||
represented in any number of ways:
|
represented in any number of ways:
|
||||||
|
|
||||||
{ "_type": "money", "currency": "EUR", "amount": 10 }
|
{ "_type": "money", "currency": "EUR", "amount": 10 }
|
||||||
|
|