WIP from the early hours of this morning, adding textual syntax
parent
906f8a01b6
commit
6fa0dde8f4
|
@ -6,12 +6,13 @@
|
||||||
# Preserves: an Expressive Data Language
|
# Preserves: an Expressive Data Language
|
||||||
|
|
||||||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||||
September 2018. Version 0.0.2.
|
September 2018. Version 0.0.3.
|
||||||
|
|
||||||
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
||||||
[spki]: http://world.std.com/~cme/html/spki.html
|
[spki]: http://world.std.com/~cme/html/spki.html
|
||||||
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
|
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
|
||||||
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
|
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
|
||||||
|
[abnf]: https://tools.ietf.org/html/rfc7405
|
||||||
|
|
||||||
This document proposes a data model and serialization format called
|
This document proposes a data model and serialization format called
|
||||||
*Preserves*.
|
*Preserves*.
|
||||||
|
@ -47,7 +48,8 @@ structures of any particular implementation language.
|
||||||
|
|
||||||
Taking inspiration from functional programming, we start with a
|
Taking inspiration from functional programming, we start with a
|
||||||
definition of the *values* that we want to work with and give them
|
definition of the *values* that we want to work with and give them
|
||||||
meaning independent of their syntax. We will treat syntax separately,
|
meaning independent of their syntax. When we write examples of values,
|
||||||
|
we will do so using the [textual syntax](#textual-syntax) defined
|
||||||
later in this document.
|
later in this document.
|
||||||
|
|
||||||
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
||||||
|
@ -94,8 +96,7 @@ neither is less than the other according to the total order.
|
||||||
### Signed integers.
|
### Signed integers.
|
||||||
|
|
||||||
A `SignedInteger` is a signed integer of arbitrary width.
|
A `SignedInteger` is a signed integer of arbitrary width.
|
||||||
`SignedInteger`s are compared as mathematical integers. We will write
|
`SignedInteger`s are compared as mathematical integers.
|
||||||
examples of `SignedInteger`s using standard mathematical notation.
|
|
||||||
|
|
||||||
**Examples.** 10; -6; 0.
|
**Examples.** 10; -6; 0.
|
||||||
|
|
||||||
|
@ -107,8 +108,7 @@ examples of `SignedInteger`s using standard mathematical notation.
|
||||||
A `String` is a sequence of Unicode
|
A `String` is a sequence of Unicode
|
||||||
[code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
|
[code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
|
||||||
are compared lexicographically, code-point by
|
are compared lexicographically, code-point by
|
||||||
code-point.[^utf8-is-awesome] We will write examples of `String`s as
|
code-point.[^utf8-is-awesome]
|
||||||
text surrounded by quotes “`"`”.
|
|
||||||
|
|
||||||
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
||||||
gives the same result as a lexicographic byte-by-byte comparison
|
gives the same result as a lexicographic byte-by-byte comparison
|
||||||
|
@ -121,33 +121,27 @@ the string containing the three Unicode code-points `z` (0x7A), `水`
|
||||||
### Binary data.
|
### Binary data.
|
||||||
|
|
||||||
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
|
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
|
||||||
`ByteString`s are compared lexicographically. We will only write
|
`ByteString`s are compared lexicographically.
|
||||||
examples of `ByteString`s that contain bytes denoting printable ASCII
|
|
||||||
characters, using “`#"`” as an open-quote and “`"`” as a close-quote
|
|
||||||
mark.
|
|
||||||
|
|
||||||
**Examples.** The `ByteString` containing the integers 65, 66 and 67
|
**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
|
||||||
(corresponding to ASCII characters `A`, `B` and `C`) is written as
|
`ByteString` containing the integers 65, 66 and 67 (corresponding to
|
||||||
`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
|
ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
|
||||||
appearances, these are *binary* data.
|
these are *binary* data.
|
||||||
|
|
||||||
### Symbols.
|
### Symbols.
|
||||||
|
|
||||||
Programming languages like Lisp and Prolog frequently use string-like
|
Programming languages like Lisp and Prolog frequently use string-like
|
||||||
values called *symbols*. Here, a `Symbol` is, like a `String`, a
|
values called *symbols*. Here, a `Symbol` is, like a `String`, a
|
||||||
sequence of Unicode code-points representing an identifier of some
|
sequence of Unicode code-points representing an identifier of some
|
||||||
kind. `Symbol`s are also compared lexicographically by code-point. We
|
kind. `Symbol`s are also compared lexicographically by code-point.
|
||||||
will write examples including only non-empty sequences of
|
|
||||||
non-whitespace characters, using a monospace font without quotation
|
|
||||||
marks.
|
|
||||||
|
|
||||||
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
|
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
|
||||||
|
|
||||||
### Booleans.
|
### Booleans.
|
||||||
|
|
||||||
There are exactly two `Boolean` values, “false” and “true”. The
|
There are exactly two `Boolean` values, “false” and “true”. The
|
||||||
“false” value compares less-than the “true” value. We write `#f` for
|
“false” value compares less-than the “true” value. We write `#false`
|
||||||
“false”, and `#t` for “true”.
|
for “false”, and `#true` for “true”.
|
||||||
|
|
||||||
### IEEE floating-point values.
|
### IEEE floating-point values.
|
||||||
|
|
||||||
|
@ -159,11 +153,11 @@ every `Double`, and every `SignedInteger` is greater than both. Two
|
||||||
`Float`s or two `Double`s are to be ordered by the `totalOrder`
|
`Float`s or two `Double`s are to be ordered by the `totalOrder`
|
||||||
predicate defined in section 5.10 of
|
predicate defined in section 5.10 of
|
||||||
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
|
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
|
||||||
We write examples using standard mathematical notation, avoiding NaN
|
We write examples using a fractional part and/or an exponent to
|
||||||
and infinities, using a suffix `f` or `d` to indicate `Float` or
|
distinguish them from `SignedInteger`s. An additional suffix `f`
|
||||||
`Double`, respectively.
|
distinguishes `Float`s from `Double`s.
|
||||||
|
|
||||||
**Examples.** 10f; -6d; 0f; 0.5d; -1.202e300d.
|
**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
|
||||||
|
|
||||||
**Non-examples.** 10, -6, and 0, because writing them this way
|
**Non-examples.** 10, -6, and 0, because writing them this way
|
||||||
indicates `SignedInteger`s, not `Float`s or `Double`s.
|
indicates `SignedInteger`s, not `Float`s or `Double`s.
|
||||||
|
@ -174,9 +168,7 @@ A `Record` is a *labelled* tuple of zero or more `Value`s, called the
|
||||||
record's *fields*. A record's label is itself a `Value`, though it
|
record's *fields*. A record's label is itself a `Value`, though it
|
||||||
will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
|
will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
|
||||||
are compared lexicographically as if they were just tuples; that is,
|
are compared lexicographically as if they were just tuples; that is,
|
||||||
first by their labels, and then by the remainder of their fields. We
|
first by their labels, and then by the remainder of their fields.
|
||||||
will write examples of `Record`s as a parenthesised, space-separated
|
|
||||||
sequence of their label `Value` followed by their field `Value`s.
|
|
||||||
|
|
||||||
[^extensibility]: The [Racket](https://racket-lang.org/) programming
|
[^extensibility]: The [Racket](https://racket-lang.org/) programming
|
||||||
language defines
|
language defines
|
||||||
|
@ -194,17 +186,16 @@ sequence of their label `Value` followed by their field `Value`s.
|
||||||
it cannot be read as an IRI at all, and so the label simply stands
|
it cannot be read as an IRI at all, and so the label simply stands
|
||||||
for itself—for its own `Value`.
|
for itself—for its own `Value`.
|
||||||
|
|
||||||
**Examples.** The `Record` with label `foo` and fields 1, 2 and 3 is
|
**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
|
||||||
written `(foo 1 2 3)`; the `Record` with label `void` and no fields is
|
2 and 3; `void()`, a `Record` with label `void` and no fields.
|
||||||
written `(void)`.
|
|
||||||
|
|
||||||
**Non-examples.** `()`, because it lacks a label.
|
**Non-examples.** `()`, because it lacks a label; `void`, because it
|
||||||
|
lacks even an empty tuple of fields.
|
||||||
|
|
||||||
### Sequences.
|
### Sequences.
|
||||||
|
|
||||||
A `Sequence` is a general-purpose, variable-length ordered sequence of
|
A `Sequence` is a general-purpose, variable-length ordered sequence of
|
||||||
zero or more `Value`s. `Sequence`s are compared lexicographically. We
|
zero or more `Value`s. `Sequence`s are compared lexicographically.
|
||||||
write examples space-separated, surrounded with square brackets.
|
|
||||||
|
|
||||||
**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
|
**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
|
||||||
`SignedInteger`s 1, 2 and 3.
|
`SignedInteger`s 1, 2 and 3.
|
||||||
|
@ -215,25 +206,24 @@ A `Set` is an unordered finite set of `Value`s. It contains no
|
||||||
duplicate values, following the [equivalence relation](#equivalence)
|
duplicate values, following the [equivalence relation](#equivalence)
|
||||||
induced by the total order on `Value`s. Two `Set`s are compared by
|
induced by the total order on `Value`s. Two `Set`s are compared by
|
||||||
sorting their elements ascending using the [total order](#total-order)
|
sorting their elements ascending using the [total order](#total-order)
|
||||||
and comparing the resulting `Sequence`s. We write examples
|
and comparing the resulting `Sequence`s.
|
||||||
space-separated, surrounded with curly braces, prefixed by `#set`.
|
|
||||||
|
|
||||||
**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
|
**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
|
||||||
containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set
|
containing only the empty set; `{4 "hello" void() 9.0f}`, the set
|
||||||
containing 4, the string `"hello"`, the record with label `void` and
|
containing 4, the string `"hello"`, the record with label `void` and
|
||||||
no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
|
no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
|
||||||
the set containing a `SignedInteger` and a `Float`; `#set{(mime
|
set containing a `SignedInteger` and a `Float`; `{mime(application/xml
|
||||||
application/xml #"<x/>") (mime application/xml #"<x />")}`, a set
|
#"<x/>") mime(application/xml #"<x />")}`, a set containing two
|
||||||
containing two different type-labelled byte
|
different `mime` records.[^mime-xml-difference]
|
||||||
arrays.[^mime-xml-difference]
|
|
||||||
|
|
||||||
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
|
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
|
||||||
differ by bytewise comparison, and thus yield different record
|
differ by bytewise comparison, and thus yield different record
|
||||||
values, even though under the semantics of XML they denote
|
values, even though under the semantics of XML they denote
|
||||||
identical XML infoset.
|
identical XML infoset.
|
||||||
|
|
||||||
**Non-examples.** `#set{1 1 1}`, because it contains multiple
|
**Non-examples.** `{1 1}`, because it contains multiple equivalent
|
||||||
equivalent `Value`s.
|
`Value`s; `{}`, because without the `#set` marker, it denotes the
|
||||||
|
empty dictionary.
|
||||||
|
|
||||||
### Dictionaries.
|
### Dictionaries.
|
||||||
|
|
||||||
|
@ -241,27 +231,189 @@ A `Dictionary` is an unordered finite collection of pairs of `Value`s.
|
||||||
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
|
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
|
||||||
be pairwise distinct. Instances of `Dictionary` are compared by
|
be pairwise distinct. Instances of `Dictionary` are compared by
|
||||||
lexicographic comparison of the sequences resulting from ordering each
|
lexicographic comparison of the sequences resulting from ordering each
|
||||||
`Dictionary`'s pairs in ascending order by key. Examples are written
|
`Dictionary`'s pairs in ascending order by key.
|
||||||
as a `#dict`-prefixed, curly-brace-surrounded sequence of
|
|
||||||
space-separated key-value pairs, each written with a colon between the
|
|
||||||
key and value.
|
|
||||||
|
|
||||||
**Examples.** `#dict{}`, the empty dictionary; `#dict{a:1}`, the
|
**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
|
||||||
dictionary mapping the `Symbol` `a` to the `SignedInteger` 1;
|
mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
|
||||||
`#dict{[1 2 3]:a}`, mapping `[1 2 3]` to `a`; `#dict{"hi":0 hi:0
|
mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
|
||||||
there:[]}`, having a `String` and two `Symbol` keys, and
|
`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
|
||||||
`SignedInteger` and `Sequence` values.
|
values.
|
||||||
|
|
||||||
**Non-examples.** `#dict{a:1 b:2 a:3}`, because it contains duplicate
|
**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
|
||||||
keys; `#dict{[7 8]:[] [7 8]:99}`, for the same reason.
|
keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
|
||||||
|
|
||||||
## Syntax
|
## Textual Syntax
|
||||||
|
|
||||||
Now we have discussed `Value`s and their meanings, we may turn to
|
Now we have discussed `Value`s and their meanings, we may turn to
|
||||||
techniques for *representing* `Value`s for communication or storage.
|
techniques for *representing* `Value`s for communication or storage.
|
||||||
|
|
||||||
For now, we limit our attention to an easily-parsed, easily-produced
|
In this section, we use [case-sensitive ABNF][abnf] to define a
|
||||||
machine-readable syntax.
|
textual syntax that is easy for people to read and
|
||||||
|
write.[^json-superset] Most of the examples in this document are
|
||||||
|
written using this syntax. In the following section, we will define an
|
||||||
|
equivalent compact machine-readable syntax.
|
||||||
|
|
||||||
|
[^json-superset]: The grammar of the textual syntax is a superset of
|
||||||
|
JSON, with the slightly unusual feature that `true`, `false`, and
|
||||||
|
`null` are all read as `Symbol`s, and that `SignedInteger`s are
|
||||||
|
never read as `Double`s.
|
||||||
|
|
||||||
|
### Character set
|
||||||
|
|
||||||
|
[ABNF][abnf] allows easy definition of US-ASCII-based languages.
|
||||||
|
However, Preserves is a Unicode-based language. Therefore, we
|
||||||
|
reinterpret ABNF as a grammar for recognising sequences of Unicode
|
||||||
|
code points.
|
||||||
|
|
||||||
|
Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where
|
||||||
|
possible.
|
||||||
|
|
||||||
|
### Whitespace
|
||||||
|
|
||||||
|
Whitespace is defined as any number of spaces, tabs, carriage returns,
|
||||||
|
line feeds, comments, or commas. A comment is a semicolon followed by
|
||||||
|
the Unicode code points up to and including the next carriage return
|
||||||
|
or line feed.
|
||||||
|
|
||||||
|
ws = *(%x20 / %x09 / newline / comment / ",")
|
||||||
|
newline = CR / LF
|
||||||
|
comment = ";" *(WSP / nonnl) newline
|
||||||
|
nonnl = <any Unicode code point except CR or LF>
|
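The whitespace rule above can be sketched as a small scanner. This is a minimal illustration in Python rather than part of the specification: the function name `skip_ws` is hypothetical, and unlike the `comment` production it leniently accepts a comment that runs to the end of the input without a closing newline.

```python
def skip_ws(text: str, pos: int = 0) -> int:
    """Return the index of the first code point at or after `pos`
    that is not whitespace in the sense of the `ws` production."""
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch in " \t\r\n,":
            # Spaces, tabs, CR, LF and commas are all plain whitespace.
            pos += 1
        elif ch == ";":
            # A comment runs up to and including the next CR or LF.
            while pos < n and text[pos] not in "\r\n":
                pos += 1
            if pos < n:
                pos += 1  # consume the terminating newline
        else:
            break
    return pos
```

Because commas count as whitespace, `[1 2 3]` and `[1, 2, 3]` would be skipped over identically by a reader built on such a helper.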
||||||
|
|
||||||
|
### Grammar
|
||||||
|
|
||||||
|
Standalone documents containing textual representations of `Value`s may have trailing whitespace.
|
||||||
|
|
||||||
|
Document = Value ws
|
||||||
|
|
||||||
|
Any `Value` may be preceded by whitespace.
|
||||||
|
|
||||||
|
Value = ws (Record / Collection / Atom / Compact)
|
||||||
|
Collection = Sequence / Dictionary / Set
|
||||||
|
Atom = Boolean / Float / Double / SignedInteger /
|
||||||
|
String / ByteString / Symbol
|
||||||
|
|
||||||
|
Each `Record` is its label-`Value` followed by a parenthesised
|
||||||
|
grouping of its field-`Value`s.
|
||||||
|
|
||||||
|
Record = Value ws "(" *Value ws ")"
|
||||||
|
|
||||||
|
`Sequence`s are enclosed in square brackets. `Dictionary` values are
|
||||||
|
curly-brace-enclosed colon-separated pairs of values. `Set`s are
|
||||||
|
written either as a simple curly-brace-enclosed non-empty sequence of
|
||||||
|
values, or as a possibly-empty sequence of values enclosed by the
|
||||||
|
tokens `#set{` and `}`.
|
||||||
|
|
||||||
|
Sequence = "[" *Value ws "]"
|
||||||
|
Dictionary = "{" *(Value ws ":" Value) ws "}"
|
||||||
|
Set = %s"#set{" *Value ws "}" / "{" 1*Value ws "}"
|
||||||
|
|
||||||
|
Any `Value` may be represented using the
|
||||||
|
[compact binary syntax](#compact-binary-syntax) by directly prefixing
|
||||||
|
the binary form of the `Value` with ASCII `SOH` (`%x01`), or by
|
||||||
|
enclosing a hexadecimal representation of the binary form of the
|
||||||
|
`Value` in the tokens `#hexvalue{` and `}`.
|
||||||
|
|
||||||
|
Compact = %x01 <binary data> / %s"#hexvalue{" *(ws / HEXDIG) ws "}"
|
||||||
|
|
||||||
|
`Boolean`s are the simple literal strings `#true` and `#false`.
|
||||||
|
|
||||||
|
Boolean = %s"#true" / %s"#false"
|
||||||
|
|
||||||
|
Numeric data follow the
|
||||||
|
[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
|
||||||
|
the addition of a trailing "f" distinguishing `Float` from `Double`
|
||||||
|
values. `Float`s and `Double`s always have either a fractional part or
|
||||||
|
an exponent part, where `SignedInteger`s never have either.
|
||||||
|
|
||||||
|
TODO: talk about precise reading of floats, and the need for arbitrary
|
||||||
|
precision. Your language will often have a good floating-point reading
|
||||||
|
library.
|
||||||
|
|
||||||
|
Float = flt %i"f"
|
||||||
|
Double = flt
|
||||||
|
SignedInteger = int
|
||||||
|
|
||||||
|
digit1-9 = %x31-39
|
||||||
|
nat = %x30 / ( digit1-9 *DIGIT )
|
||||||
|
int = ["-"] nat
|
||||||
|
frac = "." 1*DIGIT
|
||||||
|
exp = %i"e" ["-"/"+"] 1*DIGIT
|
||||||
|
flt = int (frac exp / frac / exp)
|
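As a rough illustration of how the numeric productions separate the three kinds of number, the following Python sketch transcribes them into regular expressions. The helper name `classify_number` is an assumption made for this example; it only classifies a complete token and does not perform the precise floating-point reading mentioned in the TODO above.

```python
import re
from typing import Optional

# Direct transcriptions of the ABNF rules nat, int, frac, exp and flt.
_NAT = r"(?:0|[1-9][0-9]*)"
_INT = rf"-?{_NAT}"
_FRAC = r"\.[0-9]+"
_EXP = r"[eE][-+]?[0-9]+"
_FLT = rf"{_INT}(?:{_FRAC}{_EXP}|{_FRAC}|{_EXP})"

_FLOAT_RE = re.compile(rf"{_FLT}[fF]")   # Float = flt %i"f"
_DOUBLE_RE = re.compile(_FLT)            # Double = flt
_INTEGER_RE = re.compile(_INT)           # SignedInteger = int


def classify_number(token: str) -> Optional[str]:
    """Name the kind of Value a numeric token denotes, or None if it is not numeric."""
    if _FLOAT_RE.fullmatch(token):
        return "Float"
    if _DOUBLE_RE.fullmatch(token):
        return "Double"
    if _INTEGER_RE.fullmatch(token):
        return "SignedInteger"
    return None
```

Under this sketch `10.0f` is a `Float`, `-6.0` is a `Double`, and `10` is a `SignedInteger`, while `1f` is rejected because it has neither a fractional part nor an exponent.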
||||||
|
|
||||||
|
`String`s are,
|
||||||
|
[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
|
||||||
|
escaped text surrounded by double quotes. The escaping rules are the
|
||||||
|
same as for JSON.[^string-json-correspondence]
|
||||||
|
|
||||||
|
TODO: discuss surrogate pairs in \uXXXX form
|
||||||
|
|
||||||
|
String = %x22 *char %x22
|
||||||
|
char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
|
||||||
|
unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
|
||||||
|
escape = %x5C ; \
|
||||||
|
escaped = ( %x5C / ; \ reverse solidus U+005C
|
||||||
|
%x2F / ; / solidus U+002F
|
||||||
|
%x62 / ; b backspace U+0008
|
||||||
|
%x66 / ; f form feed U+000C
|
||||||
|
%x6E / ; n line feed U+000A
|
||||||
|
%x72 / ; r carriage return U+000D
|
||||||
|
%x74 ) ; t tab U+0009
|
||||||
|
|
||||||
|
[^string-json-correspondence]: The grammar for `String` has the same
|
||||||
|
effect as the
|
||||||
|
[JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
|
||||||
|
`string`. Some auxiliary definitions (e.g. `escaped`) are lifted
|
||||||
|
largely unmodified from the text of RFC 8259.
|
||||||
|
|
||||||
|
A `ByteString` may be written in any of three different forms.
|
||||||
|
|
||||||
|
The first is similar to a `String`, but prepended with a hash sign
|
||||||
|
`#`. In addition, only Unicode code points overlapping with printable
|
||||||
|
7-bit ASCII are permitted unescaped inside such a `ByteString`; other
|
||||||
|
byte values must be escaped by prepending a two-digit hexadecimal
|
||||||
|
value with `\x`.
|
||||||
|
|
||||||
|
ByteString = "#" %x22 *binchar %x22
|
||||||
|
binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
|
||||||
|
binunescaped = %x20-21 / %x23-5B / %x5D-7E
|
||||||
|
|
||||||
|
The second is as a sequence of pairs of hexadecimal digits interleaved
|
||||||
|
with whitespace and surrounded by `#hex{` and `}`.
|
||||||
|
|
||||||
|
ByteString =/ %s"#hex{" *(ws / 2HEXDIG) ws "}"
|
||||||
|
|
||||||
|
The third is as a sequence of
|
||||||
|
[Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved
|
||||||
|
with whitespace and surrounded by `#base64{` and `}`. Plain and
|
||||||
|
URL-safe Base64 characters are allowed.
|
||||||
|
|
||||||
|
 ByteString =/ %s"#base64{" *(ws / base64char) ws "}"
|
||||||
|
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
|
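To make the `#hex{` and `#base64{` forms concrete, here is a minimal decoding sketch in Python. The function names are hypothetical, and for brevity the sketch treats only spaces, tabs, newlines and commas as whitespace; comments inside the braces are not handled.

```python
import base64
import re

_WS_CHARS = " \t\r\n,"
_HEX_BODY = re.compile(r"(?:[ \t\r\n,]|[0-9A-Fa-f]{2})*")
_HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def decode_hex_form(text: str) -> bytes:
    """Decode a ByteString written as #hex{...}."""
    if not (text.startswith("#hex{") and text.endswith("}")):
        raise ValueError("not a #hex{...} ByteString")
    body = text[len("#hex{"):-1]
    # Whitespace may separate pairs of digits but may not split a pair.
    if not _HEX_BODY.fullmatch(body):
        raise ValueError("malformed hexadecimal body")
    return bytes(int(pair, 16) for pair in _HEX_PAIR.findall(body))


def decode_base64_form(text: str) -> bytes:
    """Decode a ByteString written as #base64{...}, plain or URL-safe."""
    if not (text.startswith("#base64{") and text.endswith("}")):
        raise ValueError("not a #base64{...} ByteString")
    body = "".join(ch for ch in text[len("#base64{"):-1] if ch not in _WS_CHARS)
    # Map the URL-safe alphabet onto the plain one, then normalise padding.
    body = body.replace("-", "+").replace("_", "/").rstrip("=")
    body += "=" * (-len(body) % 4)
    return base64.b64decode(body, validate=True)
```

Both `#hex{41 42, 43}` and `#base64{QUJD}` decode to the same three bytes as `#"ABC"`.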
||||||
|
|
||||||
|
A `Symbol` may be written in a "bare" form,[^cf-sexp-token] so long as
|
||||||
|
it conforms to certain restrictions on the characters appearing in the
|
||||||
|
symbol, or in a quoted form. The quoted form is much the same as the
|
||||||
|
syntax for `String`s, including embedded escape syntax, except using a
|
||||||
|
bar or pipe character (`|`) instead of a double quote mark.
|
||||||
|
|
||||||
|
Symbol = symstart *symcont / "|" *symchar "|"
|
||||||
|
symstart = ALPHA / sympunct
|
||||||
|
symcont = ALPHA / sympunct / DIGIT / "-" / "."
|
||||||
|
sympunct = "~" / "!" / "@" / "$" / "%" / "^" / "&" / "*" /
|
||||||
|
"?" / "_" / "=" / "+" / "<" / ">" / "/"
|
||||||
|
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
|
||||||
|
|
||||||
|
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
|
||||||
|
definition of "token representation".
|
||||||
|
|
||||||
|
TODO: More unicode in unescaped symbols?
|
||||||
|
|
||||||
|
### Printing
|
||||||
|
|
||||||
|
Recommend a JSON-compatible print mode. Recommend a submode with trailing commas.
|
||||||
|
|
||||||
|
## Compact Binary Syntax
|
||||||
|
|
||||||
A `Repr` is an encoding, or representation, of a specific `Value`.
|
A `Repr` is an encoding, or representation, of a specific `Value`.
|
||||||
Each `Repr` comprises one or more bytes describing first the kind of
|
Each `Repr` comprises one or more bytes describing first the kind of
|
||||||
|
@ -373,14 +525,14 @@ be a single `Repr`.
|
||||||
|
|
||||||
Format B (known length):
|
Format B (known length):
|
||||||
|
|
||||||
[[ (L F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
|
[[ L(F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
|
||||||
|
|
||||||
For `m` fields, `m+1` is supplied to `header`, to account for the
|
For `m` fields, `m+1` is supplied to `header`, to account for the
|
||||||
encoding of the record label.
|
encoding of the record label.
|
||||||
|
|
||||||
Format C (streaming):
|
Format C (streaming):
|
||||||
|
|
||||||
[[ (L F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)
|
[[ L(F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)
|
||||||
|
|
||||||
Applications *SHOULD* prefer the known-length format for encoding
|
Applications *SHOULD* prefer the known-length format for encoding
|
||||||
`Record`s.
|
`Record`s.
|
||||||
|
@ -401,12 +553,12 @@ and format C becomes
|
||||||
**Examples.** For example, a protocol may choose to map records
|
**Examples.** For example, a protocol may choose to map records
|
||||||
labelled `void` to `n=0`, making
|
labelled `void` to `n=0`, making
|
||||||
|
|
||||||
[[(void)]] = header(2,0,0) = [0x80]
|
[[void()]] = header(2,0,0) = [0x80]
|
||||||
|
|
||||||
or it may map records labelled `person` to short form label number 1,
|
or it may map records labelled `person` to short form label number 1,
|
||||||
making
|
making
|
||||||
|
|
||||||
[[(person "Dr" "Elizabeth" "Blackwell")]]
|
[[person("Dr", "Elizabeth", "Blackwell")]]
|
||||||
= header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
= header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
||||||
= [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
= [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
||||||
|
|
||||||
|
@ -423,7 +575,7 @@ Format B (known length):
|
||||||
|
|
||||||
[[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]]
|
[[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]]
|
||||||
[[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]]
|
[[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]]
|
||||||
[[ #dict{K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
|
[[ {K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
|
||||||
++ [[K_m]] ++ [[V_m]]
|
++ [[K_m]] ++ [[V_m]]
|
||||||
|
|
||||||
Note that `m*2` is given to `header` for a `Dictionary`, since there
|
Note that `m*2` is given to `header` for a `Dictionary`, since there
|
||||||
|
@ -433,7 +585,7 @@ Format C (streaming):
|
||||||
|
|
||||||
[[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
|
[[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
|
||||||
[[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
|
[[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
|
||||||
[[ #dict{K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
|
[[ {K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
|
||||||
++ [[K_m]] ++ [[V_m]] ++ close(3,2)
|
++ [[K_m]] ++ [[V_m]] ++ close(3,2)
|
||||||
|
|
||||||
Applications may use whichever format suits their needs on a
|
Applications may use whichever format suits their needs on a
|
||||||
|
@ -528,8 +680,8 @@ specify lengths. Applications *MUST NOT* use format C with
|
||||||
|
|
||||||
#### Booleans
|
#### Booleans
|
||||||
|
|
||||||
[[ #f ]] = header(0,0,0) = [0x00]
|
[[ #false ]] = header(0,0,0) = [0x00]
|
||||||
[[ #t ]] = header(0,0,1) = [0x01]
|
[[ #true ]] = header(0,0,1) = [0x01]
|
||||||
|
|
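The `header` function is defined outside the hunks shown in this diff. Judging only from the worked examples (`header(0,0,1) = [0x01]`, `header(2,0,0) = [0x80]`, `header(2,1,3) = [0x93]`), it appears to pack two bits, two bits and four bits into a single lead byte. The Python sketch below encodes that inferred layout; it is an assumption drawn from those examples, not the document's definition, and it does not cover arguments of 15 or more, which appear to need a longer form.

```python
def header(major: int, minor: int, arg: int) -> bytes:
    """Pack a lead byte as major (2 bits), minor (2 bits), arg (4 bits).

    Layout inferred from the examples in this document; treat as a sketch.
    """
    if not (0 <= major <= 3 and 0 <= minor <= 3):
        raise ValueError("major and minor must each fit in two bits")
    if not 0 <= arg < 15:
        # Larger arguments seem to use an escape value plus a longer encoding.
        raise ValueError("argument too large for the single-byte form sketched here")
    return bytes([(major << 6) | (minor << 4) | arg])
```

With this layout, `header(0, 0, 0)` and `header(0, 0, 1)` give the single bytes `0x00` and `0x01` used for the two `Boolean` values.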
||||||
#### Floats and Doubles
|
#### Floats and Doubles
|
||||||
|
|
||||||
|
@ -550,31 +702,27 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
|
||||||
|
|
||||||
| Value | Encoded hexadecimal byte sequence |
|
| Value | Encoded hexadecimal byte sequence |
|
||||||
|---------------------------------------------------|----------------------------------------------------------------------|
|
|---------------------------------------------------|----------------------------------------------------------------------|
|
||||||
| `(capture (discard))` | 91 80 |
|
| `capture(discard())` | 91 80 |
|
||||||
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
|
| `observe(speak(discard(), capture(discard())))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
|
||||||
| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
|
| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
|
||||||
| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
|
| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
|
||||||
| `[-2 -1 0 1]` | C4 1E 1F 10 11 |
|
| `[-2 -1 0 1]` | C4 1E 1F 10 11 |
|
||||||
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
|
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
|
||||||
| `"hello"` (format C, 2 chunks) | 25 62 68 65 63 6C 6C 6F 35 |
|
| `"hello"` (format C, 2 chunks) | 25 62 68 65 63 6C 6C 6F 35 |
|
||||||
| `"hello"` (format C, 5 chunks) | 25 62 68 65 62 6C 6C 60 60 61 6F 35 |
|
| `"hello"` (format C, 5 chunks) | 25 62 68 65 62 6C 6C 60 60 61 6F 35 |
|
||||||
| `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
|
| `["hello" there #"world" [] #set{} #true #false]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
|
||||||
| `-257` | 42 FE FF |
|
| `-257` | 42 FE FF |
|
||||||
| `-1` | 1F |
|
| `-1` | 1F |
|
||||||
| `0` | 10 |
|
| `0` | 10 |
|
||||||
| `1` | 11 |
|
| `1` | 11 |
|
||||||
| `255` | 42 00 FF |
|
| `255` | 42 00 FF |
|
||||||
| `1f` | 02 3F 80 00 00 |
|
| `1.0f` | 02 3F 80 00 00 |
|
||||||
| `1d` | 03 3F F0 00 00 00 00 00 00 |
|
| `1.0` | 03 3F F0 00 00 00 00 00 00 |
|
||||||
| `-1.202e300d` | 03 FE 3C B7 B7 59 BF 04 26 |
|
| `-1.202e300` | 03 FE 3C B7 B7 59 BF 04 26 |
|
||||||
|
|
||||||
Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
||||||
|
|
||||||
([titled person 2 thing 1]
|
[titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
|
||||||
101
|
|
||||||
"Blackwell"
|
|
||||||
(date 1821 2 3)
|
|
||||||
"Dr")
|
|
||||||
|
|
||||||
encodes to
|
encodes to
|
||||||
|
|
||||||
|
@ -671,16 +819,16 @@ such media types following the general rules for ordering of
|
||||||
|
|
||||||
| Value | Encoded hexadecimal byte sequence |
|
| Value | Encoded hexadecimal byte sequence |
|
||||||
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
||||||
| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
| `mime(application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
||||||
| `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
| `mime(text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
||||||
| `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
| `mime(application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
||||||
| `(mime text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
|
| `mime(text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
|
||||||
|
|
||||||
Applications making heavy use of `mime` records may choose to use a
|
Applications making heavy use of `mime` records may choose to use a
|
||||||
short form label number for the record type. For example, if short
|
short form label number for the record type. For example, if short
|
||||||
form label number 1 were chosen, the second example above, `(mime
|
form label number 1 were chosen, the second example above,
|
||||||
text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
|
`mime(text/plain #"ABC")`, would be encoded with "92" in place of "B3
|
||||||
69 6D 65".
|
74 6D 69 6D 65".
|
||||||
|
|
||||||
### Unicode normalization forms
|
### Unicode normalization forms
|
||||||
|
|
||||||
|
@ -707,13 +855,13 @@ The definition of `SignedInteger` captures all integers. However, in
|
||||||
certain circumstances it can be valuable to assert that a number
|
certain circumstances it can be valuable to assert that a number
|
||||||
inhabits a particular range, such as a fixed-width machine word.
|
inhabits a particular range, such as a fixed-width machine word.
|
||||||
|
|
||||||
A family of labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote
|
A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
|
||||||
*n*-bit-wide signed and unsigned range restrictions, respectively.
|
*n*-bit-wide signed and unsigned range restrictions, respectively.
|
||||||
Records with these labels *MUST* have one field, a `SignedInteger`,
|
Records with these labels *MUST* have one field, a `SignedInteger`,
|
||||||
which *MUST* fall within the appropriate range. That is, to be valid,
|
which *MUST* fall within the appropriate range. That is, to be valid,
|
||||||
- in `(i16 `*x*`)`, -32768 <= *x* <= 32767.
|
- in `i8(`*x*`)`, -128 <= *x* <= 127.
|
||||||
- in `(u16 `*x*`)`, 0 <= *x* <= 65535.
|
- in `u8(`*x*`)`, 0 <= *x* <= 255.
|
||||||
- in `(i32 `*x*`)`, -2147483648 <= *x* <= 2147483647.
|
- in `i16(`*x*`)`, -32768 <= *x* <= 32767.
|
||||||
- etc.
|
- etc.
|
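The range restrictions follow directly from the bit width, so a validity check can be sketched in a few lines of Python. The function name `in_range` and the idea of passing the label as a string are assumptions made for this illustration only.

```python
def in_range(label: str, x: int) -> bool:
    """Check whether integer x satisfies the restriction named by a label
    such as "i8" or "u32"."""
    kind, width = label[:1], label[1:]
    if kind not in ("i", "u") or width not in ("8", "16", "32", "64"):
        raise ValueError(f"unknown range-restriction label: {label!r}")
    bits = int(width)
    if kind == "i":
        # Signed: -2^(n-1) <= x <= 2^(n-1) - 1
        return -(1 << (bits - 1)) <= x <= (1 << (bits - 1)) - 1
    # Unsigned: 0 <= x <= 2^n - 1
    return 0 <= x <= (1 << bits) - 1
```

For instance, `in_range("i8", -128)` holds, while `in_range("u16", 65536)` does not.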
||||||
|
|
||||||
### Anonymous Tuples and Unit
|
### Anonymous Tuples and Unit
|
||||||
|
@ -721,15 +869,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
|
||||||
A `Tuple` is a `Record` with label `tuple` and zero or more fields,
|
A `Tuple` is a `Record` with label `tuple` and zero or more fields,
|
||||||
denoting an anonymous tuple of values.
|
denoting an anonymous tuple of values.
|
||||||
|
|
||||||
The 0-ary tuple, `(tuple)`, denotes the empty tuple, sometimes called
|
The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
|
||||||
"unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
|
"unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
|
||||||
|
|
||||||
### Null and Undefined
|
### Null and Undefined
|
||||||
|
|
||||||
Tony Hoare's
|
Tony Hoare's
|
||||||
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
|
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
|
||||||
can be represented with the 0-ary `Record` `(null)`. An "undefined"
|
can be represented with the 0-ary `Record` `null()`. An "undefined"
|
||||||
value can be represented as `(undefined)`.
|
value can be represented as `undefined()`.
|
||||||
|
|
||||||
### Dates and Times
|
### Dates and Times
|
||||||
|
|
||||||
|
@ -741,6 +889,8 @@ or `date-time` productions of
|
||||||
|
|
||||||
## Security Considerations
|
## Security Considerations
|
||||||
|
|
||||||
|
TODO: Lots of whitespace is just like lots of empty chunks
|
||||||
|
|
||||||
**Empty chunks.** Streamed (format C) `String`s, `ByteString`s and
|
**Empty chunks.** Streamed (format C) `String`s, `ByteString`s and
|
||||||
`Symbol`s may include chunks of zero length. This opens up a
|
`Symbol`s may include chunks of zero length. This opens up a
|
||||||
possibility for denial-of-service: an attacker may begin streaming a
|
possibility for denial-of-service: an attacker may begin streaming a
|
||||||
|
@ -751,9 +901,9 @@ chunks that may appear in a stream, and may even supply an optional
|
||||||
mode that rejects empty chunks entirely.
|
mode that rejects empty chunks entirely.
|
||||||
|
|
||||||
**Canonical form for cryptographic hashing and signing.** As
|
**Canonical form for cryptographic hashing and signing.** As
|
||||||
specified, the encoding rules for `Value`s do not force canonical
|
specified, neither the textual nor the compact binary encoding rules
|
||||||
serializations for `Set` or `Dictionary` values. Two serializations of
|
for `Value`s force canonical serializations. Two serializations of the
|
||||||
the same `Value` may yield different binary `Repr`s.
|
same `Value` may yield different binary `Repr`s.
|
||||||
|
|
||||||
## Appendix. Table of lead byte values
|
## Appendix. Table of lead byte values
|
||||||
|
|
||||||
|
|