Trim and improve
parent
b2eb53e664
commit
b4d4092b90
|
@ -1,14 +1,16 @@
|
||||||
---
|
---
|
||||||
---
|
---
|
||||||
|
<title>Preserves: an Expressive Data Language</title>
|
||||||
<style>
|
<style>
|
||||||
body { font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; }
|
body { font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; }
|
||||||
@media screen {
|
@media screen {
|
||||||
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
|
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
|
||||||
}
|
}
|
||||||
@media print {
|
@media print {
|
||||||
body { margin-left: 2rem; margin-right: 2rem; }
|
@page { margin: 1.5cm; }
|
||||||
h2 { page-break-before: always }
|
body { margin-left: 2rem; margin-right: 2rem; }
|
||||||
h2:first-of-type { page-break-before: auto; }
|
h1, h2 { page-break-before: always }
|
||||||
|
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
|
||||||
}
|
}
|
||||||
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; color: #4f81bd; }
|
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; color: #4f81bd; }
|
||||||
h2 { border-bottom: solid #4f81bd 1px; }
|
h2 { border-bottom: solid #4f81bd 1px; }
|
||||||
|
@ -17,29 +19,45 @@ code { font-size: 75%; }
|
||||||
pre { padding: 0.33rem; }
|
pre { padding: 0.33rem; }
|
||||||
</style>
|
</style>
|
||||||
|
|
||||||
# Preserves: Semantic Serialization of Node-labelled Data
|
# Preserves: an Expressive Data Language
|
||||||
|
|
||||||
_________
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||||
<_________> Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
September 2018. Version 0.0.2.
|
||||||
| FRμIT | September 2018
|
|
||||||
|Preserves| Version 0.0.2
|
|
||||||
\_________/
|
|
||||||
|
|
||||||
|
|
||||||
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
||||||
[spki]: http://world.std.com/~cme/html/spki.html
|
[spki]: http://world.std.com/~cme/html/spki.html
|
||||||
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
|
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
|
||||||
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
|
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
|
||||||
|
|
||||||
Most data serialization formats used on the web represent
|
This document proposes a data model and serialization format called
|
||||||
*edge-labelled* semi-structured data.
|
*Preserves*.
|
||||||
|
|
||||||
This document proposes a data model and serialization format that
|
Preserves supports *records* with user-defined *labels*. This makes it
|
||||||
takes a *node-labelled* approach.
|
more expressive[^macro-expressiveness] than most data languages in use
|
||||||
|
on the web and allows it to easily represent the *labelled sums of
|
||||||
|
products* as seen in many functional programming languages.
|
||||||
|
|
||||||
This makes it both extensible and much more like S-expressions, making
|
Preserves also supports the usual suite of atomic and compound data
|
||||||
it easily able to represent the *labelled sums of products* as seen in
|
types, in particular including *binary* data as a distinct type from
|
||||||
Rust, Haskell, OCaml, and other functional programming languages.
|
text strings.
|
||||||
|
|
||||||
|
Finally, Preserves defines precisely how to compare two values with
|
||||||
|
each other in terms of the data model, not in terms of syntax or of
|
||||||
|
the data structures of any particular implementation language.
|
||||||
|
|
||||||
|
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
|
||||||
|
in the sense of Felleisen's 1991 paper, "On the Expressive Power
|
||||||
|
of Programming Languages".
|
||||||
|
|
||||||
|
Roughly speaking, there's no way in a JSON document to introduce a
|
||||||
|
new kind of information (such as binary data, or a date-stamp, or
|
||||||
|
a "person" object) in an *unambiguous way* without *global
|
||||||
|
agreement* from every potential consumer of the document. With an
|
||||||
|
extensible labelled record type, there is.
|
||||||
|
|
||||||
|
Felleisen, Matthias. “On the Expressive Power of Programming
|
||||||
|
Languages.” Science of Computer Programming 17, no. 1--3 (1991):
|
||||||
|
35–75.
|
||||||
|
|
||||||
## Starting with Semantics
|
## Starting with Semantics
|
||||||
|
|
||||||
|
@ -65,20 +83,12 @@ later in this document.
|
||||||
| Dictionary
|
| Dictionary
|
||||||
|
|
||||||
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
||||||
data.[^zephyr-asdl]
|
data.[^inspiration]
|
||||||
|
|
||||||
[^zephyr-asdl]: This design was loosely inspired by S-expressions,
|
[^inspiration]: This design was loosely inspired by S-expressions,
|
||||||
as seen in Lisp, Scheme, [SPKI/SDSI][sexp.txt], and many others,
|
as seen in Lisp, Scheme, [SPKI/SDSI][sexp.txt], and many others,
|
||||||
and by the ML type system, as seen in languages such as SML,
|
as well as by the ML type system, as seen in languages such as
|
||||||
OCaml, Haskell, Rust, and many others. It is also related to
|
SML, OCaml, Haskell, Rust, and many others.
|
||||||
Zephyr ASDL (h/t
|
|
||||||
[Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
|
|
||||||
which doesn't offer much in the way of atoms, but offers
|
|
||||||
general-purpose labelled sums and products. See D. C. Wang, A. W.
|
|
||||||
Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
|
|
||||||
Description Language,” in USENIX Conference on Domain-Specific
|
|
||||||
Languages, 1997, pp. 213–228.
|
|
||||||
[PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)
|
|
||||||
|
|
||||||
**Total order.**<a name="total-order"></a> As we go, we will
|
**Total order.**<a name="total-order"></a> As we go, we will
|
||||||
incrementally specify a total order over `Value`s. Two values of the
|
incrementally specify a total order over `Value`s. Two values of the
|
||||||
|
@ -101,9 +111,6 @@ follows:[^ordering-by-syntax]
|
||||||
**Equivalence.**<a name="equivalence"></a> Two `Value`s are equal if
|
**Equivalence.**<a name="equivalence"></a> Two `Value`s are equal if
|
||||||
neither is less than the other according to the total order.
|
neither is less than the other according to the total order.
|
||||||
|
|
||||||
<!-- We should avoid unnecessary restrictions such as machine-oriented -->
|
|
||||||
<!-- fixed-width integer or floating-point values where possible. -->
|
|
||||||
|
|
||||||
### Signed integers.
|
### Signed integers.
|
||||||
|
|
||||||
A `SignedInteger` is a signed integer of arbitrary width.
|
A `SignedInteger` is a signed integer of arbitrary width.
|
||||||
|
@ -120,8 +127,8 @@ examples of `SignedInteger`s using standard mathematical notation.
|
||||||
A `String` is a sequence of Unicode
|
A `String` is a sequence of Unicode
|
||||||
[code-point](http://www.unicode.org/glossary/#code_point)s. Two
|
[code-point](http://www.unicode.org/glossary/#code_point)s. Two
|
||||||
`String`s are compared lexicographically, code-point by
|
`String`s are compared lexicographically, code-point by
|
||||||
code-point.[^utf8-is-awesome] We will write examples of `String`s text
|
code-point.[^utf8-is-awesome] We will write examples of `String`s as
|
||||||
surrounded by double-quotes “`"`” using a monospace font.
|
text surrounded by double-quotes “`"`” using a monospace font.
|
||||||
|
|
||||||
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
||||||
gives the same result as a lexicographic byte-by-byte comparison
|
gives the same result as a lexicographic byte-by-byte comparison
|
||||||
|
@ -176,7 +183,7 @@ A `Float` is a single-precision IEEE 754 floating-point value; a
|
||||||
`Double` is a double-precision IEEE 754 floating-point value.
|
`Double` is a double-precision IEEE 754 floating-point value.
|
||||||
`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
|
`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
|
||||||
so by the rules [above](#total-order), every `Float` is less than
|
so by the rules [above](#total-order), every `Float` is less than
|
||||||
every `Double`, and every `SignedInteger` is less than both. Two
|
every `Double`, and every `SignedInteger` is greater than both. Two
|
||||||
`Float`s or two `Double`s are to be ordered by the `totalOrder`
|
`Float`s or two `Double`s are to be ordered by the `totalOrder`
|
||||||
predicate defined in section 5.10 of
|
predicate defined in section 5.10 of
|
||||||
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
|
[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
|
||||||
|
@ -196,10 +203,8 @@ record's *fields*. A record's label is, itself, a `Value`, though it
|
||||||
will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
|
will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
|
||||||
are compared lexicographically as if they were just tuples; that is,
|
are compared lexicographically as if they were just tuples; that is,
|
||||||
first by their labels, and then by the remainder of their fields. We
|
first by their labels, and then by the remainder of their fields. We
|
||||||
will only write examples of `Record`s having labels that are `Symbol`s
|
will write examples of `Record`s as a parenthesised, space-separated
|
||||||
entirely composed of ASCII characters. Such `Record`s will be written
|
sequence of their label `Value` followed by their field `Value`s.
|
||||||
as a parenthesised, space-separated sequence of their label followed
|
|
||||||
by their fields.
|
|
||||||
|
|
||||||
[^extensibility]: The [Racket](https://racket-lang.org/) programming
|
[^extensibility]: The [Racket](https://racket-lang.org/) programming
|
||||||
language defines
|
language defines
|
||||||
|
@ -215,19 +220,19 @@ by their fields.
|
||||||
`urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can
|
`urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can
|
||||||
be read as an absolute IRI, it stands for that IRI; and otherwise,
|
be read as an absolute IRI, it stands for that IRI; and otherwise,
|
||||||
it cannot be read as an IRI at all, and so the label simply stands
|
it cannot be read as an IRI at all, and so the label simply stands
|
||||||
for itself - for its own `Value`.
|
for itself—for its own `Value`.
|
||||||
|
|
||||||
**Examples.** The `Record` with label `foo` and fields 1, 2 and 3 is
|
**Examples.** The `Record` with label `foo` and fields 1, 2 and 3 is
|
||||||
written `(foo 1 2 3)`; the `Record` with label `void` and no fields is
|
written `(foo 1 2 3)`; the `Record` with label `void` and no fields is
|
||||||
written `(void)`.
|
written `(void)`.
|
||||||
|
|
||||||
|
**Non-examples.** `()`, because it lacks a label.
|
||||||
|
|
||||||
### Sequences.
|
### Sequences.
|
||||||
|
|
||||||
A `Sequence` is a general-purpose, variable-length ordered sequence of
|
A `Sequence` is a general-purpose, variable-length ordered sequence of
|
||||||
zero or more `Value`s. `Sequence`s are compared lexicographically,
|
zero or more `Value`s. `Sequence`s are compared lexicographically. We
|
||||||
appealing to the ordering on `Value`s for comparisons at each position
|
write examples space-separated, surrounded with square brackets.
|
||||||
in the `Sequence`s. We write examples space-separated, surrounded with
|
|
||||||
square brackets.
|
|
||||||
|
|
||||||
**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
|
**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
|
||||||
`SignedInteger`s 1, 2 and 3.
|
`SignedInteger`s 1, 2 and 3.
|
||||||
|
@ -237,18 +242,18 @@ square brackets.
|
||||||
A `Set` is an unordered finite set of `Value`s. It contains no
|
A `Set` is an unordered finite set of `Value`s. It contains no
|
||||||
duplicate values, following the [equivalence relation](#equivalence)
|
duplicate values, following the [equivalence relation](#equivalence)
|
||||||
induced by the total order on `Value`s. Two `Set`s are compared by
|
induced by the total order on `Value`s. Two `Set`s are compared by
|
||||||
sorting their elements using the [total order](#total-order) and
|
sorting their elements ascending using the [total order](#total-order)
|
||||||
comparing the resulting sequences as `Sequence`s. We write examples
|
and comparing the resulting `Sequence`s. We write examples
|
||||||
space-separated, surrounded with curly braces, prefixed by `#set`.
|
space-separated, surrounded with curly braces, prefixed by `#set`.
|
||||||
|
|
||||||
**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
|
**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
|
||||||
containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set
|
containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set
|
||||||
containing 4, the string `"hello"`, the record with label `void` and
|
containing 4, the string `"hello"`, the record with label `void` and
|
||||||
no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
|
no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
|
||||||
the set containing a `SignedInteger` and a `Float`, both denoting the
|
the set containing a `SignedInteger` and a `Float`; `#set{(mime
|
||||||
number 1; `#set{(mime application/xml #"<x/>") (mime
|
application/xml #"<x/>") (mime application/xml #"<x />")}`, a set
|
||||||
application/xml #"<x />")}`, a set containing two different
|
containing two different type-labelled byte
|
||||||
type-labelled byte arrays.[^mime-xml-difference]
|
arrays.[^mime-xml-difference]
|
||||||
|
|
||||||
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
|
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
|
||||||
differ by bytewise comparison, and thus yield different record
|
differ by bytewise comparison, and thus yield different record
|
||||||
|
@ -258,50 +263,31 @@ type-labelled byte arrays.[^mime-xml-difference]
|
||||||
**Non-examples.** `#set{1 1 1}`, because it contains multiple
|
**Non-examples.** `#set{1 1 1}`, because it contains multiple
|
||||||
equivalent `Value`s.
|
equivalent `Value`s.
|
||||||
|
|
||||||
### Dictionaries, hash-tables or maps.
|
### Dictionaries.
|
||||||
|
|
||||||
A `Dictionary` is an unordered finite collection of zero or more pairs
|
A `Dictionary` is an unordered finite collection of pairs of `Value`s.
|
||||||
of `Value`s. Each pair comprises a *key* and a *value*. Keys in a
|
Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
|
||||||
`Dictionary` must be pairwise distinct. Instances of `Dictionary` are
|
be pairwise distinct. Instances of `Dictionary` are compared by
|
||||||
compared by lexicographic comparison of the sequences resulting from
|
lexicographic comparison of the sequences resulting from ordering each
|
||||||
ordering each `Dictionary`'s pairs in ascending order by key. Examples
|
`Dictionary`'s pairs in ascending order by key. Examples are written
|
||||||
are written as a `#dict`-prefixed, curly-brace-surrounded sequence of
|
as a `#dict`-prefixed, curly-brace-surrounded sequence of
|
||||||
space-separated key-value pairs, each written with a colon between the
|
space-separated key-value pairs, each written with a colon between the
|
||||||
key and value.
|
key and value.
|
||||||
|
|
||||||
**Examples.** `#dict{}`, the empty dictionary; `#dict{a:1}`, the
|
**Examples.** `#dict{}`, the empty dictionary; `#dict{a:1}`, the
|
||||||
dictionary mapping the `Symbol` `a` to the `SignedInteger` 1;
|
dictionary mapping the `Symbol` `a` to the `SignedInteger` 1;
|
||||||
`#dict{1:a}`, mapping 1 to `a`; `#dict{"hi":0 hi:0 there:[]}`, having
|
`#dict{[1 2 3]:a}`, mapping `[1 2 3]` to `a`; `#dict{"hi":0 hi:0
|
||||||
a `String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
|
there:[]}`, having a `String` and two `Symbol` keys, and
|
||||||
values.
|
`SignedInteger` and `Sequence` values.
|
||||||
|
|
||||||
**Non-examples.** `#dict{a:1 b:2 a:3}`, because it contains duplicate
|
**Non-examples.** `#dict{a:1 b:2 a:3}`, because it contains duplicate
|
||||||
keys; `#dict{[]:[] []:99}`, for the same reason.
|
keys; `#dict{[7 8]:[] [7 8]:99}`, for the same reason.
|
||||||
|
|
||||||
## Syntax
|
## Syntax
|
||||||
|
|
||||||
Now that we have discussed `Value`s and their meanings, we may turn to
|
Now that we have discussed `Value`s and their meanings, we may turn to
|
||||||
techniques for *representing* `Value`s for communication or storage.
|
techniques for *representing* `Value`s for communication or storage.
|
||||||
|
|
||||||
The syntax we have used for the examples so far is inadequate in many
|
|
||||||
ways, not least of which is that it cannot represent every `Value`.
|
|
||||||
|
|
||||||
Separation of the meaning of a piece of syntax from the syntax itself
|
|
||||||
opens the door to domain-specific syntaxes, all equivalent and
|
|
||||||
interconvertible.[^asn1] With a robust semantic foundation,
|
|
||||||
connections to other data languages can also be made.
|
|
||||||
|
|
||||||
[^asn1]: Those who remember
|
|
||||||
[ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
|
|
||||||
will recall BER, DER, PER, CER, XER and so on, each appropriate to
|
|
||||||
a different setting. Similarly,
|
|
||||||
[Rivest's S-Expression design][sexp.txt] offers a human-friendly
|
|
||||||
syntax, a syntax robust to network-induced message corruption, and
|
|
||||||
an unambiguous, simple and easily-parsed machine-friendly syntax
|
|
||||||
for the same underlying values.
|
|
||||||
|
|
||||||
### Binary syntax
|
|
||||||
|
|
||||||
For now, we limit our attention to an easily-parsed, easily-produced
|
For now, we limit our attention to an easily-parsed, easily-produced
|
||||||
machine-readable syntax.
|
machine-readable syntax.
|
||||||
|
|
||||||
|
@ -312,42 +298,7 @@ encoded details of the `Value` itself.
|
||||||
|
|
||||||
For a value `v`, we write `[[v]]` for the `Repr` of `v`.
|
For a value `v`, we write `[[v]]` for the `Repr` of `v`.
|
||||||
|
|
||||||
The following figure summarises the definitions below:
|
### Type and Length representation
|
||||||
|
|
||||||
tt nn mmmm varint(m) contents
|
|
||||||
-------------------------------
|
|
||||||
|
|
||||||
00 00 0000 False
|
|
||||||
00 00 0001 True
|
|
||||||
00 00 0010 Float, 32 bits big-endian binary
|
|
||||||
00 00 0011 Double, 64 bits big-endian binary
|
|
||||||
00 00 x1xx RESERVED
|
|
||||||
00 00 1xxx RESERVED
|
|
||||||
00 01 xxxx RESERVED
|
|
||||||
00 10 ttnn Start Stream <tt,nn>
|
|
||||||
When tt = 00 --> error
|
|
||||||
01 --> each chunk is a <tt,nn> piece
|
|
||||||
1x --> each chunk is a single encoded Value
|
|
||||||
00 11 ttnn End Stream <tt,nn> (must match preceding Start Stream)
|
|
||||||
|
|
||||||
01 00 mmmm ... SignedInteger, big-endian binary
|
|
||||||
01 01 mmmm ... String, UTF-8 binary
|
|
||||||
01 10 mmmm ... ByteString
|
|
||||||
01 11 mmmm ... Symbol, UTF-8 binary
|
|
||||||
|
|
||||||
10 00 mmmm ... application-specific Record
|
|
||||||
10 01 mmmm ... application-specific Record
|
|
||||||
10 10 mmmm ... application-specific Record
|
|
||||||
10 11 mmmm ... Record
|
|
||||||
|
|
||||||
11 00 mmmm ... Sequence
|
|
||||||
11 01 mmmm ... Set
|
|
||||||
11 10 mmmm ... Dictionary
|
|
||||||
11 11 xxxx RESERVED
|
|
||||||
|
|
||||||
If mmmm = 1111, varint(m) is present; otherwise, m is the length
|
|
||||||
|
|
||||||
#### Type and Length representation
|
|
||||||
|
|
||||||
Each `Repr` takes one of three possible forms:
|
Each `Repr` takes one of three possible forms:
|
||||||
|
|
||||||
|
@ -365,13 +316,13 @@ Each `Repr` takes one of three possible forms:
|
||||||
begins before the number of elements or bytes in the corresponding
|
begins before the number of elements or bytes in the corresponding
|
||||||
`Value` is known.
|
`Value` is known.
|
||||||
|
|
||||||
Applications may choose between formats (B) and (C) depending on their
|
Applications may choose between formats B and C depending on their
|
||||||
needs at serialization time.
|
needs at serialization time.
|
||||||
|
|
||||||
Every `Repr`, however, starts with a *lead byte* describing the
|
Every `Repr` starts with a *lead byte* describing the remainder of the
|
||||||
remainder of the representation.
|
representation.
|
||||||
|
|
||||||
##### The lead byte
|
#### The lead byte
|
||||||
|
|
||||||
The lead byte is constructed by a function `leadbyte`:
|
The lead byte is constructed by a function `leadbyte`:
|
||||||
|
|
||||||
|
@ -387,18 +338,18 @@ follows:[^some-encodings-unused]
|
||||||
encodings are reserved for future versions of this specification.
|
encodings are reserved for future versions of this specification.
|
||||||
|
|
||||||
- `leadbyte(0,0,-)` (format A) represents an Atom with fixed-length binary representation.
|
- `leadbyte(0,0,-)` (format A) represents an Atom with fixed-length binary representation.
|
||||||
- `leadbyte(0,1,-)` (format A) is RESERVED.
|
- `leadbyte(0,1,-)` (format A) is reserved.
|
||||||
- `leadbyte(0,2,-)` (format C) is a Stream Start byte.
|
- `leadbyte(0,2,-)` (format C) is a Stream Start byte.
|
||||||
- `leadbyte(0,3,-)` (format C) is a Stream End byte.
|
- `leadbyte(0,3,-)` (format C) is a Stream End byte.
|
||||||
- `leadbyte(1,-,-)` (format B) represents an Atom with variable-length binary representation.
|
- `leadbyte(1,-,-)` (format B) represents an Atom with variable-length binary representation.
|
||||||
- `leadbyte(2,-,-)` (format B) represents a Record.
|
- `leadbyte(2,-,-)` (format B) represents a Record.
|
||||||
- `leadbyte(3,-,-)` (format B) represents a Sequence, Set or Dictionary.
|
- `leadbyte(3,-,-)` (format B) represents a Sequence, Set or Dictionary.
|
||||||
|
|
||||||
##### Encoding data of fixed length (format A)
|
#### Encoding data of fixed length (format A)
|
||||||
|
|
||||||
Each specific type of data defines its own rules for this format.
|
Each specific type of data defines its own rules for this format.
|
||||||
|
|
||||||
##### Encoding data of known length (format B)
|
#### Encoding data of known length (format B)
|
||||||
|
|
||||||
A `Repr` where the length of the `Value` to be encoded is variable but
|
A `Repr` where the length of the `Value` to be encoded is variable but
|
||||||
known uses the value of `m` in `leadbyte` to encode its length. The
|
known uses the value of `m` in `leadbyte` to encode its length. The
|
||||||
|
@ -434,15 +385,15 @@ definition,
|
||||||
- 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2.
|
- 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2.
|
||||||
- 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3.
|
- 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3.
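
The two worked examples above can be reproduced with a short sketch. It assumes the usual base-128 scheme from the linked [varint] description: seven payload bits per byte, least-significant group first, with the high bit set on every byte except the last. The function name `varint` mirrors the text; everything else is illustrative.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low-order group first."""
    if n < 0:
        raise ValueError("varint is defined for non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F          # take the low seven bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(group)         # final group: continuation bit clear
            return bytes(out)


print(list(varint(300)))         # [172, 2]
print(list(varint(1000000000)))  # [128, 148, 235, 220, 3]
```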
|
||||||
|
|
||||||
##### Streaming data of unknown length (format C)
|
#### Streaming data of unknown length (format C)
|
||||||
|
|
||||||
A `Repr` where the length of the `Value` to be encoded is variable and
|
A `Repr` where the length of the `Value` to be encoded is variable and
|
||||||
not known at the time serialization of the `Value` starts is encoded
|
not known at the time serialization of the `Value` starts is encoded
|
||||||
by a single Stream Start byte, followed by zero or more *chunks*,
|
by a single Stream Start (“open”) byte, followed by zero or more
|
||||||
followed by a matching Stream End byte:
|
*chunks*, followed by a matching Stream End (“close”) byte:
|
||||||
|
|
||||||
startbyte(t,n) = leadbyte(0,2, t*4 + n)
|
open(t,n) = leadbyte(0,2, t*4 + n)
|
||||||
endbyte(t,n) = leadbyte(0,3, t*4 + n)
|
close(t,n) = leadbyte(0,3, t*4 + n)
|
||||||
|
|
||||||
For a `Repr` of a `Value` containing binary data, each chunk is to be
|
For a `Repr` of a `Value` containing binary data, each chunk is to be
|
||||||
a format B `Repr` of the same type as the overall `Repr`.
|
a format B `Repr` of the same type as the overall `Repr`.
|
||||||
|
@ -450,35 +401,34 @@ a format B `Repr` of the same type as the overall `Repr`.
|
||||||
For a `Repr` of a `Value` containing other `Value`s, each chunk is to
|
For a `Repr` of a `Value` containing other `Value`s, each chunk is to
|
||||||
be a single `Repr`.
|
be a single `Repr`.
|
||||||
|
|
||||||
#### Records
|
### Records
|
||||||
|
|
||||||
Format B (known length):
|
Format B (known length):
|
||||||
|
|
||||||
[[ (L F_1 ... F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]]
|
[[ (L F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
|
||||||
|
|
||||||
For `m` fields, `m+1` is supplied to `header`, to account for the
|
For `m` fields, `m+1` is supplied to `header`, to account for the
|
||||||
encoding of the record label.
|
encoding of the record label.
|
||||||
|
|
||||||
Format C (streaming):
|
Format C (streaming):
|
||||||
|
|
||||||
[[ (L F_1 ... F_m) ]]
|
[[ (L F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)
|
||||||
= startbyte(2,3) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]] ++ endbyte(2,3)
|
|
||||||
|
|
||||||
Applications *SHOULD* prefer the known-length format for encoding
|
Applications *SHOULD* prefer the known-length format for encoding
|
||||||
`Record`s.
|
`Record`s.
|
||||||
|
|
||||||
##### Application-specific short form for labels
|
#### Application-specific short form for labels
|
||||||
|
|
||||||
Any given protocol using Preserves may additionally define an
|
Any given protocol using Preserves may additionally define an
|
||||||
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
|
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
|
||||||
number* `n` to a specific record label. When encoding `m` fields with
|
number* `n` to a specific record label. When encoding `m` fields with
|
||||||
short form label number `n`, format B becomes
|
short form label number `n`, format B becomes
|
||||||
|
|
||||||
header(2,n,m) ++ [[F_1]] ++ ... ++ [[F_m]]
|
header(2,n,m) ++ [[F_1]] ++...++ [[F_m]]
|
||||||
|
|
||||||
and format C becomes
|
and format C becomes
|
||||||
|
|
||||||
startbyte(2,n) ++ [[F_1]] ++ ... ++ [[F_m]] ++ endbyte(2,n)
|
open(2,n) ++ [[F_1]] ++...++ [[F_m]] ++ close(2,n)
|
||||||
|
|
||||||
**Examples.** A protocol may choose to map records
|
**Examples.** A protocol may choose to map records
|
||||||
labelled `void` to `n=0`, making
|
labelled `void` to `n=0`, making
|
||||||
|
@ -494,30 +444,29 @@ making
|
||||||
|
|
||||||
for format B, or
|
for format B, or
|
||||||
|
|
||||||
= startbyte(2,1) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ endbyte(2,1)
|
= open(2,1) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ close(2,1)
|
||||||
= [0x29] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ [0x39]
|
= [0x29] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ [0x39]
|
||||||
|
|
||||||
for format C.
|
for format C.
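
The `0x29` and `0x39` bytes in this example can be checked with a small sketch. The definition of `leadbyte` does not appear in this part of the diff, so the bit layout below is an assumption: `t` in the top two bits, `n` in the next two, and `m` in the low four. The names `open_` and `close_` are hypothetical spellings of `open` and `close`, chosen to avoid shadowing Python's built-in `open`.

```python
def leadbyte(t: int, n: int, m: int) -> bytes:
    """Pack t (2 bits), n (2 bits) and m (4 bits) into one byte (assumed layout)."""
    if not (0 <= t < 4 and 0 <= n < 4 and 0 <= m < 16):
        raise ValueError("t and n must fit in 2 bits, m in 4 bits")
    return bytes([(t << 6) | (n << 4) | m])


def open_(t: int, n: int) -> bytes:
    """Stream Start byte for values of type <t,n>."""
    return leadbyte(0, 2, t * 4 + n)


def close_(t: int, n: int) -> bytes:
    """Stream End byte for values of type <t,n>."""
    return leadbyte(0, 3, t * 4 + n)


print(open_(2, 1).hex(), close_(2, 1).hex())  # 29 39
```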
|
||||||
|
|
||||||
#### Sequences, Sets and Dictionaries
|
### Sequences, Sets and Dictionaries
|
||||||
|
|
||||||
Format B (known length):
|
Format B (known length):
|
||||||
|
|
||||||
[[ [X_1 ... X_m] ]] = header(3,0,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
[[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]]
|
||||||
|
[[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]]
|
||||||
|
[[ #dict{K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
|
||||||
|
++ [[K_m]] ++ [[V_m]]
|
||||||
|
|
||||||
[[ #set{X_1 ... X_m} ]] = header(3,1,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
Note that `m*2` is given to `header` for a `Dictionary`, since there
|
||||||
|
are two `Value`s in each key-value pair.
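
As a minimal sketch of the `m*2` rule, the following encodes `#dict{a:1}` in format B. It assumes that `header(t,n,m)` is a single lead byte when `m` is below 15, and a lead byte with `m` set to 15 followed by `varint(m)` otherwise. That definition falls outside the lines shown here, so treat it as an assumption. The helper `encode_dictionary` is hypothetical and takes keys and values that are already encoded.

```python
def varint(n: int) -> bytes:
    """Base-128 varint, least-significant group first."""
    out = bytearray()
    while True:
        group, n = n & 0x7F, n >> 7
        out.append(group | 0x80 if n else group)
        if not n:
            return bytes(out)


def header(t: int, n: int, m: int) -> bytes:
    """Assumed format B header: short lengths inline, long lengths as a varint."""
    if m < 15:
        return bytes([(t << 6) | (n << 4) | m])
    return bytes([(t << 6) | (n << 4) | 15]) + varint(m)


def encode_dictionary(pairs: list[tuple[bytes, bytes]]) -> bytes:
    """Encode already-encoded (key, value) pairs; the count is two Values per pair."""
    body = b"".join(key + value for key, value in pairs)
    return header(3, 2, 2 * len(pairs)) + body


sym_a = header(1, 3, 1) + b"a"     # the Symbol `a`
int_1 = header(1, 0, 1) + b"\x01"  # the SignedInteger 1
print(encode_dictionary([(sym_a, int_1)]).hex())  # e271614101
```

The lead byte `0xe2` carries the count 2 for a dictionary with a single pair, which is what a decoder needs in order to know how many `Repr`s to read.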
|
||||||
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
|
||||||
= header(3,2,m) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]]
|
|
||||||
|
|
||||||
Format C (streaming):
|
Format C (streaming):
|
||||||
|
|
||||||
[[ [X_1 ... X_m] ]] = startbyte(3,0) ++ [[X_1]] ++ ... ++ [[X_m]] ++ endbyte(3,0)
|
[[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
|
||||||
|
[[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
|
||||||
[[ #set{X_1 ... X_m} ]] = startbyte(3,1) ++ [[X_1]] ++ ... ++ [[X_m]] ++ endbyte(3,1)
|
[[ #dict{K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
|
||||||
|
++ [[K_m]] ++ [[V_m]] ++ close(3,2)
|
||||||
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
|
||||||
= startbyte(3,2) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]] ++ endbyte(3,2)
|
|
||||||
|
|
||||||
Applications may use whichever format suits their needs on a
|
Applications may use whichever format suits their needs on a
|
||||||
case-by-case basis.
|
case-by-case basis.
|
||||||
|
@ -538,26 +487,30 @@ order.
|
||||||
(b) sorting keys or elements makes no sense in streaming
|
(b) sorting keys or elements makes no sense in streaming
|
||||||
serialization formats.
|
serialization formats.
|
||||||
|
|
||||||
Note that `header(3,3,m)` and `startbyte(3,3)`/`endbyte(3,3)` are unused and reserved.
|
However, a quality implementation may wish to offer the programmer
|
||||||
|
the option of serializing with set elements and dictionary keys in
|
||||||
|
sorted order.
|
||||||
|
|
||||||
#### Variable-length Atoms
|
Note that `header(3,3,m)` and `open(3,3)`/`close(3,3)` are unused and reserved.
|
||||||
|
|
||||||
##### SignedInteger
|
### Variable-length Atoms
|
||||||
|
|
||||||
|
#### SignedInteger
|
||||||
|
|
||||||
Format B (known length):
|
Format B (known length):
|
||||||
|
|
||||||
[[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)
|
[[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)
|
||||||
where m = |intbytes(x)|
|
|
||||||
and intbytes(x) = a big-endian two's-complement representation
|
|
||||||
of the signed integer x, taking exactly as
|
|
||||||
many whole bytes as needed to unambiguously
|
|
||||||
identify the value
|
|
||||||
|
|
||||||
Format C *MUST NOT* be used for `SignedInteger`s.
|
Format C *MUST NOT* be used for `SignedInteger`s.
|
||||||
|
|
||||||
|
The function `intbytes(x)` gives the big-endian two's-complement
|
||||||
|
binary representation of `x`, taking exactly as many whole bytes as
|
||||||
|
needed to unambiguously identify the value and its sign, and `m =
|
||||||
|
|intbytes(x)|`.
|
||||||
|
|
||||||
The value 0 needs zero bytes to identify the value, so `intbytes(0)`
|
The value 0 needs zero bytes to identify the value, so `intbytes(0)`
|
||||||
is the empty byte string. Non-zero values need at least one byte; the
|
is the empty byte string. Non-zero values need at least one byte; the
|
||||||
most-significant bit in the first byte in `intbytes(x)` for `x≠0` is
|
most-significant bit in the first byte in `intbytes(x)` for `x`≠0 is
|
||||||
the sign bit.
|
the sign bit.
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
@ -583,59 +536,49 @@ For example,
|
||||||
[[ 65536 ]] = [0x43, 0x01, 0x00, 0x00]
|
[[ 65536 ]] = [0x43, 0x01, 0x00, 0x00]
|
||||||
[[ 131072 ]] = [0x43, 0x02, 0x00, 0x00]
|
[[ 131072 ]] = [0x43, 0x02, 0x00, 0x00]
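
The examples above can be reproduced with a short sketch of `intbytes`. It relies on Python's `int.to_bytes` with `signed=True` for the two's-complement conversion. The lead byte layout (`t` in the top two bits, `n` in the next two, `m` in the low four) is an assumption consistent with the `0x43` bytes shown. Lengths of 15 bytes or more are rejected here rather than guessing at the long-length encoding.

```python
def intbytes(x: int) -> bytes:
    """Minimal big-endian two's-complement bytes for x; empty for zero."""
    if x == 0:
        return b""
    # Bits needed for the magnitude; one extra bit is reserved for the sign.
    magnitude_bits = (x if x > 0 else ~x).bit_length()
    length = (magnitude_bits + 8) // 8
    return x.to_bytes(length, "big", signed=True)


def encode_signed_integer(x: int) -> bytes:
    """Format B encoding of a SignedInteger with a short (inline) length."""
    payload = intbytes(x)
    m = len(payload)
    if m >= 15:
        raise NotImplementedError("long lengths are outside this sketch")
    return bytes([(1 << 6) | (0 << 4) | m]) + payload


for value in (0, -1, 127, 128, 65536, 131072):
    print(value, encode_signed_integer(value).hex())
```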
|
||||||
|
|
||||||
##### String
|
#### String, ByteString and Symbol
|
||||||
|
|
||||||
|
Syntax for these three types varies only in the value of `n` supplied
|
||||||
|
to `header`, `open`, and `close`. In each case, the payload following
|
||||||
|
the header is a binary sequence; for `String` and `Symbol`, it is a
|
||||||
|
UTF-8 encoding of the `Value`'s code points, while for `ByteString` it
|
||||||
|
is the raw data contained within the `Value` unmodified.
|
||||||
|
|
||||||
Format B (known length):
|
Format B (known length):
|
||||||
|
|
||||||
[[ S ]] when S ∈ String = header(1,1,m) ++ utf8(S)
|
[[ S ]] = header(1,n,m) ++ encode(S)
|
||||||
where m = |utf8(x)|
|
where m = |encode(S)|
|
||||||
and utf8(x) = the UTF-8 encoding of S
|
and (n,encode(S)) = (1,utf8(S)) if S ∈ String
|
||||||
|
(2,S) if S ∈ ByteString
|
||||||
|
(3,utf8(S)) if S ∈ Symbol
|
||||||
|
|
||||||
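As an illustration of the format B rule for `String`, `ByteString` and `Symbol`, here is a minimal Python sketch. The `Symbol` class is a hypothetical marker type introduced only to tell symbols apart from strings, and `varint` is assumed to be a base-128 encoding with the least-significant group first; neither is fixed by the text above.

```python
class Symbol(str):
    """Hypothetical marker type: a str that should be encoded as a Symbol."""


def varint(m: int) -> bytes:
    """Assumed base-128 varint, least-significant 7-bit group first."""
    out = bytearray()
    while True:
        low, m = m & 0x7F, m >> 7
        if m:
            out.append(low | 0x80)  # more groups follow
        else:
            out.append(low)
            return bytes(out)


def header(t: int, n: int, m: int) -> bytes:
    """Lead byte tt nn mmmm; mmmm = 1111 means a varint(m) follows."""
    lead = (t << 6) | (n << 4)
    if m < 15:
        return bytes([lead | m])
    return bytes([lead | 15]) + varint(m)


def encode_atom(value) -> bytes:
    """[[ S ]] = header(1,n,m) ++ encode(S), with m = |encode(S)|."""
    if isinstance(value, Symbol):  # checked first: Symbol subclasses str
        n, payload = 3, value.encode("utf-8")
    elif isinstance(value, str):
        n, payload = 1, value.encode("utf-8")
    elif isinstance(value, (bytes, bytearray)):
        n, payload = 2, bytes(value)
    else:
        raise TypeError("expected a String, ByteString or Symbol")
    return header(1, n, len(payload)) + payload
```

The length `m` counts encoded bytes, not code points, so a one-character `String` such as `"é"` is encoded with `m = 2`.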
To stream a `String`, emit `startbyte(1,1)` and then a sequence of
|
To stream a `String`, `ByteString` or `Symbol`, emit `open(1,n)` and
|
||||||
zero or more format B `String` chunks, followed by `endbyte(1,1)`.
|
then a sequence of zero or more format B chunks, followed by
|
||||||
|
`close(1,n)`. For a `String`, every chunk must be a `String`;
|
||||||
|
likewise, for `ByteString` and `Symbol`.
|
||||||
|
|
||||||
While the overall content of a streamed `String` must be valid UTF-8,
|
While the overall content of a streamed `String` or `Symbol` must be
|
||||||
individual chunks do not have to conform to UTF-8.
|
valid UTF-8, individual chunks do not have to conform to UTF-8.
|
||||||
|
|
||||||
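The streaming rule can be sketched in Python as follows. The byte values for `open` and `close` are taken from the lead byte appendix (`00 10 ttnn` and `00 11 ttnn`); the function `stream_string` and its `chunk_size` parameter are hypothetical names for this illustration, and chunks are kept under 15 bytes so that only short-form headers are needed.

```python
def open_byte(t: int, n: int) -> bytes:
    """Start Stream marker: bits 00 10 tt nn."""
    return bytes([0x20 | (t << 2) | n])


def close_byte(t: int, n: int) -> bytes:
    """End Stream marker: bits 00 11 tt nn."""
    return bytes([0x30 | (t << 2) | n])


def stream_string(text: str, chunk_size: int) -> bytes:
    """Emit open(1,1), then format B String chunks, then close(1,1)."""
    if not 1 <= chunk_size <= 14:
        raise ValueError("this sketch only emits short-form chunk headers")
    payload = text.encode("utf-8")
    out = bytearray(open_byte(1, 1))
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # Each chunk is header(1,1,m) ++ bytes; a chunk may end in the
        # middle of a multi-byte UTF-8 sequence.
        out.append(0x50 | len(chunk))
        out += chunk
    out += close_byte(1, 1)
    return bytes(out)
```

With `chunk_size=2`, the string `"héllo"` is split between the two bytes of `é`: the first chunk on its own is not valid UTF-8, while the concatenation of all chunks is.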
##### ByteString
|
### Fixed-length Atoms
|
||||||
|
|
||||||
Format B (known length):
|
|
||||||
|
|
||||||
[[ B ]] when B ∈ ByteString = header(1,2,m) ++ B
|
|
||||||
where m = |B|
|
|
||||||
|
|
||||||
To stream a `ByteString`, emit `startbyte(1,2)` and then a sequence of
|
|
||||||
zero or more format B `ByteString` chunks, followed by `endbyte(1,2)`.
|
|
||||||
|
|
||||||
##### Symbol
|
|
||||||
|
|
||||||
Format B (known length):
|
|
||||||
|
|
||||||
[[ S ]] when S ∈ Symbol = header(1,3,m) ++ utf8(S)
|
|
||||||
where m = |utf8(x)|
|
|
||||||
and utf8(x) = the UTF-8 encoding of S
|
|
||||||
|
|
||||||
To stream a `Symbol`, emit `startbyte(1,3)` and then a sequence of
|
|
||||||
zero or more format B `Symbol` chunks, followed by `endbyte(1,3)`.
|
|
||||||
|
|
||||||
#### Fixed-length Atoms
|
|
||||||
|
|
||||||
Fixed-length atoms all use format A, and do not have a length
|
Fixed-length atoms all use format A, and do not have a length
|
||||||
representation. They repurpose the bits that format B `Repr`s use to
|
representation. They repurpose the bits that format B `Repr`s use to
|
||||||
specify lengths. Applications *MUST NOT* use format C with
|
specify lengths. Applications *MUST NOT* use format C with
|
||||||
`startbyte(0,n)` or `endbyte(0,n)` for any `n`.
|
`open(0,n)` or `close(0,n)` for any `n`.
|
||||||
|
|
||||||
##### Booleans
|
#### Booleans
|
||||||
|
|
||||||
[[ #f ]] = header(0,0,0) = [0x00]
|
[[ #f ]] = header(0,0,0) = [0x00]
|
||||||
[[ #t ]] = header(0,0,1) = [0x01]
|
[[ #t ]] = header(0,0,1) = [0x01]
|
||||||
|
|
||||||
##### Floats and Doubles
|
#### Floats and Doubles
|
||||||
|
|
||||||
[[ F ]] when F ∈ Float = header(0,0,2) ++ binary32(F)
|
[[ F ]] when F ∈ Float = header(0,0,2) ++ binary32(F)
|
||||||
[[ D ]] when D ∈ Double = header(0,0,3) ++ binary64(D)
|
[[ D ]] when D ∈ Double = header(0,0,3) ++ binary64(D)
|
||||||
where binary32(F) and binary64(D) are big-endian 4- and 8-byte
|
|
||||||
IEEE 754 binary representations
|
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
|
||||||
|
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
|
||||||
|
|
||||||
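A minimal Python sketch of the fixed-length atoms, using the standard `struct` module for the big-endian IEEE 754 layouts. The function names are illustrative and not part of the specification.

```python
import struct


def encode_boolean(value: bool) -> bytes:
    """[[ #f ]] = [0x00], [[ #t ]] = [0x01]."""
    return bytes([0x01 if value else 0x00])


def encode_float(f: float) -> bytes:
    """header(0,0,2) ++ binary32(F): lead byte 0x02, then 4 bytes."""
    # struct raises OverflowError if f is too large for binary32.
    return b"\x02" + struct.pack(">f", f)


def encode_double(d: float) -> bytes:
    """header(0,0,3) ++ binary64(D): lead byte 0x03, then 8 bytes."""
    return b"\x03" + struct.pack(">d", d)
```

Because the length is implied by the lead byte, a decoder reads exactly 4 bytes after `0x02` and exactly 8 bytes after `0x03`.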
## Examples
|
## Examples
|
||||||
|
|
||||||
|
@ -705,14 +648,33 @@ encodes to
|
||||||
|
|
||||||
The `Value` data type is essentially an S-Expression, able to
|
The `Value` data type is essentially an S-Expression, able to
|
||||||
represent semi-structured data over `ByteString`, `String`,
|
represent semi-structured data over `ByteString`, `String`,
|
||||||
`SignedInteger` atoms and so on.
|
`SignedInteger` atoms and so on.[^why-not-spki-sexps]
|
||||||
|
|
||||||
|
[^why-not-spki-sexps]: Rivest's S-Expressions are in many ways
|
||||||
|
similar to Preserves. However, while they include binary data and
|
||||||
|
sequences, and an obvious equivalence for them exists, they lack
|
||||||
|
numbers *per se* as well as any kind of unordered structure such
|
||||||
|
as sets or maps. In addition, while "display hints" allow
|
||||||
|
labelling of binary data with an intended interpretation, they
|
||||||
|
cannot be attached to any other kind of structure, and the "hint"
|
||||||
|
itself can only be a binary blob.
|
||||||
|
|
||||||
However, users need a wide variety of data types for representing
|
However, users need a wide variety of data types for representing
|
||||||
domain-specific values such as various kinds of encoded and normalized
|
domain-specific values such as various kinds of encoded and normalized
|
||||||
text, calendrical values, machine words, and so on.
|
text, calendrical values, machine words, and so on.
|
||||||
|
|
||||||
We use appropriately-labelled `Record`s to denote these
|
Appropriately-labelled `Record`s denote these domain-specific data
|
||||||
domain-specific data types.
|
types.[^why-dictionaries]
|
||||||
|
|
||||||
|
[^why-dictionaries]: Given `Record`'s existence, it may seem odd
|
||||||
|
that `Dictionary`, `Set`, `Float`, etc. are given special
|
||||||
|
treatment. Preserves aims to offer a useful basic equivalence
|
||||||
|
predicate to programmers, and so if a data type demands a special
|
||||||
|
equivalence predicate, as `Dictionary`, `Set` and `Float` all do,
|
||||||
|
then the type should be included in the base language. Otherwise,
|
||||||
|
it can be represented as a `Record` and treated separately. Both
|
||||||
|
`Boolean` and `String` are seeming exceptions: they merit
|
||||||
|
inclusion because of their cultural importance.
|
||||||
|
|
||||||
All of these conventions are optional. They form a layer atop the core
|
All of these conventions are optional. They form a layer atop the core
|
||||||
`Value` structure. Non-domain-specific tools do not in general need to
|
`Value` structure. Non-domain-specific tools do not in general need to
|
||||||
|
@ -740,11 +702,13 @@ being a `ByteString`, the binary data.
|
||||||
|
|
||||||
While each media type may define its own rules for comparing
|
While each media type may define its own rules for comparing
|
||||||
documents, we define ordering among `MIMEData` *representations* of
|
documents, we define ordering among `MIMEData` *representations* of
|
||||||
such media types lexicographically over the (`Symbol`, `ByteString`)
|
such media types following the general rules for ordering of
|
||||||
pair.
|
`Record`s.
|
||||||
|
|
||||||
**Examples.**
|
**Examples.**
|
||||||
|
|
||||||
|
| Value | Encoded hexadecimal byte sequence |
|
||||||
|
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
||||||
| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
||||||
| `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
| `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
||||||
| `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
| `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
||||||
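The rows of the table can be reproduced with a short Python sketch. Several things here are assumptions made for illustration: the `Symbol` marker class, the use of a Python tuple to stand for a `Record`, the base-128 `varint`, and the reading (inferred from the `B3` lead byte in the table) that a `Record`'s length counts its label plus its fields.

```python
class Symbol(str):
    """Hypothetical marker type: a str that should be encoded as a Symbol."""


def varint(m: int) -> bytes:
    """Assumed base-128 varint, least-significant 7-bit group first."""
    out = bytearray()
    while True:
        low, m = m & 0x7F, m >> 7
        if m:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def header(t: int, n: int, m: int) -> bytes:
    """Lead byte tt nn mmmm; mmmm = 1111 means a varint(m) follows."""
    lead = (t << 6) | (n << 4)
    if m < 15:
        return bytes([lead | m])
    return bytes([lead | 15]) + varint(m)


def encode(value) -> bytes:
    if isinstance(value, tuple):  # stand-in for a Record: (label, field, ...)
        return header(2, 3, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, Symbol):
        payload = value.encode("utf-8")
        return header(1, 3, len(payload)) + payload
    if isinstance(value, bytes):
        return header(1, 2, len(value)) + value
    raise TypeError("only Records, Symbols and ByteStrings are handled here")


def mime(media_type: str, data: bytes) -> tuple:
    """Build a MIMEData record: label `mime`, a Symbol, and a ByteString."""
    return (Symbol("mime"), Symbol(media_type), data)
```

For example, `encode(mime("text/plain", b"ABC")).hex(" ").upper()` yields the byte sequence shown in the second row of the table.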
|
@ -813,7 +777,84 @@ Dates, times, moments, and timestamps can be represented with a
|
||||||
or `date-time` productions of
|
or `date-time` productions of
|
||||||
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
|
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
|
||||||
|
|
||||||
## Representing Values in Programming Languages
|
## Security Considerations
|
||||||
|
|
||||||
|
**Empty chunks.** Streamed (format C) `String`s, `ByteString`s and
|
||||||
|
`Symbol`s may include chunks of zero length. This opens up a
|
||||||
|
possibility for denial-of-service: an attacker may begin streaming a
|
||||||
|
string, sending an endless sequence of zero-length chunks, appearing
|
||||||
|
to make progress but not actually doing so. Implementations may place
|
||||||
|
reasonable restrictions on the number of consecutive empty
|
||||||
|
chunks that may appear in a stream, and may even supply an optional
|
||||||
|
mode that rejects empty chunks entirely.
|
||||||
|
|
||||||
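One way an implementation might bound runs of empty chunks is sketched below in Python. The function `read_streamed_string` and its `max_empty_run` parameter are hypothetical; the sketch reads a complete in-memory buffer, handles streamed `String`s only, and rejects chunks that use the varint length form.

```python
def read_streamed_string(data: bytes, max_empty_run: int = 4) -> str:
    """Decode a format C String, refusing long runs of empty chunks."""
    if not data or data[0] != 0x25:  # open(1,1)
        raise ValueError("expected the Start Stream marker for a String")
    pos, empty_run, payload = 1, 0, bytearray()
    while True:
        if pos >= len(data):
            raise ValueError("truncated stream")
        lead = data[pos]
        pos += 1
        if lead == 0x35:  # close(1,1)
            # Only the reassembled content has to be valid UTF-8.
            return payload.decode("utf-8")
        if lead & 0xF0 != 0x50:
            raise ValueError("chunk is not a format B String")
        m = lead & 0x0F
        if m == 15:
            raise ValueError("varint lengths are outside this sketch")
        if m == 0:
            empty_run += 1
            if empty_run > max_empty_run:
                raise ValueError("too many consecutive empty chunks")
        else:
            empty_run = 0  # real progress resets the counter
        if pos + m > len(data):
            raise ValueError("truncated chunk")
        payload += data[pos:pos + m]
        pos += m
```

Setting `max_empty_run=0` gives the stricter mode mentioned above, in which any empty chunk is rejected.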
|
**Canonical form for cryptographic hashing and signing.** As
|
||||||
|
specified, the encoding rules for `Value`s do not force canonical
|
||||||
|
serializations for `Set` or `Dictionary` values. Two serializations of
|
||||||
|
the same `Value` may yield different binary `Repr`s.
|
||||||
|
|
||||||
|
## Appendix. Table of lead byte values
|
||||||
|
|
||||||
|
00 - False
|
||||||
|
01 - True
|
||||||
|
02 - Float
|
||||||
|
03 - Double
|
||||||
|
(0x) RESERVED 04-0F
|
||||||
|
(1x) RESERVED 10-1F
|
||||||
|
2x - Start Stream
|
||||||
|
3x - End Stream
|
||||||
|
|
||||||
|
4x - SignedInteger
|
||||||
|
5x - String
|
||||||
|
6x - ByteString
|
||||||
|
7x - Symbol
|
||||||
|
|
||||||
|
8x - short form Record label index 0
|
||||||
|
9x - short form Record label index 1
|
||||||
|
Ax - short form Record label index 2
|
||||||
|
Bx - Record
|
||||||
|
|
||||||
|
Cx - Sequence
|
||||||
|
Dx - Set
|
||||||
|
Ex - Dictionary
|
||||||
|
(Fx) RESERVED F0-FF
|
||||||
|
|
||||||
|
## Appendix. Bit fields within lead byte values
|
||||||
|
|
||||||
|
tt nn mmmm contents
|
||||||
|
---------- ---------
|
||||||
|
|
||||||
|
00 00 0000 False
|
||||||
|
00 00 0001 True
|
||||||
|
00 00 0010 Float, 32 bits big-endian binary
|
||||||
|
00 00 0011 Double, 64 bits big-endian binary
|
||||||
|
|
||||||
|
00 10 ttnn Start Stream <tt,nn>
|
||||||
|
When tt = 00 --> error
|
||||||
|
01 --> each chunk is a <tt,nn> piece
|
||||||
|
1x --> each chunk is a single encoded Value
|
||||||
|
00 11 ttnn End Stream <tt,nn> (must match preceding Start Stream)
|
||||||
|
|
||||||
|
01 00 mmmm SignedInteger, big-endian binary
|
||||||
|
01 01 mmmm String, UTF-8 binary
|
||||||
|
01 10 mmmm ByteString
|
||||||
|
01 11 mmmm Symbol, UTF-8 binary
|
||||||
|
|
||||||
|
10 00 mmmm application-specific Record
|
||||||
|
10 01 mmmm application-specific Record
|
||||||
|
10 10 mmmm application-specific Record
|
||||||
|
10 11 mmmm Record
|
||||||
|
|
||||||
|
11 00 mmmm Sequence
|
||||||
|
11 01 mmmm Set
|
||||||
|
11 10 mmmm Dictionary
|
||||||
|
|
||||||
|
If mmmm = 1111, a varint(m) follows, giving the length, before
|
||||||
|
the body; otherwise, m is the length of the body to follow.
|
||||||
|
|
||||||
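The two lead byte appendices can be combined into a small Python sketch that splits a lead byte into its `tt`, `nn` and `mmmm` fields and names the kind of `Repr` it introduces. The function names are illustrative only, and the labels returned are those of the lead byte table.

```python
def split_lead_byte(b: int) -> tuple:
    """Return the (tt, nn, mmmm) bit fields of a lead byte."""
    return b >> 6, (b >> 4) & 0x3, b & 0xF


def describe_lead_byte(b: int) -> str:
    t, n, m = split_lead_byte(b)
    if t == 0:
        if n == 0:
            names = {0: "False", 1: "True", 2: "Float", 3: "Double"}
            return names.get(m, "RESERVED")
        if n == 1:
            return "RESERVED"
        # For stream markers the low four bits carry <tt,nn> of the
        # streamed type; tt = 00 there is an error per the appendix.
        return "Start Stream" if n == 2 else "End Stream"
    if t == 1:
        return ("SignedInteger", "String", "ByteString", "Symbol")[n]
    if t == 2:
        return "Record" if n == 3 else f"short form Record label index {n}"
    return ("Sequence", "Set", "Dictionary", "RESERVED")[n]
```

For instance, `0x43` splits into `(1, 0, 3)`, a `SignedInteger` with a three-byte body, which matches the `[[ 65536 ]]` example.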
|
|
||||||
|
|
||||||
|
## Appendix. Representing Values in Programming Languages
|
||||||
|
|
||||||
We have given a definition of `Value` and its semantics, and proposed
|
We have given a definition of `Value` and its semantics, and proposed
|
||||||
a concrete syntax for communicating and storing `Value`s. We now turn
|
a concrete syntax for communicating and storing `Value`s. We now turn
|
||||||
|
@ -881,32 +922,6 @@ should both be identities.
|
||||||
- `Set` ↔ a `sets` set (is this unambiguous? Maybe a [map][erlang-map] from elements to `true`?)
|
- `Set` ↔ a `sets` set (is this unambiguous? Maybe a [map][erlang-map] from elements to `true`?)
|
||||||
- `Dictionary` ↔ a [map][erlang-map] (new in Erlang/OTP R17)
|
- `Dictionary` ↔ a [map][erlang-map] (new in Erlang/OTP R17)
|
||||||
|
|
||||||
## Appendix. Table of lead byte values
|
|
||||||
|
|
||||||
00 - False
|
|
||||||
01 - True
|
|
||||||
02 - Float
|
|
||||||
03 - Double
|
|
||||||
(0x) RESERVED 04-0F
|
|
||||||
(1x) RESERVED 10-1F
|
|
||||||
2x - Start Stream
|
|
||||||
3x - End Stream
|
|
||||||
|
|
||||||
4x - SignedInteger
|
|
||||||
5x - String
|
|
||||||
6x - ByteString
|
|
||||||
7x - Symbol
|
|
||||||
|
|
||||||
8x - short form Record label index 0
|
|
||||||
9x - short form Record label index 1
|
|
||||||
Ax - short form Record label index 2
|
|
||||||
Bx - Record
|
|
||||||
|
|
||||||
Cx - Sequence
|
|
||||||
Dx - Set
|
|
||||||
Ex - Dictionary
|
|
||||||
(Fx) RESERVED F0-FF
|
|
||||||
|
|
||||||
## Appendix. Why not Just Use JSON?
|
## Appendix. Why not Just Use JSON?
|
||||||
|
|
||||||
<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
|
<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
|
||||||
|
@ -1060,47 +1075,13 @@ JSON itself does not offer any guidance for which of these options to
|
||||||
choose. In many real cases on the web, poor choices have led to
|
choose. In many real cases on the web, poor choices have led to
|
||||||
encodings that are irrecoverably ambiguous.
|
encodings that are irrecoverably ambiguous.
|
||||||
|
|
||||||
---
|
|
||||||
---
|
|
||||||
|
|
||||||
# Open questions
|
# Open questions
|
||||||
|
|
||||||
Q. Should "symbols" instead be URIs? Relative, usually; relative to
|
Q. Should "symbols" instead be URIs? Relative, usually; relative to
|
||||||
what? Some domain-specific base URI?
|
what? Some domain-specific base URI?
|
||||||
|
|
||||||
Q. What about general rationals, subsuming integers and IEEE floats
|
|
||||||
(except NaN and the Infinities)?
|
|
||||||
|
|
||||||
Q. Should I map to SPKI SEXP or is that nonsense / for later?[^why-not-spki-sexps]
|
|
||||||
|
|
||||||
[^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
|
|
||||||
they are? While they include binary data and sequences, and an
|
|
||||||
obvious equivalence for them exists, they lack numbers *per se* as
|
|
||||||
well as any kind of unordered structure such as sets or maps. In
|
|
||||||
addition, while "display hints" allow labelling of binary data
|
|
||||||
with an intended interpretation, they cannot be attached to any
|
|
||||||
other kind of structure, and the "hint" itself can only be a
|
|
||||||
binary blob.
|
|
||||||
|
|
||||||
Q. Should `Symbol` be a special syntax for a `Record` with a `Symbol`
|
|
||||||
label (recursive!?) and a single `String` field?
|
|
||||||
|
|
||||||
Q. Should `String` be a special syntax for `(utf8 ByteString)`? Again,
|
|
||||||
recursiveness problems...?
|
|
||||||
|
|
||||||
Q. Should `Dictionary` be a special syntax for etc etc.? `Set`?
|
|
||||||
`Float`? `Double`?
|
|
||||||
|
|
||||||
--> Rule of thumb: if there's a special equivalence predicate for it,
|
|
||||||
it needs to be built-in syntax. Otherwise it can be a regular
|
|
||||||
record. (So: `Boolean` might not make the cut for special
|
|
||||||
treatment?? Likewise `String`...? Ugh those are psychologically
|
|
||||||
important perhaps)
|
|
||||||
|
|
||||||
Q. Are the language mappings reasonable? How about one for Python?
|
Q. Are the language mappings reasonable? How about one for Python?
|
||||||
|
|
||||||
---
|
Q. Literal small integers: could be nice? Not absolutely necessary.
|
||||||
|
|
||||||
Literal small integers: could be nice? Not absolutely necessary.
|
## Notes
|
||||||
|
|
||||||
---
|
|
||||||
|
|