WIP from the early hours of this morning, adding textual syntax

parent bef125e818
commit 732fbc7059

348 preserve.md

@@ -6,12 +6,13 @@
 # Preserves: an Expressive Data Language
 
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>
-September 2018. Version 0.0.2.
+September 2018. Version 0.0.3.
 
 [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
 [spki]: http://world.std.com/~cme/html/spki.html
 [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
 [erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
+[abnf]: https://tools.ietf.org/html/rfc7405
 
 This document proposes a data model and serialization format called
 *Preserves*.
@@ -47,7 +48,8 @@ structures of any particular implementation language.
 
 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. We will treat syntax separately,
+meaning independent of their syntax. When we write examples of values,
+we will do so using the [textual syntax](#textual-syntax) defined
 later in this document.
 
 Our `Value`s fall into two broad categories: *atomic* and *compound*
@@ -94,8 +96,7 @@ neither is less than the other according to the total order.
 ### Signed integers.
 
 A `SignedInteger` is a signed integer of arbitrary width.
-`SignedInteger`s are compared as mathematical integers. We will write
-examples of `SignedInteger`s using standard mathematical notation.
+`SignedInteger`s are compared as mathematical integers.
 
 **Examples.** 10; -6; 0.
 
@@ -107,8 +108,7 @@ examples of `SignedInteger`s using standard mathematical notation.
 A `String` is a sequence of Unicode
 [code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
 are compared lexicographically, code-point by
-code-point.[^utf8-is-awesome] We will write examples of `String`s as
-text surrounded by quotes “`"`”.
+code-point.[^utf8-is-awesome]
 
 [^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
 gives the same result as a lexicographic byte-by-byte comparison
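The footnote's claim can be spot-checked mechanically. The Python sketch below compares a handful of arbitrarily chosen strings both code point by code point and by their UTF-8 bytes, and asserts that the two orderings agree; it illustrates the property on these samples rather than proving it in general.

```python
# Spot-check: ordering strings by code point agrees with ordering
# their UTF-8 encodings byte by byte.
import itertools


def by_code_points(a: str, b: str) -> int:
    """Compare two strings lexicographically, code point by code point."""
    ka, kb = [ord(c) for c in a], [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


def by_utf8_bytes(a: str, b: str) -> int:
    """Compare the UTF-8 encodings of two strings byte by byte."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


# Arbitrary samples covering 1-, 2-, 3- and 4-byte UTF-8 sequences.
samples = ["", "a", "z", "z\u6c34", "\u6c34", "\U0001D11E", "\u00e9", "ab", "\uffff", "\U00010000"]

for a, b in itertools.product(samples, repeat=2):
    assert by_code_points(a, b) == by_utf8_bytes(a, b), (a, b)
print("orderings agree on", len(samples) ** 2, "pairs")
```

Python's built-in `str` comparison already works code point by code point; `by_code_points` spells it out only to make both sides of the comparison explicit.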
@@ -121,33 +121,27 @@ the string containing the three Unicode code-points `z` (0x7A), `水`
 ### Binary data.
 
 A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
-`ByteString`s are compared lexicographically. We will only write
-examples of `ByteString`s that contain bytes denoting printable ASCII
-characters, using “`#"`” as an open-quote and “`"`” as a close-quote
-mark.
+`ByteString`s are compared lexicographically.
 
-**Examples.** The `ByteString` containing the integers 65, 66 and 67
-(corresponding to ASCII characters `A`, `B` and `C`) is written as
-`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
-appearances, these are *binary* data.
+**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
+`ByteString` containing the integers 65, 66 and 67 (corresponding to
+ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
+these are *binary* data.
 
 ### Symbols.
 
 Programming languages like Lisp and Prolog frequently use string-like
 values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
-kind. `Symbol`s are also compared lexicographically by code-point. We
-will write examples including only non-empty sequences of
-non-whitespace characters, using a monospace font without quotation
-marks.
+kind. `Symbol`s are also compared lexicographically by code-point.
 
 **Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
 
 ### Booleans.
 
 There are exactly two `Boolean` values, “false” and “true”. The
-“false” value compares less-than the “true” value. We write `#f` for
-“false”, and `#t` for “true”.
+“false” value compares less-than the “true” value. We write `#false`
+for “false”, and `#true` for “true”.
 
 ### IEEE floating-point values.
 
@@ -159,11 +153,11 @@ every `Double`, and every `SignedInteger` is greater than both. Two
 `Float`s or two `Double`s are to be ordered by the `totalOrder`
 predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
-We write examples using standard mathematical notation, avoiding NaN
-and infinities, using a suffix `f` or `d` to indicate `Float` or
-`Double`, respectively.
+We write examples using a fractional part and/or an exponent to
+distinguish them from `SignedInteger`s. An additional suffix `f`
+distinguishes `Float`s from `Double`s.
 
-**Examples.** 10f; -6d; 0f; 0.5d; -1.202e300d.
+**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
 
 **Non-examples.** 10, -6, and 0, because writing them this way
 indicates `SignedInteger`s, not `Float`s or `Double`s.
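One common way to realise IEEE 754 `totalOrder` (a well-known implementation technique, not something this document prescribes) is to reinterpret the bits of a value as a signed integer and, for negative values, flip every bit except the sign. A minimal Python sketch for `Double`s (binary64):

```python
import struct


def total_order_key(x: float) -> int:
    """Return an int whose ordering matches IEEE 754 totalOrder on binary64."""
    # Reinterpret the eight bytes of the double as a signed 64-bit integer.
    (bits,) = struct.unpack(">q", struct.pack(">d", x))
    if bits < 0:
        # Sign bit set: flip the remaining 63 bits so that larger
        # magnitudes map to smaller keys.
        bits ^= 0x7FFF_FFFF_FFFF_FFFF
    return bits


doubles = [1.0, -0.0, float("inf"), 0.0, -1.202e300, float("-inf"), 0.5, -6.0]
print(sorted(doubles, key=total_order_key))
# [-inf, -1.202e+300, -6.0, -0.0, 0.0, 0.5, 1.0, inf]
```

For `Float`s the same idea applies with the 32-bit formats `">f"`/`">i"` and the mask `0x7FFF_FFFF`. Unlike Python's `<`, this key places `-0.0` before `0.0`, and it gives NaNs definite positions at either end of the order according to their sign bit.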
@@ -174,9 +168,7 @@ A `Record` is a *labelled* tuple of zero or more `Value`s, called the
 record's *fields*. A record's label is itself a `Value`, though it
 will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
 are compared lexicographically as if they were just tuples; that is,
-first by their labels, and then by the remainder of their fields. We
-will write examples of `Record`s as a parenthesised, space-separated
-sequence of their label `Value` followed by their field `Value`s.
+first by their labels, and then by the remainder of their fields.
 
 [^extensibility]: The [Racket](https://racket-lang.org/) programming
 language defines
@@ -194,17 +186,16 @@ sequence of their label `Value` followed by their field `Value`s.
 it cannot be read as an IRI at all, and so the label simply stands
 for itself—for its own `Value`.
 
-**Examples.** The `Record` with label `foo` and fields 1, 2 and 3 is
-written `(foo 1 2 3)`; the `Record` with label `void` and no fields is
-written `(void)`.
+**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
+2 and 3; `void()`, a `Record` with label `void` and no fields.
 
-**Non-examples.** `()`, because it lacks a label.
+**Non-examples.** `()`, because it lacks a label; `void`, because it
+lacks even an empty tuple of fields.
 
 ### Sequences.
 
 A `Sequence` is a general-purpose, variable-length ordered sequence of
-zero or more `Value`s. `Sequence`s are compared lexicographically. We
-write examples space-separated, surrounded with square brackets.
+zero or more `Value`s. `Sequence`s are compared lexicographically.
 
 **Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
 `SignedInteger`s 1, 2 and 3.
@@ -215,25 +206,24 @@ A `Set` is an unordered finite set of `Value`s. It contains no
 duplicate values, following the [equivalence relation](#equivalence)
 induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
-and comparing the resulting `Sequence`s. We write examples
-space-separated, surrounded with curly braces, prefixed by `#set`.
+and comparing the resulting `Sequence`s.
 
 **Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
-containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set
+containing only the empty set; `{4 "hello" void() 9.0f}`, the set
 containing 4, the string `"hello"`, the record with label `void` and
-no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
-the set containing a `SignedInteger` and a `Float`; `#set{(mime
-application/xml #"<x/>") (mime application/xml #"<x />")}`, a set
-containing two different type-labelled byte
-arrays.[^mime-xml-difference]
+no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
+set containing a `SignedInteger` and a `Float`; `{mime(application/xml
+#"<x/>") mime(application/xml #"<x />")}`, a set containing two
+different `mime` records.[^mime-xml-difference]
 
 [^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
 differ by bytewise comparison, and thus yield different record
 values, even though under the semantics of XML they denote
 identical XML infosets.
 
-**Non-examples.** `#set{1 1 1}`, because it contains multiple
-equivalent `Value`s.
+**Non-examples.** `{1 1}`, because it contains multiple equivalent
+`Value`s; `{}`, because without the `#set` marker, it denotes the
+empty dictionary.
 
 ### Dictionaries.
 
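The `Sequence` and `Set` orderings above can be sketched in Python. To sidestep the full cross-type total order on `Value`s, this sketch assumes, for illustration only, that every element is a `SignedInteger`; Python's list comparison is already lexicographic, which is exactly the `Sequence` rule.

```python
def compare_sequences(a: list[int], b: list[int]) -> int:
    """Lexicographic comparison, returning -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_sets(a: frozenset[int], b: frozenset[int]) -> int:
    """Sort each set ascending, then compare the resulting sequences."""
    return compare_sequences(sorted(a), sorted(b))


print(compare_sets(frozenset({3, 1}), frozenset({2})))  # [1, 3] vs [2]: -1
print(compare_sets(frozenset(), frozenset({0})))        # [] vs [0]: -1
```

Python's own `<` on sets means "is a proper subset of", which is only a partial order; that is why the sketch sorts explicitly rather than comparing the sets directly.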
@@ -241,27 +231,189 @@ A `Dictionary` is an unordered finite collection of pairs of `Value`s.
 Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
 be pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
-`Dictionary`'s pairs in ascending order by key. Examples are written
-as a `#dict`-prefixed, curly-brace-surrounded sequence of
-space-separated key-value pairs, each written with a colon between the
-key and value.
+`Dictionary`'s pairs in ascending order by key.
 
-**Examples.** `#dict{}`, the empty dictionary; `#dict{a:1}`, the
-dictionary mapping the `Symbol` `a` to the `SignedInteger` 1;
-`#dict{[1 2 3]:a}`, mapping `[1 2 3]` to `a`; `#dict{"hi":0 hi:0
-there:[]}`, having a `String` and two `Symbol` keys, and
-`SignedInteger` and `Sequence` values.
+**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
+mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
+mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
+`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
+values.
 
-**Non-examples.** `#dict{a:1 b:2 a:3}`, because it contains duplicate
-keys; `#dict{[7 8]:[] [7 8]:99}`, for the same reason.
+**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
+keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
 
-## Syntax
+## Textual Syntax
 
 Now we have discussed `Value`s and their meanings, we may turn to
 techniques for *representing* `Value`s for communication or storage.
 
-For now, we limit our attention to an easily-parsed, easily-produced
-machine-readable syntax.
+In this section, we use [case-sensitive ABNF][abnf] to define a
+textual syntax that is easy for people to read and
+write.[^json-superset] Most of the examples in this document are
+written using this syntax. In the following section, we will define an
+equivalent compact machine-readable syntax.
+
+[^json-superset]: The grammar of the textual syntax is a superset of
+JSON, with the slightly unusual feature that `true`, `false`, and
+`null` are all read as `Symbol`s, and that `SignedInteger`s are
+never read as `Double`s.
+
+### Character set
+
+[ABNF][abnf] allows easy definition of US-ASCII-based languages.
+However, Preserves is a Unicode-based language. Therefore, we
+reinterpret ABNF as a grammar for recognising sequences of Unicode
+code points.
+
+Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where
+possible.
+
+### Whitespace
+
+Whitespace is defined as any number of spaces, tabs, carriage returns,
+line feeds, comments, or commas. A comment is a semicolon followed by
+the Unicode code points up to and including the next carriage return
+or line feed.
+
+ws = *(%x20 / %x09 / newline / comment / ",")
+newline = CR / LF
+comment = ";" *(WSP / nonnl) newline
+nonnl = <any Unicode code point except CR or LF>
+
+### Grammar
+
+Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+
+Document = Value ws
+
+Any `Value` may be preceded by whitespace.
+
+Value = ws (Record / Collection / Atom / Compact)
+Collection = Sequence / Dictionary / Set
+Atom = Boolean / Float / Double / SignedInteger /
+       String / ByteString / Symbol
+
+Each `Record` is its label-`Value` followed by a parenthesised
+grouping of its field-`Value`s.
+
+Record = Value ws "(" *Value ws ")"
+
+`Sequence`s are enclosed in square brackets. `Dictionary` values are
+curly-brace-enclosed colon-separated pairs of values. `Set`s are
+written either as a simple curly-brace-enclosed non-empty sequence of
+values, or as a possibly-empty sequence of values enclosed by the
+tokens `#set{` and `}`.
+
+Sequence = "[" *Value ws "]"
+Dictionary = "{" *(Value ws ":" Value) ws "}"
+Set = %s"#set{" *Value ws "}" / "{" 1*Value ws "}"
+
+Any `Value` may be represented using the
+[compact binary syntax](#compact-binary-syntax) by directly prefixing
+the binary form of the `Value` with ASCII `SOH` (`%x01`), or by
+enclosing a hexadecimal representation of the binary form of the
+`Value` in the tokens `#hexvalue{` and `}`.
+
+Compact = %x01 <binary data> / %s"#hexvalue{" *(ws / HEXDIG) ws "}"
+
+`Boolean`s are the simple literal strings `#true` and `#false`.
+
+Boolean = %s"#true" / %s"#false"
+
+Numeric data follow the
+[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
+the addition of a trailing "f" distinguishing `Float` from `Double`
+values. `Float`s and `Double`s always have either a fractional part or
+an exponent part, whereas `SignedInteger`s never have either.
+
+TODO: talk about precise reading of floats, and the need for arbitrary
+precision. Your language will often have a good floating-point reading
+library.
+
+Float = flt %i"f"
+Double = flt
+SignedInteger = int
+
+digit1-9 = %x31-39
+nat = %x30 / ( digit1-9 *DIGIT )
+int = ["-"] nat
+frac = "." 1*DIGIT
+exp = %i"e" ["-"/"+"] 1*DIGIT
+flt = int (frac exp / frac / exp)
+
+`String`s are,
+[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
+escaped text surrounded by double quotes. The escaping rules are the
+same as for JSON.[^string-json-correspondence]
+
+TODO: discuss surrogate pairs in \uXXXX form
+
+String = %x22 *char %x22
+char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
+unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
+escape = %x5C ; \
+escaped = ( %x5C / ; \ reverse solidus U+005C
+            %x2F / ; / solidus U+002F
+            %x62 / ; b backspace U+0008
+            %x66 / ; f form feed U+000C
+            %x6E / ; n line feed U+000A
+            %x72 / ; r carriage return U+000D
+            %x74 ) ; t tab U+0009
+
+[^string-json-correspondence]: The grammar for `String` has the same
+effect as the
+[JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
+`string`. Some auxiliary definitions (e.g. `escaped`) are lifted
+largely unmodified from the text of RFC 8259.
+
+A `ByteString` may be written in any of three different forms.
+
+The first is similar to a `String`, but prepended with a hash sign
+`#`. In addition, only Unicode code points overlapping with printable
+7-bit ASCII are permitted unescaped inside such a `ByteString`; other
+byte values must be escaped by prepending a two-digit hexadecimal
+value with `\x`.
+
+ByteString = "#" %x22 *binchar %x22
+binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
+binunescaped = %x20-21 / %x23-5B / %x5D-7E
+
+The second is as a sequence of pairs of hexadecimal digits interleaved
+with whitespace and surrounded by `#hex{` and `}`.
+
+ByteString =/ %s"#hex{" *(ws / 2HEXDIG) ws "}"
+
+The third is as a sequence of
+[Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved
+with whitespace and surrounded by `#base64{` and `}`. Plain and
+URL-safe Base64 characters are allowed.
+
+ByteString =/ %s"#base64{" *(ws / base64char) ws "}"
+base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
+
+A `Symbol` may be written in a "bare" form,[^cf-sexp-token] so long as
+it conforms to certain restrictions on the characters appearing in the
+symbol, or in a quoted form. The quoted form is much the same as the
+syntax for `String`s, including embedded escape syntax, except using a
+bar or pipe character (`|`) instead of a double quote mark.
+
+Symbol = symstart *symcont / "|" *symchar "|"
+symstart = ALPHA / sympunct
+symcont = ALPHA / sympunct / DIGIT / "-" / "."
+sympunct = "~" / "!" / "@" / "$" / "%" / "^" / "&" / "*" /
+           "?" / "_" / "=" / "+" / "<" / ">" / "/"
+symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
+
+[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
+definition of "token representation".
+
+TODO: More unicode in unescaped symbols?
+
+### Printing
+
+Recommend a JSON-compatible print mode. Recommend a submode with trailing commas.
+
+## Compact Binary Syntax
 
 A `Repr` is an encoding, or representation, of a specific `Value`.
 Each `Repr` comprises one or more bytes describing first the kind of
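To illustrate how the grammar above fits together, here is a minimal Python sketch of a reader for part of it: whitespace and comments, `Boolean`s, the three numeric forms, `String`s (decoded with the standard `json` module, since the escapes match JSON), bare `Symbol`s, and the `#hex{}` and `#base64{}` `ByteString` forms. The Python representation (`int`, `float` for `Double`, and the hypothetical `Float` and `Symbol` wrapper classes) and the helper names `read_atom`/`read_atoms` are assumptions of this sketch. It deliberately omits compound values, `#"..."` `ByteString`s, quoted `|...|` `Symbol`s and the `Compact` forms; it does not allow comments inside `#hex{}`/`#base64{}`, and it is laxer than `2HEXDIG` about how hex digits pair up.

```python
import base64
import json
import re
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:  # hypothetical representation of a Symbol
    name: str


@dataclass(frozen=True)
class Float:  # hypothetical; plain Python floats stand in for Doubles
    value: float


# ws: spaces, tabs, CR, LF, commas, and ";" comments running to a newline.
WS = re.compile(r"(?:[ \t\r\n,]|;[^\r\n]*[\r\n])*")
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?P<fe>(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)")
STRING = re.compile(r'"(?:[^"\\\x00-\x1f]|\\.)*"')
SYMBOL = re.compile(r"[A-Za-z~!@$%^&*?_=+<>/][A-Za-z0-9~!@$%^&*?_=+<>/.-]*")


def _base64(body: str) -> bytes:
    # Accept plain and URL-safe alphabets and restore any missing padding.
    body = body.translate(str.maketrans("-_", "+/")).rstrip("=")
    return base64.b64decode(body + "=" * (-len(body) % 4), validate=True)


def read_atom(text: str, pos: int = 0):
    """Read one atom at or after text[pos]; return (value, next_pos)."""
    pos = WS.match(text, pos).end()
    if text.startswith("#true", pos):
        return True, pos + 5
    if text.startswith("#false", pos):
        return False, pos + 6
    for prefix, decode in (("#hex{", bytes.fromhex), ("#base64{", _base64)):
        if text.startswith(prefix, pos):
            end = text.index("}", pos)
            body = re.sub(r"[ \t\r\n,]", "", text[pos + len(prefix):end])
            return decode(body), end + 1
    if m := STRING.match(text, pos):
        return json.loads(m.group()), m.end()  # same escapes as JSON
    if m := NUMBER.match(text, pos):
        if not m.group("fe"):
            return int(m.group()), m.end()  # no frac/exp: SignedInteger
        if text[m.end():m.end() + 1] in ("f", "F"):
            # Rounds via binary64 first; rare double-rounding cases can differ
            # from a correctly rounded decimal-to-binary32 conversion.
            (single,) = struct.unpack(">f", struct.pack(">f", float(m.group())))
            return Float(single), m.end() + 1
        return float(m.group()), m.end()  # Double
    if m := SYMBOL.match(text, pos):
        return Symbol(m.group()), m.end()
    raise ValueError(f"unsupported or malformed input at offset {pos}")


def read_atoms(text: str) -> list:
    """Read atoms until only whitespace (or comments) remain."""
    values, pos = [], 0
    while WS.match(text, pos).end() < len(text):
        value, pos = read_atom(text, pos)
        values.append(value)
    return values


print(read_atoms('#true 10 -6.0 0.5f "hi\\n" hello-world, #hex{41 42} #base64{QUJD} ; note\n'))
```

Note how the `SignedInteger`/`Double`/`Float` decision falls straight out of the `flt` rule: without a fractional or exponent part, a trailing `f` is not consumed, so `10f` reads as `10` followed by the `Symbol` `f`.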
@@ -373,14 +525,14 @@ be a single `Repr`.
 
 Format B (known length):
 
-[[ (L F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
+[[ L(F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
 
 For `m` fields, `m+1` is supplied to `header`, to account for the
 encoding of the record label.
 
 Format C (streaming):
 
-[[ (L F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)
+[[ L(F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)
 
 Applications *SHOULD* prefer the known-length format for encoding
 `Record`s.
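The definitions of `header`, `open` and `close` lie outside this diff. The Python sketch below assumes byte layouts inferred from the test-vector tables elsewhere in the document (for example `header(2,3,3)` = `B3`, `open(3,0)` = `2C`, `close(3,0)` = `3C`, and `7F 18` introducing a 24-byte `Symbol`): a header byte packs a 2-bit major code, a 2-bit minor code and a 4-bit length, with length nibble 15 meaning that a [varint] length follows. It also assumes, from the same tables, that major 1 with minors 2 and 3 encodes `ByteString`s and `Symbol`s. Treat it as an illustration of formats B and C, not as the normative encoder.

```python
def varint(n: int) -> bytes:
    """Unsigned base-128 varint, Protocol Buffers style (assumed)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)


def header(major: int, minor: int, length: int) -> bytes:
    # Assumed layout, inferred from the document's test vectors.
    lead = (major << 6) | (minor << 4)
    if length < 15:
        return bytes([lead | length])
    return bytes([lead | 0x0F]) + varint(length)  # escape: length follows


def open_(major: int, minor: int) -> bytes:
    return bytes([0x20 | (major << 2) | minor])  # assumed, e.g. open(3,0) = 2C


def close(major: int, minor: int) -> bytes:
    return bytes([0x30 | (major << 2) | minor])  # assumed, e.g. close(3,0) = 3C


def symbol(name: str) -> bytes:
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data


def byte_string(data: bytes) -> bytes:
    return header(1, 2, len(data)) + data


def record_b(label: bytes, *fields: bytes) -> bytes:
    """Format B: the count m+1 covers the label as well as the m fields."""
    return header(2, 3, len(fields) + 1) + label + b"".join(fields)


def record_c(label: bytes, *fields: bytes) -> bytes:
    """Format C: no count; open(2,3) and close(2,3) bracket the contents."""
    return open_(2, 3) + label + b"".join(fields) + close(2, 3)


mime = record_b(symbol("mime"), symbol("text/plain"), byte_string(b"ABC"))
print(mime.hex(" ").upper())
# B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43
```

Both functions produce a `Repr` of the same `Record`. `record_b` needs the field count up front, while `record_c` can start emitting before it knows how many fields will follow, which is what makes format C suitable for streaming.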
@@ -401,12 +553,12 @@ and format C becomes
 **Examples.** For example, a protocol may choose to map records
 labelled `void` to `n=0`, making
 
-[[(void)]] = header(2,0,0) = [0x80]
+[[void()]] = header(2,0,0) = [0x80]
 
 or it may map records labelled `person` to short form label number 1,
 making
 
-[[(person "Dr" "Elizabeth" "Blackwell")]]
+[[person("Dr", "Elizabeth", "Blackwell")]]
 = header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
 = [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
 
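Short-form labels drop the label `Value` entirely, so the length passed to `header` counts only the fields. A minimal Python sketch, using the single-byte `header` layout inferred from the document's test vectors (lengths below 15 only) and assuming, as those vectors suggest, that minor codes 0 to 2 are the short-form label numbers while minor 3 marks a record with an explicit label. The mapping of `discard`, `capture` and `observe` to 0, 1 and 2 is the one used in the document's test-vector table.

```python
def header(major: int, minor: int, length: int) -> bytes:
    # Assumed layout, inferred from the test vectors; lengths below 15 only.
    assert 0 <= length < 15, "sketch omits the long-length escape"
    return bytes([(major << 6) | (minor << 4) | length])


def short_record(label_number: int, *fields: bytes) -> bytes:
    """Format B record with a short-form label: no label bytes are emitted."""
    assert label_number in (0, 1, 2)  # assumed short-form range
    return header(2, label_number, len(fields)) + b"".join(fields)


def symbol(name: str) -> bytes:
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data  # assumed Symbol layout


DISCARD, CAPTURE, OBSERVE = 0, 1, 2  # protocol-chosen short-form numbers

capture = short_record(CAPTURE, short_record(DISCARD))
# speak(...) has no short form, so it uses minor 3 plus an explicit label.
speak = header(2, 3, 3) + symbol("speak") + short_record(DISCARD) + capture
print(capture.hex(" ").upper())                       # 91 80
print(short_record(OBSERVE, speak).hex(" ").upper())  # A1 B3 75 73 70 65 61 6B 80 91 80
```

With the same helper, `void()` mapped to 0 gives `header(2,0,0)` = `80`, and `person(...)` with three fields mapped to 1 gives the leading `93` shown above.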
@@ -421,20 +573,20 @@ for format C.
 
 Format B (known length):
 
-[[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]]
-[[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]]
-[[ #dict{K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
-    ++ [[K_m]] ++ [[V_m]]
+[[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]]
+[[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]]
+[[ {K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
+    ++ [[K_m]] ++ [[V_m]]
 
 Note that `m*2` is given to `header` for a `Dictionary`, since there
 are two `Value`s in each key-value pair.
 
 Format C (streaming):
 
-[[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
-[[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
-[[ #dict{K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
-    ++ [[K_m]] ++ [[V_m]] ++ close(3,2)
+[[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
+[[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
+[[ {K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
+    ++ [[K_m]] ++ [[V_m]] ++ close(3,2)
 
 Applications may use whichever format suits their needs on a
 case-by-case basis.
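Formats B and C for collections differ only in their framing. The Python sketch below uses single-byte `header`, `open` and `close` layouts inferred from the document's test-vector table (lengths below 15 only) and takes already-encoded elements as input. The element bytes `11` to `14` for 1 to 4 are copied from that table; `atom` assumes, as the same table suggests, that major 1 with minors 1, 2 and 3 encodes `String`, `ByteString` and `Symbol`; the `Boolean` bytes follow the `#### Booleans` rules in this diff.

```python
def header(major: int, minor: int, length: int) -> bytes:
    # Assumed layout, inferred from the test vectors; lengths below 15 only.
    assert 0 <= length < 15, "sketch omits the long-length escape"
    return bytes([(major << 6) | (minor << 4) | length])


def open_(major: int, minor: int) -> bytes:
    return bytes([0x20 | (major << 2) | minor])  # assumed layout


def close(major: int, minor: int) -> bytes:
    return bytes([0x30 | (major << 2) | minor])  # assumed layout


SEQUENCE, SET, DICTIONARY = 0, 1, 2  # minor codes under major 3


def collection_b(minor: int, items: list[bytes]) -> bytes:
    return header(3, minor, len(items)) + b"".join(items)


def collection_c(minor: int, items: list[bytes]) -> bytes:
    return open_(3, minor) + b"".join(items) + close(3, minor)


def dictionary_b(pairs: list[tuple[bytes, bytes]]) -> bytes:
    # Keys and values are flattened, so the count given to header is m*2.
    return collection_b(DICTIONARY, [part for pair in pairs for part in pair])


def atom(minor: int, data: bytes) -> bytes:
    # Assumed: major 1, minor 1 = String, 2 = ByteString, 3 = Symbol.
    return header(1, minor, len(data)) + data


TRUE, FALSE = header(0, 0, 1), header(0, 0, 0)
one_to_four = [b"\x11", b"\x12", b"\x13", b"\x14"]  # [[1]]..[[4]] from the table

print(collection_b(SEQUENCE, one_to_four).hex(" ").upper())  # C4 11 12 13 14
print(collection_c(SEQUENCE, one_to_four).hex(" ").upper())  # 2C 11 12 13 14 3C
example = collection_b(SEQUENCE, [
    atom(1, b"hello"), atom(3, b"there"), atom(2, b"world"),
    collection_b(SEQUENCE, []), collection_b(SET, []), TRUE, FALSE,
])
print(example.hex(" ").upper())
```

The last line reproduces the table's encoding of `["hello" there #"world" [] #set{} #true #false]`.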
@@ -528,8 +680,8 @@ specify lengths. Applications *MUST NOT* use format C with
 
 #### Booleans
 
-[[ #f ]] = header(0,0,0) = [0x00]
-[[ #t ]] = header(0,0,1) = [0x01]
+[[ #false ]] = header(0,0,0) = [0x00]
+[[ #true ]] = header(0,0,1) = [0x01]
 
 #### Floats and Doubles
 
@@ -550,31 +702,27 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
 
 | Value | Encoded hexadecimal byte sequence |
 |---------------------------------------------------|----------------------------------------------------------------------|
-| `(capture (discard))` | 91 80 |
-| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
+| `capture(discard())` | 91 80 |
+| `observe(speak(discard(), capture(discard())))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
 | `[1 2 3 4]` (format B) | C4 11 12 13 14 |
 | `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
 | `[-2 -1 0 1]` | C4 1E 1F 10 11 |
 | `"hello"` (format B) | 55 68 65 6C 6C 6F |
 | `"hello"` (format C, 2 chunks) | 25 62 68 65 63 6C 6C 6F 35 |
 | `"hello"` (format C, 5 chunks) | 25 62 68 65 62 6C 6C 60 60 61 6F 35 |
-| `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
+| `["hello" there #"world" [] #set{} #true #false]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
 | `-257` | 42 FE FF |
 | `-1` | 1F |
 | `0` | 10 |
 | `1` | 11 |
 | `255` | 42 00 FF |
-| `1f` | 02 3F 80 00 00 |
-| `1d` | 03 3F F0 00 00 00 00 00 00 |
-| `-1.202e300d` | 03 FE 3C B7 B7 59 BF 04 26 |
+| `1.0f` | 02 3F 80 00 00 |
+| `1.0` | 03 3F F0 00 00 00 00 00 00 |
+| `-1.202e300` | 03 FE 3C B7 B7 59 BF 04 26 |
 
 Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record`
 
-([titled person 2 thing 1]
-101
-"Blackwell"
-(date 1821 2 3)
-"Dr")
+[titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
 
 encodes to
 
@@ -671,16 +819,16 @@ such media types following the general rules for ordering of
 
 | Value | Encoded hexadecimal byte sequence |
 |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
-| `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
-| `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
-| `(mime text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
+| `mime(application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
+| `mime(text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
+| `mime(application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
+| `mime(text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
 
 Applications making heavy use of `mime` records may choose to use a
 short form label number for the record type. For example, if short
-form label number 1 were chosen, the second example above, `(mime
-text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
-69 6D 65".
+form label number 1 were chosen, the second example above,
+`mime(text/plain #"ABC")`, would be encoded with "92" in place of "B3
+74 6D 69 6D 65".
 
 ### Unicode normalization forms
 
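The substitution in the paragraph on `mime` short forms can be checked with a little byte arithmetic in Python. Note that the replacement byte is `header(2,1,2)`, not `header(2,1,3)`: once the label is implied by the short-form number, only the two fields are counted. The single-byte `header` layout is an assumption inferred from the document's test vectors.

```python
def header(major: int, minor: int, length: int) -> bytes:
    # Assumed layout, inferred from the test vectors; lengths below 15 only.
    assert 0 <= length < 15
    return bytes([(major << 6) | (minor << 4) | length])


general = bytes.fromhex("B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43")
prefix = header(2, 3, 3) + header(1, 3, 4) + b"mime"  # record header, then Symbol `mime`
assert general.startswith(prefix)

short = header(2, 1, 2) + general[len(prefix):]  # short form label 1, two fields
print(short.hex(" ").upper())  # 92 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43
```

The short form saves five bytes per record here: six bytes of header and label become a single header byte.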
@@ -707,13 +855,13 @@ The definition of `SignedInteger` captures all integers. However, in
 certain circumstances it can be valuable to assert that a number
 inhabits a particular range, such as a fixed-width machine word.
 
-A family of labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote
+A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
 *n*-bit-wide signed and unsigned range restrictions, respectively.
 Records with these labels *MUST* have one field, a `SignedInteger`,
 which *MUST* fall within the appropriate range. That is, to be valid,
-- in `(i16 `*x*`)`, -32768 <= *x* <= 32767.
-- in `(u16 `*x*`)`, 0 <= *x* <= 65535.
-- in `(i32 `*x*`)`, -2147483648 <= *x* <= 2147483647.
+- in `i8(`*x*`)`, -128 <= *x* <= 127.
+- in `u8(`*x*`)`, 0 <= *x* <= 255.
+- in `i16(`*x*`)`, -32768 <= *x* <= 32767.
 - etc.
 
 ### Anonymous Tuples and Unit
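The bounds in the range-restriction list are the usual ones for *n*-bit two's-complement and unsigned integers: -2^(n-1) to 2^(n-1)-1 for `i`*n*, and 0 to 2^n-1 for `u`*n*. A minimal Python sketch of the check; the function name `in_range` is hypothetical, and it validates only the integer, not the one-field `Record` that carries it.

```python
WIDTHS = {"8", "16", "32", "64"}


def in_range(label: str, x: int) -> bool:
    """True if x is valid as the single field of an i<n> or u<n> Record."""
    kind, width = label[:1], label[1:]
    if kind not in ("i", "u") or width not in WIDTHS:
        raise ValueError(f"not a range-restriction label: {label!r}")
    n = int(width)
    if kind == "i":
        return -(1 << (n - 1)) <= x <= (1 << (n - 1)) - 1  # two's complement
    return 0 <= x <= (1 << n) - 1


print(in_range("i8", -128), in_range("i8", 128), in_range("u16", 65535))  # True False True
```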
@@ -721,15 +869,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
 A `Tuple` is a `Record` with label `tuple` and zero or more fields,
 denoting an anonymous tuple of values.
 
-The 0-ary tuple, `(tuple)`, denotes the empty tuple, sometimes called
+The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
 "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
 
 ### Null and Undefined
 
 Tony Hoare's
 "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
-can be represented with the 0-ary `Record` `(null)`. An "undefined"
-value can be represented as `(undefined)`.
+can be represented with the 0-ary `Record` `null()`. An "undefined"
+value can be represented as `undefined()`.
 
 ### Dates and Times
 
@@ -741,6 +889,8 @@ or `date-time` productions of
 
 ## Security Considerations
 
+TODO: Lots of whitespace is just like lots of empty chunks
+
 **Empty chunks.** Streamed (format C) `String`s, `ByteString`s and
 `Symbol`s may include chunks of zero length. This opens up a
 possibility for denial-of-service: an attacker may begin streaming a
@@ -751,9 +901,9 @@ chunks that may appear in a stream, and may even supply an optional
 mode that rejects empty chunks entirely.
 
 **Canonical form for cryptographic hashing and signing.** As
-specified, the encoding rules for `Value`s do not force canonical
-serializations for `Set` or `Dictionary` values. Two serializations of
-the same `Value` may yield different binary `Repr`s.
+specified, neither the textual nor the compact binary encoding rules
+for `Value`s force canonical serializations. Two serializations of the
+same `Value` may yield different binary `Repr`s.
 
 ## Appendix. Table of lead byte values
 