WIP from the early hours of this morning, adding textual syntax

2018-09-27 11:42:55 +01:00 · 2018-09-27 11:42:55 +01:00 · 732fbc7059
parent bef125e818
commit 732fbc7059
1 changed files with 249 additions and 99 deletions
--- a/preserve.md
+++ b/preserve.md
@ -6,12 +6,13 @@
 # Preserves: an Expressive Data Language

 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-September 2018. Version 0.0.2.
+September 2018. Version 0.0.3.

  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
  [spki]: http://world.std.com/~cme/html/spki.html
  [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
  [erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
+  [abnf]: https://tools.ietf.org/html/rfc7405

 This document proposes a data model and serialization format called
 *Preserves*.
@ -47,7 +48,8 @@ structures of any particular implementation language.

 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. We will treat syntax separately,
+meaning independent of their syntax. When we write examples of values,
+we will do so using the [textual syntax](#textual-syntax) defined
 later in this document.

 Our `Value`s fall into two broad categories: *atomic* and *compound*
@ -94,8 +96,7 @@ neither is less than the other according to the total order.
 ### Signed integers.

 A `SignedInteger` is a signed integer of arbitrary width.
-`SignedInteger`s are compared as mathematical integers. We will write
-examples of `SignedInteger`s using standard mathematical notation.
+`SignedInteger`s are compared as mathematical integers.

 **Examples.** 10; -6; 0.

@ -107,8 +108,7 @@ examples of `SignedInteger`s using standard mathematical notation.
 A `String` is a sequence of Unicode
 [code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
 are compared lexicographically, code-point by
-code-point.[^utf8-is-awesome] We will write examples of `String`s as
-text surrounded by quotes “`"`”.
+code-point.[^utf8-is-awesome]

  [^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
    gives the same result as a lexicographic byte-by-byte comparison
@ -121,33 +121,27 @@ the string containing the three Unicode code-points `z` (0x7A), `水`
 ### Binary data.

 A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
-`ByteString`s are compared lexicographically. We will only write
-examples of `ByteString`s that contain bytes denoting printable ASCII
-characters, using “`#"`” as an open-quote and “`"`” as a close-quote
-mark.
+`ByteString`s are compared lexicographically.

-**Examples.** The `ByteString` containing the integers 65, 66 and 67
-(corresponding to ASCII characters `A`, `B` and `C`) is written as
-`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
-appearances, these are *binary* data.
+**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
+`ByteString` containing the integers 65, 66 and 67 (corresponding to
+ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
+these are *binary* data.

 ### Symbols.

 Programming languages like Lisp and Prolog frequently use string-like
 values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
-kind. `Symbol`s are also compared lexicographically by code-point. We
-will write examples including only non-empty sequences of
-non-whitespace characters, using a monospace font without quotation
-marks.
+kind. `Symbol`s are also compared lexicographically by code-point.

 **Examples.** `hello-world`; `utf8-string`; `exact-integer?`.

 ### Booleans.

 There are exactly two `Boolean` values, “false” and “true”. The
-“false” value compares less-than the “true” value. We write `#f` for
-“false”, and `#t` for “true”.
+“false” value compares less-than the “true” value. We write `#false`
+for “false”, and `#true` for “true”.

 ### IEEE floating-point values.

@ -159,11 +153,11 @@ every `Double`, and every `SignedInteger` is greater than both. Two
 `Float`s or two `Double`s are to be ordered by the `totalOrder`
 predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
-We write examples using standard mathematical notation, avoiding NaN
-and infinities, using a suffix `f` or `d` to indicate `Float` or
-`Double`, respectively.
+We write examples using a fractional part and/or an exponent to
+distinguish them from `SignedInteger`s. An additional suffix `f`
+distinguishes `Float`s from `Double`s.

-**Examples.** 10f; -6d; 0f; 0.5d; -1.202e300d.
+**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.

 **Non-examples.** 10, -6, and 0, because writing them this way
 indicates `SignedInteger`s, not `Float`s or `Double`s.
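The rule above can be sketched as a small token classifier: a fractional part and/or exponent marks a floating-point value, and a trailing `f` marks a `Float`. This is a minimal Python sketch, not part of the specification. The helper name `classify_number` is hypothetical. The regular expressions transcribe the `int`, `frac`, `exp` and `flt` productions from the Textual Syntax grammar added in this same commit.

```python
import re

# Transcription of the textual-syntax number productions (assumed faithful):
#   int  = ["-"] nat ; nat = "0" / (digit1-9 *DIGIT)
#   frac = "." 1*DIGIT ; exp = %i"e" ["-"/"+"] 1*DIGIT
#   flt  = int (frac exp / frac / exp)
_INT = r"-?(?:0|[1-9][0-9]*)"
_FLT = _INT + r"(?:\.[0-9]+(?:[eE][-+]?[0-9]+)?|[eE][-+]?[0-9]+)"

_SIGNED_INTEGER = re.compile(_INT)
_DOUBLE = re.compile(_FLT)
_FLOAT = re.compile(_FLT + r"[fF]")  # %i"f" is case-insensitive


def classify_number(token: str) -> str:
    """Return "SignedInteger", "Float" or "Double" for a numeric token."""
    if _SIGNED_INTEGER.fullmatch(token):
        return "SignedInteger"
    if _FLOAT.fullmatch(token):
        return "Float"
    if _DOUBLE.fullmatch(token):
        return "Double"
    raise ValueError(f"not a numeric token: {token!r}")


for tok in ["10", "-6", "0", "10.0f", "-6.0", "0.0f", "0.5", "-1.202e300"]:
    print(tok, "->", classify_number(tok))
```

Note that `10f`, a valid `Float` under the previous notation, is rejected here. The new grammar requires a fractional part or an exponent before the suffix.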
@ -174,9 +168,7 @@ A `Record` is a *labelled* tuple of zero or more `Value`s, called the
 record's *fields*. A record's label is itself a `Value`, though it
 will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
 are compared lexicographically as if they were just tuples; that is,
-first by their labels, and then by the remainder of their fields. We
-will write examples of `Record`s as a parenthesised, space-separated
-sequence of their label `Value` followed by their field `Value`s.
+first by their labels, and then by the remainder of their fields.

  [^extensibility]: The [Racket](https://racket-lang.org/) programming
    language defines
@ -194,17 +186,16 @@ sequence of their label `Value` followed by their field `Value`s.
    it cannot be read as an IRI at all, and so the label simply stands
    for itself—for its own `Value`.

-**Examples.** The `Record` with label `foo` and fields 1, 2 and 3 is
-written `(foo 1 2 3)`; the `Record` with label `void` and no fields is
-written `(void)`.
+**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
+2 and 3; `void()`, a `Record` with label `void` and no fields.

-**Non-examples.** `()`, because it lacks a label.
+**Non-examples.** `()`, because it lacks a label; `void`, because it
+lacks even an empty tuple of fields (on its own it denotes a `Symbol`).

 ### Sequences.

 A `Sequence` is a general-purpose, variable-length ordered sequence of
-zero or more `Value`s. `Sequence`s are compared lexicographically. We
-write examples space-separated, surrounded with square brackets.
+zero or more `Value`s. `Sequence`s are compared lexicographically.

 **Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
 `SignedInteger`s 1, 2 and 3.
@ -215,25 +206,24 @@ A `Set` is an unordered finite set of `Value`s. It contains no
 duplicate values, following the [equivalence relation](#equivalence)
 induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
-and comparing the resulting `Sequence`s. We write examples
-space-separated, surrounded with curly braces, prefixed by `#set`.
+and comparing the resulting `Sequence`s.

 **Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
-containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set
+containing only the empty set; `{4 "hello" void() 9.0f}`, the set
 containing 4, the string `"hello"`, the record with label `void` and
-no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
-the set containing a `SignedInteger` and a `Float`; `#set{(mime
-application/xml #"<x/>") (mime application/xml #"<x />")}`, a set
-containing two different type-labelled byte
-arrays.[^mime-xml-difference]
+no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
+set containing a `SignedInteger` and a `Float`; `{mime(application/xml
+#"<x/>") mime(application/xml #"<x />")}`, a set containing two
+different `mime` records.[^mime-xml-difference]

  [^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
    differ by bytewise comparison, and thus yield different record
    values, even though under the semantics of XML they denote
    the same XML infoset.

-**Non-examples.** `#set{1 1 1}`, because it contains multiple
-equivalent `Value`s.
+**Non-examples.** `{1 1}`, because it contains multiple equivalent
+`Value`s; `{}`, because without the `#set` marker, it denotes the
+empty dictionary.

 ### Dictionaries.

@ -241,27 +231,189 @@ A `Dictionary` is an unordered finite collection of pairs of `Value`s.
 Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
 be pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
-`Dictionary`'s pairs in ascending order by key. Examples are written
-as a `#dict`-prefixed, curly-brace-surrounded sequence of
-space-separated key-value pairs, each written with a colon between the
-key and value.
+`Dictionary`'s pairs in ascending order by key.

-**Examples.** `#dict{}`, the empty dictionary; `#dict{a:1}`, the
-dictionary mapping the `Symbol` `a` to the `SignedInteger` 1;
-`#dict{[1 2 3]:a}`, mapping `[1 2 3]` to `a`; `#dict{"hi":0 hi:0
-there:[]}`, having a `String` and two `Symbol` keys, and
-`SignedInteger` and `Sequence` values.
+**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
+mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
+mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
+`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
+values.

-**Non-examples.** `#dict{a:1 b:2 a:3}`, because it contains duplicate
-keys; `#dict{[7 8]:[] [7 8]:99}`, for the same reason.
+**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
+keys; `{[7 8]:[] [7 8]:99}`, for the same reason.

-## Syntax
+## Textual Syntax

 Now that we have discussed `Value`s and their meanings, we may turn to
 techniques for *representing* `Value`s for communication or storage.

-For now, we limit our attention to an easily-parsed, easily-produced
-machine-readable syntax.
+In this section, we use [case-sensitive ABNF][abnf] to define a
+textual syntax that is easy for people to read and
+write.[^json-superset] Most of the examples in this document are
+written using this syntax. In the following section, we will define an
+equivalent compact machine-readable syntax.
+
+  [^json-superset]: The grammar of the textual syntax is a superset of
+    JSON, with the slightly unusual feature that `true`, `false`, and
+    `null` are all read as `Symbol`s, and that `SignedInteger`s are
+    never read as `Double`s.
+
+### Character set
+
+[ABNF][abnf] allows easy definition of US-ASCII-based languages.
+However, Preserves is a Unicode-based language. Therefore, we
+reinterpret ABNF as a grammar for recognising sequences of Unicode
+code points.
+
+Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where
+possible.
+
+### Whitespace
+
+Whitespace is defined as any number of spaces, tabs, carriage returns,
+line feeds, comments, or commas. A comment is a semicolon followed by
+the Unicode code points up to and including the next carriage return
+or line feed.
+
+                ws = *(%x20 / %x09 / newline / comment / ",")
+           newline = CR / LF
+           comment = ";" *(WSP / nonnl) newline
+             nonnl = <any Unicode code point except CR or LF>
+
+### Grammar
+
+Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+
+          Document = Value ws
+
+Any `Value` may be preceded by whitespace.
+
+             Value = ws (Record / Collection / Atom / Compact)
+        Collection = Sequence / Dictionary / Set
+              Atom = Boolean / Float / Double / SignedInteger /
+                     String / ByteString / Symbol
+
+Each `Record` is its label-`Value` followed by a parenthesised
+grouping of its field-`Value`s.
+
+            Record = Value ws "(" *Value ws ")"
+
+`Sequence`s are enclosed in square brackets. `Dictionary` values are
+curly-brace-enclosed colon-separated pairs of values. `Set`s are
+written either as a simple curly-brace-enclosed non-empty sequence of
+values, or as a possibly-empty sequence of values enclosed by the
+tokens `#set{` and `}`.
+
+          Sequence = "[" *Value ws "]"
+        Dictionary = "{" *(Value ws ":" Value) ws "}"
+               Set = %s"#set{" *Value ws "}" / "{" 1*Value ws "}"
+
+Any `Value` may be represented using the
+[compact binary syntax](#compact-binary-syntax) by directly prefixing
+the binary form of the `Value` with ASCII `SOH` (`%x01`), or by
+enclosing a hexadecimal representation of the binary form of the
+`Value` in the tokens `#hexvalue{` and `}`.
+
+           Compact = %x01 <binary data> / %s"#hexvalue{" *(ws / HEXDIG) ws "}"
+
+`Boolean`s are the simple literal strings `#true` and `#false`.
+
+           Boolean = %s"#true" / %s"#false"
+
+Numeric data follow the
+[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
+the addition of a trailing "f" (or "F") distinguishing `Float` from
+`Double` values. `Float`s and `Double`s always have either a
+fractional part or an exponent part, whereas `SignedInteger`s never
+have either.
+
+TODO: talk about precise reading of floats, and the need for arbitrary
+precision. Your language will often have a good floating-point reading
+library.
+
+             Float = flt %i"f"
+            Double = flt
+     SignedInteger = int
+
+          digit1-9 = %x31-39
+               nat = %x30 / ( digit1-9 *DIGIT )
+               int = ["-"] nat
+              frac = "." 1*DIGIT
+               exp = %i"e" ["-"/"+"] 1*DIGIT
+               flt = int (frac exp / frac / exp)
+
+`String`s are,
+[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
+escaped text surrounded by double quotes. The escaping rules are the
+same as for JSON.[^string-json-correspondence]
+
+TODO: discuss surrogate pairs in \uXXXX form
+
+            String = %x22 *char %x22
+              char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
+         unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
+            escape = %x5C              ; \
+           escaped = ( %x5C /          ; \    reverse solidus U+005C
+                       %x2F /          ; /    solidus         U+002F
+                       %x62 /          ; b    backspace       U+0008
+                       %x66 /          ; f    form feed       U+000C
+                       %x6E /          ; n    line feed       U+000A
+                       %x72 /          ; r    carriage return U+000D
+                       %x74 )          ; t    tab             U+0009
+
+  [^string-json-correspondence]: The grammar for `String` has the same
+    effect as the
+    [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
+    `string`. Some auxiliary definitions (e.g. `escaped`) are lifted
+    largely unmodified from the text of RFC 8259.
+
+A `ByteString` may be written in any of three different forms.
+
+The first is similar to a `String`, but prepended with a hash sign
+`#`. In addition, only Unicode code points overlapping with printable
+7-bit ASCII are permitted unescaped inside such a `ByteString`; other
+byte values must be escaped as `\x` followed by two hexadecimal
+digits.
+
+        ByteString = "#" %x22 *binchar %x22
+           binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
+      binunescaped = %x20-21 / %x23-5B / %x5D-7E
+
+The second is as a sequence of pairs of hexadecimal digits, interleaved
+with whitespace and surrounded by `#hex{` and `}`.
+
+       ByteString =/ %s"#hex{" *(ws / 2HEXDIG) ws "}"
+
+The third is as a sequence of
+[Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved
+with whitespace and surrounded by `#base64{` and `}`. Plain and
+URL-safe Base64 characters are allowed.
+
+       ByteString =/ %s"#base64{" *(ws / base64char) ws "}"
+        base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
+
+A `Symbol` may be written in a "bare" form[^cf-sexp-token] or in a
+quoted form. A bare `Symbol` must start with a letter or a `sympunct`
+character, and may continue with letters, digits, `sympunct`, `-` or
+`.`. The quoted form resembles `String` syntax, escapes included, but
+uses a bar or pipe character (`|`) in place of the double quote mark.
+
+            Symbol = symstart *symcont / "|" *symchar "|"
+          symstart = ALPHA / sympunct
+           symcont = ALPHA / sympunct / DIGIT / "-" / "."
+          sympunct = "~" / "!" / "@" / "$" / "%" / "^" / "&" / "*" /
+                     "?" / "_" / "=" / "+" / "<" / ">" / "/"
+           symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
+
+  [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
+    definition of "token representation".
+
+TODO: More unicode in unescaped symbols?
+
+### Printing
+
+Recommend a JSON-compatible print mode. Recommend a submode with trailing commas.
+
+## Compact Binary Syntax

 A `Repr` is an encoding, or representation, of a specific `Value`.
 Each `Repr` comprises one or more bytes describing first the kind of
@ -373,14 +525,14 @@ be a single `Repr`.

 Format B (known length):

-    [[ (L F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
+    [[ L(F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]

 For `m` fields, `m+1` is supplied to `header`, to account for the
 encoding of the record label.

 Format C (streaming):

-    [[ (L F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)
+    [[ L(F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3)

 Applications *SHOULD* prefer the known-length format for encoding
 `Record`s.
@ -401,12 +553,12 @@ and format C becomes
 **Examples.** For example, a protocol may choose to map records
 labelled `void` to `n=0`, making

-    [[(void)]] = header(2,0,0) = [0x80]
+    [[void()]] = header(2,0,0) = [0x80]

 or it may map records labelled `person` to short form label number 1,
 making

-    [[(person "Dr" "Elizabeth" "Blackwell")]]
+    [[person("Dr", "Elizabeth", "Blackwell")]]
        = header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
        =        [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]

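The short-form-label arithmetic above can be checked with a small sketch of `header`. The bit layout used here (two bits of major type, two of minor type, four of length) is inferred from the worked values in this diff: `header(2,0,0) = [0x80]`, `header(2,1,3) = [0x93]`, the Boolean bytes `0x00`/`0x01`, and the `7F 18` prefix in the `mime` table. The varint continuation for lengths of 15 or more is an assumption, based on the spec's reference to Protocol Buffers varints. Function names are hypothetical.

```python
def varint(n: int) -> bytes:
    """Little-endian base-128 varint, as used by Protocol Buffers."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def header(major: int, minor: int, n: int) -> bytes:
    """Pack major (2 bits), minor (2 bits) and length (4 bits) into a lead byte.

    Assumption: lengths of 15 or more put 0xF in the low nibble and
    follow the lead byte with varint(n).
    """
    assert 0 <= major <= 3 and 0 <= minor <= 3 and n >= 0
    lead = (major << 6) | (minor << 4)
    if n < 15:
        return bytes([lead | n])
    return bytes([lead | 0xF]) + varint(n)


print(header(2, 0, 0).hex())  # 80: void() with short-form label 0
print(header(2, 1, 3).hex())  # 93: person(...) with short-form label 1
```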
@ -421,20 +573,20 @@ for format C.

 Format B (known length):

-                 [[ [X_1...X_m] ]] = header(3,0,m)   ++ [[X_1]] ++...++ [[X_m]]
-             [[ #set{X_1...X_m} ]] = header(3,1,m)   ++ [[X_1]] ++...++ [[X_m]]
-    [[ #dict{K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
-                                                     ++ [[K_m]] ++ [[V_m]]
+            [[ [X_1...X_m] ]] = header(3,0,m)   ++ [[X_1]] ++...++ [[X_m]]
+        [[ #set{X_1...X_m} ]] = header(3,1,m)   ++ [[X_1]] ++...++ [[X_m]]
+    [[ {K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++...
+                                                ++ [[K_m]] ++ [[V_m]]

 Note that `m*2` is given to `header` for a `Dictionary`, since there
 are two `Value`s in each key-value pair.

 Format C (streaming):

-                 [[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
-             [[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
-    [[ #dict{K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
-                                               ++ [[K_m]] ++ [[V_m]] ++ close(3,2)
+            [[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0)
+        [[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1)
+    [[ {K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++...
+                                          ++ [[K_m]] ++ [[V_m]] ++ close(3,2)

 Applications may use whichever format suits their needs on a
 case-by-case basis.
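A short encoder illustrates the difference between the two formats for `Sequence`s. This is a hedged sketch. The bit layouts of `header`, `open` and `close`, and the one-byte form for small integers, are inferred from the worked examples in this document: `[1 2 3 4]` encodes to `C4 11 12 13 14` in format B and to `2C 11 12 13 14 3C` in format C. The range -3..12 for one-byte integers is an assumption, and all function names are hypothetical.

```python
def header(major: int, minor: int, n: int) -> bytes:
    # Single-byte lead only; lengths >= 15 need a varint (omitted here).
    assert n < 15
    return bytes([(major << 6) | (minor << 4) | n])


def open_(major: int, minor: int) -> bytes:
    # Inferred from the table: open(3,0) = 0x2C, open(1,1) = 0x25.
    return bytes([0x20 | (major << 2) | minor])


def close(major: int, minor: int) -> bytes:
    # Inferred from the table: close(3,0) = 0x3C, close(1,1) = 0x35.
    return bytes([0x30 | (major << 2) | minor])


def small_int(x: int) -> bytes:
    # Assumption: integers in -3..12 use the one-byte form 0x10 | (x & 0xF).
    if not -3 <= x <= 12:
        raise ValueError("outside the one-byte range assumed here")
    return bytes([0x10 | (x & 0xF)])


def sequence_b(items: list[int]) -> bytes:
    """Format B (known length): header(3,0,m) followed by the m items."""
    return header(3, 0, len(items)) + b"".join(small_int(x) for x in items)


def sequence_c(items: list[int]) -> bytes:
    """Format C (streaming): open(3,0), the items, then close(3,0)."""
    return open_(3, 0) + b"".join(small_int(x) for x in items) + close(3, 0)


print(sequence_b([1, 2, 3, 4]).hex(" ").upper())  # C4 11 12 13 14
print(sequence_c([1, 2, 3, 4]).hex(" ").upper())  # 2C 11 12 13 14 3C
```

Format B must know `m` before writing the first byte. Format C can emit elements as they become available, at the cost of one extra byte here.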
@ -528,8 +680,8 @@ specify lengths. Applications *MUST NOT* use format C with

 #### Booleans

-    [[ #f ]] = header(0,0,0) = [0x00]
-    [[ #t ]] = header(0,0,1) = [0x01]
+    [[ #false ]] = header(0,0,0) = [0x00]
+    [[  #true ]] = header(0,0,1) = [0x01]

 #### Floats and Doubles

@ -550,31 +702,27 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to

 | Value                                             | Encoded hexadecimal byte sequence                                    |
 |---------------------------------------------------|----------------------------------------------------------------------|
-| `(capture (discard))`                             | 91 80                                                                |
-| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80                                     |
+| `capture(discard())`                              | 91 80                                                                |
+| `observe(speak(discard(), capture(discard())))`   | A1 B3 75 73 70 65 61 6B 80 91 80                                     |
 | `[1 2 3 4]` (format B)                            | C4 11 12 13 14                                                       |
 | `[1 2 3 4]` (format C)                            | 2C 11 12 13 14 3C                                                    |
 | `[-2 -1 0 1]`                                     | C4 1E 1F 10 11                                                       |
 | `"hello"` (format B)                              | 55 68 65 6C 6C 6F                                                    |
 | `"hello"` (format C, 2 chunks)                    | 25 62 68 65 63 6C 6C 6F 35                                           |
 | `"hello"` (format C, 5 chunks)                    | 25 62 68 65 62 6C 6C 60 60 61 6F 35                                  |
-| `["hello" there #"world" [] #set{} #t #f]`        | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
+| `["hello" there #"world" [] #set{} #true #false]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
 | `-257`                                            | 42 FE FF                                                             |
 | `-1`                                              | 1F                                                                   |
 | `0`                                               | 10                                                                   |
 | `1`                                               | 11                                                                   |
 | `255`                                             | 42 00 FF                                                             |
-| `1f`                                              | 02 3F 80 00 00                                                       |
-| `1d`                                              | 03 3F F0 00 00 00 00 00 00                                           |
-| `-1.202e300d`                                     | 03 FE 3C B7 B7 59 BF 04 26                                           |
+| `1.0f`                                            | 02 3F 80 00 00                                                       |
+| `1.0`                                             | 03 3F F0 00 00 00 00 00 00                                           |
+| `-1.202e300`                                      | 03 FE 3C B7 B7 59 BF 04 26                                           |

 Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record`

-    ([titled person 2 thing 1]
-       101
-       "Blackwell"
-       (date 1821 2 3)
-       "Dr")
+    [titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")

 encodes to

@ -671,16 +819,16 @@ such media types following the general rules for ordering of

 | Value                                      | Encoded hexadecimal byte sequence                                                                                 |
 |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
-| `(mime text/plain #"ABC")`                 | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
-| `(mime application/xml #"<xhtml/>")`       | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
-| `(mime text/csv #"123,234,345")`           | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |
+| `mime(application/octet-stream #"abcde")`  | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
+| `mime(text/plain #"ABC")`                  | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
+| `mime(application/xml #"<xhtml/>")`        | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
+| `mime(text/csv #"123,234,345")`            | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |

 Applications making heavy use of `mime` records may choose to use a
 short form label number for the record type. For example, if short
-form label number 1 were chosen, the second example above, `(mime
-text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
-69 6D 65".
+form label number 1 were chosen, the second example above,
+`mime(text/plain #"ABC")`, would be encoded with "92" in place of "B3
+74 6D 69 6D 65".

 ### Unicode normalization forms

@ -707,13 +855,13 @@ The definition of `SignedInteger` captures all integers. However, in
 certain circumstances it can be valuable to assert that a number
 inhabits a particular range, such as a fixed-width machine word.

-A family of labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote
+A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
 *n*-bit-wide signed and unsigned range restrictions, respectively.
 Records with these labels *MUST* have one field, a `SignedInteger`,
 which *MUST* fall within the appropriate range. That is, to be valid,
- - in `(i16 `*x*`)`, -32768 <= *x* <= 32767.
- - in `(u16 `*x*`)`, 0 <= *x* <= 65535.
- - in `(i32 `*x*`)`, -2147483648 <= *x* <= 2147483647.
+ - in `i8(`*x*`)`, -128 <= *x* <= 127.
+ - in `u8(`*x*`)`, 0 <= *x* <= 255.
+ - in `i16(`*x*`)`, -32768 <= *x* <= 32767.
 - etc.

 ### Anonymous Tuples and Unit
@ -721,15 +869,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
 A `Tuple` is a `Record` with label `tuple` and zero or more fields,
 denoting an anonymous tuple of values.

-The 0-ary tuple, `(tuple)`, denotes the empty tuple, sometimes called
+The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
 "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).

 ### Null and Undefined

 Tony Hoare's
 "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
-can be represented with the 0-ary `Record` `(null)`. An "undefined"
-value can be represented as `(undefined)`.
+can be represented with the 0-ary `Record` `null()`. An "undefined"
+value can be represented as `undefined()`.

 ### Dates and Times

@ -741,6 +889,8 @@ or `date-time` productions of

 ## Security Considerations

+TODO: Lots of whitespace is just like lots of empty chunks
+
 **Empty chunks.** Streamed (format C) `String`s, `ByteString`s and
 `Symbol`s may include chunks of zero length. This opens up a
 possibility for denial-of-service: an attacker may begin streaming a
@ -751,9 +901,9 @@ chunks that may appear in a stream, and may even supply an optional
 mode that rejects empty chunks entirely.

 **Canonical form for cryptographic hashing and signing.** As
-specified, the encoding rules for `Value`s do not force canonical
-serializations for `Set` or `Dictionary` values. Two serializations of
-the same `Value` may yield different binary `Repr`s.
+specified, neither the textual nor the compact binary encoding rules
+for `Value`s force canonical serializations. Two serializations of the
+same `Value` may yield different textual or binary representations.

 ## Appendix. Table of lead byte values