diff --git a/syndicate/mc/preserve.md b/syndicate/mc/preserve.md index 7ef79c1..6bb6d39 100644 --- a/syndicate/mc/preserve.md +++ b/syndicate/mc/preserve.md @@ -6,12 +6,13 @@ # Preserves: an Expressive Data Language Tony Garnock-Jones -September 2018. Version 0.0.2. +September 2018. Version 0.0.3. [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt [spki]: http://world.std.com/~cme/html/spki.html [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints [erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map + [abnf]: https://tools.ietf.org/html/rfc7405 This document proposes a data model and serialization format called *Preserves*. @@ -47,7 +48,8 @@ structures of any particular implementation language. Taking inspiration from functional programming, we start with a definition of the *values* that we want to work with and give them -meaning independent of their syntax. We will treat syntax separately, +meaning independent of their syntax. When we write examples of values, +we will do so using the [textual syntax](#textual-syntax) defined later in this document. Our `Value`s fall into two broad categories: *atomic* and *compound* @@ -94,8 +96,7 @@ neither is less than the other according to the total order. ### Signed integers. A `SignedInteger` is a signed integer of arbitrary width. -`SignedInteger`s are compared as mathematical integers. We will write -examples of `SignedInteger`s using standard mathematical notation. +`SignedInteger`s are compared as mathematical integers. **Examples.** 10; -6; 0. @@ -107,8 +108,7 @@ examples of `SignedInteger`s using standard mathematical notation. A `String` is a sequence of Unicode [code-point](http://www.unicode.org/glossary/#code_point)s. `String`s are compared lexicographically, code-point by -code-point.[^utf8-is-awesome] We will write examples of `String`s as -text surrounded by quotes “`"`”. 
+code-point.[^utf8-is-awesome] [^utf8-is-awesome]: Happily, the design of UTF-8 is such that this gives the same result as a lexicographic byte-by-byte comparison @@ -121,33 +121,27 @@ the string containing the three Unicode code-points `z` (0x7A), `水` ### Binary data. A `ByteString` is an ordered sequence of zero or more eight-bit bytes. -`ByteString`s are compared lexicographically. We will only write -examples of `ByteString`s that contain bytes denoting printable ASCII -characters, using “`#"`” as an open-quote and “`"`” as a close-quote -mark. +`ByteString`s are compared lexicographically. -**Examples.** The `ByteString` containing the integers 65, 66 and 67 -(corresponding to ASCII characters `A`, `B` and `C`) is written as -`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite -appearances, these are *binary* data. +**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the +`ByteString` containing the integers 65, 66 and 67 (corresponding to +ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances, +these are *binary* data. ### Symbols. Programming languages like Lisp and Prolog frequently use string-like values called *symbols*. Here, a `Symbol` is, like a `String`, a sequence of Unicode code-points representing an identifier of some -kind. `Symbol`s are also compared lexicographically by code-point. We -will write examples including only non-empty sequences of -non-whitespace characters, using a monospace font without quotation -marks. +kind. `Symbol`s are also compared lexicographically by code-point. **Examples.** `hello-world`; `utf8-string`; `exact-integer?`. ### Booleans. There are exactly two `Boolean` values, “false” and “true”. The -“false” value compares less-than the “true” value. We write `#f` for -“false”, and `#t` for “true”. +“false” value compares less-than the “true” value. We write `#false` +for “false”, and `#true` for “true”. ### IEEE floating-point values. 
@@ -159,11 +153,11 @@ every `Double`, and every `SignedInteger` is greater than both. Two `Float`s or two `Double`s are to be ordered by the `totalOrder` predicate defined in section 5.10 of [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935). -We write examples using standard mathematical notation, avoiding NaN -and infinities, using a suffix `f` or `d` to indicate `Float` or -`Double`, respectively. +We write examples using a fractional part and/or an exponent to +distinguish them from `SignedInteger`s. An additional suffix `f` +distinguishes `Float`s from `Double`s. -**Examples.** 10f; -6d; 0f; 0.5d; -1.202e300d. +**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300. **Non-examples.** 10, -6, and 0, because writing them this way indicates `SignedInteger`s, not `Float`s or `Double`s. @@ -174,9 +168,7 @@ A `Record` is a *labelled* tuple of zero or more `Value`s, called the record's *fields*. A record's label is itself a `Value`, though it will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s are compared lexicographically as if they were just tuples; that is, -first by their labels, and then by the remainder of their fields. We -will write examples of `Record`s as a parenthesised, space-separated -sequence of their label `Value` followed by their field `Value`s. +first by their labels, and then by the remainder of their fields. [^extensibility]: The [Racket](https://racket-lang.org/) programming language defines @@ -194,17 +186,16 @@ sequence of their label `Value` followed by their field `Value`s. it cannot be read as an IRI at all, and so the label simply stands for itself—for its own `Value`. -**Examples.** The `Record` with label `foo` and fields 1, 2 and 3 is -written `(foo 1 2 3)`; the `Record` with label `void` and no fields is -written `(void)`. +**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1, +2 and 3; `void()`, a `Record` with label `void` and no fields. -**Non-examples.** `()`, because it lacks a label. 
+**Non-examples.** `()`, because it lacks a label; `void`, because it +lacks even an empty tuple of fields. ### Sequences. A `Sequence` is a general-purpose, variable-length ordered sequence of -zero or more `Value`s. `Sequence`s are compared lexicographically. We -write examples space-separated, surrounded with square brackets. +zero or more `Value`s. `Sequence`s are compared lexicographically. **Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of `SignedInteger`s 1, 2 and 3. @@ -215,25 +206,24 @@ A `Set` is an unordered finite set of `Value`s. It contains no duplicate values, following the [equivalence relation](#equivalence) induced by the total order on `Value`s. Two `Set`s are compared by sorting their elements ascending using the [total order](#total-order) -and comparing the resulting `Sequence`s. We write examples -space-separated, surrounded with curly braces, prefixed by `#set`. +and comparing the resulting `Sequence`s. **Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set -containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set +containing only the empty set; `{4 "hello" (void) 9.0f}`, the set containing 4, the string `"hello"`, the record with label `void` and -no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`, -the set containing a `SignedInteger` and a `Float`; `#set{(mime -application/xml #"") (mime application/xml #"")}`, a set -containing two different type-labelled byte -arrays.[^mime-xml-difference] +no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the +set containing a `SignedInteger` and a `Float`; `{mime(application/xml +#"") mime(application/xml #"")}`, a set containing two +different `mime` records.[^mime-xml-difference] [^mime-xml-difference]: The two XML documents `` and `` differ by bytewise comparison, and thus yield different record values, even though under the semantics of XML they denote identical XML infoset. 
-**Non-examples.** `#set{1 1 1}`, because it contains multiple -equivalent `Value`s. +**Non-examples.** `{1 1}`, because it contains multiple equivalent +`Value`s; `{}`, because without the `#set` marker, it denotes the +empty dictionary. ### Dictionaries. @@ -241,27 +231,189 @@ A `Dictionary` is an unordered finite collection of pairs of `Value`s. Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must be pairwise distinct. Instances of `Dictionary` are compared by lexicographic comparison of the sequences resulting from ordering each -`Dictionary`'s pairs in ascending order by key. Examples are written -as a `#dict`-prefixed, curly-brace-surrounded sequence of -space-separated key-value pairs, each written with a colon between the -key and value. +`Dictionary`'s pairs in ascending order by key. -**Examples.** `#dict{}`, the empty dictionary; `#dict{a:1}`, the -dictionary mapping the `Symbol` `a` to the `SignedInteger` 1; -`#dict{[1 2 3]:a}`, mapping `[1 2 3]` to `a`; `#dict{"hi":0 hi:0 -there:[]}`, having a `String` and two `Symbol` keys, and -`SignedInteger` and `Sequence` values. +**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary +mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`, +mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a +`String` and two `Symbol` keys, and `SignedInteger` and `Sequence` +values. -**Non-examples.** `#dict{a:1 b:2 a:3}`, because it contains duplicate -keys; `#dict{[7 8]:[] [7 8]:99}`, for the same reason. +**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate +keys; `{[7 8]:[] [7 8]:99}`, for the same reason. -## Syntax +## Textual Syntax Now we have discussed `Value`s and their meanings, we may turn to techniques for *representing* `Value`s for communication or storage. -For now, we limit our attention to an easily-parsed, easily-produced -machine-readable syntax. 
+In this section, we use [case-sensitive ABNF][abnf] to define a +textual syntax that is easy for people to read and +write.[^json-superset] Most of the examples in this document are +written using this syntax. In the following section, we will define an +equivalent compact machine-readable syntax. + + [^json-superset]: The grammar of the textual syntax is a superset of + JSON, with the slightly unusual feature that `true`, `false`, and + `null` are all read as `Symbol`s, and that `SignedInteger`s are + never read as `Double`s. + +### Character set + +[ABNF][abnf] allows easy definition of US-ASCII-based languages. +However, Preserves is a Unicode-based language. Therefore, we +reinterpret ABNF as a grammar for recognising sequences of Unicode +code points. + +Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where +possible. + +### Whitespace + +Whitespace is defined as any number of spaces, tabs, carriage returns, +line feeds, comments, or commas. A comment is a semicolon followed by +the Unicode code points up to and including the next carriage return +or line feed. + + ws = *(%x20 / %x09 / newline / comment / ",") + newline = CR / LF + comment = ";" *(WSP / nonnl) newline + nonnl = <any Unicode code point except CR or LF> + +### Grammar + +Standalone documents containing textual representations of `Value`s may have trailing whitespace. + + Document = Value ws + +Any `Value` may be preceded by whitespace. + + Value = ws (Record / Collection / Atom / Compact) + Collection = Sequence / Dictionary / Set + Atom = Boolean / Float / Double / SignedInteger / + String / ByteString / Symbol + +Each `Record` is its label-`Value` followed by a parenthesised +grouping of its field-`Value`s. + + Record = Value ws "(" *Value ws ")" + +`Sequence`s are enclosed in square brackets. `Dictionary` values are +curly-brace-enclosed colon-separated pairs of values. 
`Set`s are +written either as a simple curly-brace-enclosed non-empty sequence of +values, or as a possibly-empty sequence of values enclosed by the +tokens `#set{` and `}`. + + Sequence = "[" *Value ws "]" + Dictionary = "{" *(Value ws ":" Value) ws "}" + Set = %s"#set{" *Value ws "}" / "{" 1*Value ws "}" + +Any `Value` may be represented using the +[compact binary syntax](#compact-binary-syntax) by directly prefixing +the binary form of the `Value` with ASCII `SOH` (`%x01`), or by +enclosing a hexadecimal representation of the binary form of the +`Value` in the tokens `#hexvalue{` and `}`. + + Compact = %x01 / %s"#hexvalue{" *(ws / HEXDIG) ws "}" + +`Boolean`s are the simple literal strings `#true` and `#false`. + + Boolean = %s"#true" / %s"#false" + +Numeric data follow the +[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with +the addition of a trailing "f" distinguishing `Float` from `Double` +values. `Float`s and `Double`s always have either a fractional part or +an exponent part, where `SignedInteger`s never have either. + +TODO: talk about precise reading of floats, and the need for arbitrary +precision. Your language will often have a good floating-point reading +library. + + Float = flt %i"f" + Double = flt + SignedInteger = int + + digit1-9 = %x31-39 + nat = %x30 / ( digit1-9 *DIGIT ) + int = ["-"] nat + frac = "." 1*DIGIT + exp = %i"e" ["-"/"+"] 1*DIGIT + flt = int (frac exp / frac / exp) + +`String`s are, +[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly +escaped text surrounded by double quotes. 
The escaping rules are the +same as for JSON.[^string-json-correspondence] + +TODO: discuss surrogate pairs in \uXXXX form + + String = %x22 *char %x22 + char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG) + unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF + escape = %x5C ; \ + escaped = ( %x5C / ; \ reverse solidus U+005C + %x2F / ; / solidus U+002F + %x62 / ; b backspace U+0008 + %x66 / ; f form feed U+000C + %x6E / ; n line feed U+000A + %x72 / ; r carriage return U+000D + %x74 ) ; t tab U+0009 + + [^string-json-correspondence]: The grammar for `String` has the same + effect as the + [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for + `string`. Some auxiliary definitions (e.g. `escaped`) are lifted + largely unmodified from the text of RFC 8259. + +A `ByteString` may be written in any of three different forms. + +The first is similar to a `String`, but prepended with a hash sign +`#`. In addition, only Unicode code points overlapping with printable +7-bit ASCII are permitted unescaped inside such a `ByteString`; other +byte values must be escaped by prepending a two-digit hexadecimal +value with `\x`. + + ByteString = "#" %x22 *binchar %x22 + binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG) + binunescaped = %x20-21 / %x23-5B / %x5D-7E + +The second is as a sequence of pairs of hexadecimal digits interleaved +with whitespace and surrounded by `#hex{` and `}`. + + ByteString =/ %s"#hex{" *(ws / 2HEXDIG) ws "}" + +The third is as a sequence of +[Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved +with whitespace and surrounded by `#base64{` and `}`. Plain and +URL-safe Base64 characters are allowed. 
+ + ByteString =/ %s"#base64{" *(ws / base64char) ws "}" / + base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "=" + +A `Symbol` may be written in a "bare" form,[^cf-sexp-token] so long as +it conforms to certain restrictions on the characters appearing in the +symbol, or in a quoted form. The quoted form is much the same as the +syntax for `String`s, including embedded escape syntax, except using a +bar or pipe character (`|`) instead of a double quote mark. + + Symbol = symstart *symcont / "|" *symchar "|" + symstart = ALPHA / sympunct + symcont = ALPHA / sympunct / DIGIT / "-" / "." + sympunct = "~" / "!" / "@" / "$" / "%" / "^" / "&" / "*" / + "?" / "_" / "=" / "+" / "<" / ">" / "/" + symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG) + + [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt] + definition of "token representation". + +TODO: More unicode in unescaped symbols? + +### Printing + +Recommend a JSON-compatible print mode. Recommend a submode with trailing commas. + +## Compact Binary Syntax A `Repr` is an encoding, or representation, of a specific `Value`. Each `Repr` comprises one or more bytes describing first the kind of @@ -373,14 +525,14 @@ be a single `Repr`. Format B (known length): - [[ (L F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] + [[ L(F_1...F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] For `m` fields, `m+1` is supplied to `header`, to account for the encoding of the record label. Format C (streaming): - [[ (L F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3) + [[ L(F_1...F_m) ]] = open(2,3) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close(2,3) Applications *SHOULD* prefer the known-length format for encoding `Record`s. 
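Review note: the known-length (format B) record rule in the hunk above can be checked against the hex tables elsewhere in this diff. The sketch below is not part of the patch. It assumes that `header(t, n, m)` packs a 2-bit major type, a 2-bit minor type and a 4-bit argument into a single lead byte, which is inferred from examples in this diff such as `header(2,0,0) = [0x80]` and `header(2,1,3) = [0x93]`; the full definition of `header` is outside the visible hunks. The minor-type numbers used for `Symbol` (3) and `ByteString` (2) are likewise read off the example bytes, and the helper names are hypothetical.

```python
def header(major: int, minor: int, arg: int) -> bytes:
    """Pack one lead byte. Assumed layout: major(2 bits) minor(2 bits) arg(4 bits).

    Only arguments below 15 are handled; the tables in this diff suggest that
    larger lengths use a separate multi-byte form, which this sketch omits.
    """
    if not (0 <= major < 4 and 0 <= minor < 4 and 0 <= arg < 15):
        raise ValueError("sketch only covers single-byte headers")
    return bytes([(major << 6) | (minor << 4) | arg])


def encode_symbol(name: str) -> bytes:
    # Assumption: Symbols use major type 1, minor type 3, then UTF-8 bytes.
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data


def encode_bytestring(data: bytes) -> bytes:
    # Assumption: ByteStrings use major type 1, minor type 2, then raw bytes.
    return header(1, 2, len(data)) + data


def encode_record(label: bytes, fields: list) -> bytes:
    # Format B: header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]]
    return header(2, 3, len(fields) + 1) + label + b"".join(fields)


def encode_short_record(label_number: int, fields: list) -> bytes:
    # Short form: the label number (0..2) replaces the encoded label.
    return header(2, label_number, len(fields)) + b"".join(fields)


mime = encode_record(
    encode_symbol("mime"),
    [encode_symbol("text/plain"), encode_bytestring(b"ABC")],
)
print(mime.hex(" ").upper())
```

Under these assumptions the printed bytes match the `mime(text/plain #"ABC")` row of the MIME table in this diff, and `encode_short_record(1, [encode_short_record(0, [])])` yields `91 80`, the row for `capture(discard())` when `discard` and `capture` are mapped to short form labels 0 and 1.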
@@ -401,12 +553,12 @@ and format C becomes **Examples.** For example, a protocol may choose to map records labelled `void` to `n=0`, making - [[(void)]] = header(2,0,0) = [0x80] + [[void()]] = header(2,0,0) = [0x80] or it may map records labelled `person` to short form label number 1, making - [[(person "Dr" "Elizabeth" "Blackwell")]] + [[person("Dr", "Elizabeth", "Blackwell")]] = header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] = [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] @@ -421,20 +573,20 @@ for format C. Format B (known length): - [[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]] - [[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]] - [[ #dict{K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++... - ++ [[K_m]] ++ [[V_m]] + [[ [X_1...X_m] ]] = header(3,0,m) ++ [[X_1]] ++...++ [[X_m]] + [[ #set{X_1...X_m} ]] = header(3,1,m) ++ [[X_1]] ++...++ [[X_m]] + [[ {K_1:V_1...K_m:V_m} ]] = header(3,2,m*2) ++ [[K_1]] ++ [[V_1]] ++... + ++ [[K_m]] ++ [[V_m]] Note that `m*2` is given to `header` for a `Dictionary`, since there are two `Value`s in each key-value pair. Format C (streaming): - [[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0) - [[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1) - [[ #dict{K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++... - ++ [[K_m]] ++ [[V_m]] ++ close(3,2) + [[ [X_1...X_m] ]] = open(3,0) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,0) + [[ #set{X_1...X_m} ]] = open(3,1) ++ [[X_1]] ++...++ [[X_m]] ++ close(3,1) + [[ {K_1:V_1...K_m:V_m} ]] = open(3,2) ++ [[K_1]] ++ [[V_1]] ++... + ++ [[K_m]] ++ [[V_m]] ++ close(3,2) Applications may use whichever format suits their needs on a case-by-case basis. @@ -528,8 +680,8 @@ specify lengths. 
Applications *MUST NOT* use format C with #### Booleans - [[ #f ]] = header(0,0,0) = [0x00] - [[ #t ]] = header(0,0,1) = [0x01] + [[ #false ]] = header(0,0,0) = [0x00] + [[ #true ]] = header(0,0,1) = [0x01] #### Floats and Doubles @@ -550,31 +702,27 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to | Value | Encoded hexadecimal byte sequence | |---------------------------------------------------|----------------------------------------------------------------------| -| `(capture (discard))` | 91 80 | -| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 | +| `capture(discard())` | 91 80 | +| `observe(speak(discard(), capture(discard())))` | A1 B3 75 73 70 65 61 6B 80 91 80 | | `[1 2 3 4]` (format B) | C4 11 12 13 14 | | `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C | | `[-2 -1 0 1]` | C4 1E 1F 10 11 | | `"hello"` (format B) | 55 68 65 6C 6C 6F | | `"hello"` (format C, 2 chunks) | 25 62 68 65 63 6C 6C 6F 35 | | `"hello"` (format C, 5 chunks) | 25 62 68 65 62 6C 6C 60 60 61 6F 35 | -| `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 | +| `["hello" there #"world" [] #set{} #true #false]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 | | `-257` | 42 FE FF | | `-1` | 1F | | `0` | 10 | | `1` | 11 | | `255` | 42 00 FF | -| `1f` | 02 3F 80 00 00 | -| `1d` | 03 3F F0 00 00 00 00 00 00 | -| `-1.202e300d` | 03 FE 3C B7 B7 59 BF 04 26 | +| `1.0f` | 02 3F 80 00 00 | +| `1.0` | 03 3F F0 00 00 00 00 00 00 | +| `-1.202e300` | 03 FE 3C B7 B7 59 BF 04 26 | Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record` - ([titled person 2 thing 1] - 101 - "Blackwell" - (date 1821 2 3) - "Dr") + [titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr") encodes to @@ -671,16 +819,16 @@ such media types following the general rules for ordering of | Value | Encoded hexadecimal byte sequence | 
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------| -| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 | -| `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 | -| `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E | -| `(mime text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 | +| `mime(application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 | +| `mime(text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 | +| `mime(application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E | +| `mime(text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 | Applications making heavy use of `mime` records may choose to use a short form label number for the record type. For example, if short -form label number 1 were chosen, the second example above, `(mime -text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D -69 6D 65". +form label number 1 were chosen, the second example above, +`mime(text/plain "ABC")`, would be encoded with "92" in place of "B3 +74 6D 69 6D 65". ### Unicode normalization forms @@ -707,13 +855,13 @@ The definition of `SignedInteger` captures all integers. However, in certain circumstances it can be valuable to assert that a number inhabits a particular range, such as a fixed-width machine word. 
-A family of labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote +A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote *n*-bit-wide signed and unsigned range restrictions, respectively. Records with these labels *MUST* have one field, a `SignedInteger`, which *MUST* fall within the appropriate range. That is, to be valid, - - in `(i16 `*x*`)`, -32768 <= *x* <= 32767. - - in `(u16 `*x*`)`, 0 <= *x* <= 65535. - - in `(i32 `*x*`)`, -2147483648 <= *x* <= 2147483647. + - in `i8(`*x*`)`, -128 <= *x* <= 127. + - in `u8(`*x*`)`, 0 <= *x* <= 255. + - in `i16(`*x*`)`, -32768 <= *x* <= 32767. - etc. ### Anonymous Tuples and Unit @@ -721,15 +869,15 @@ which *MUST* fall within the appropriate range. That is, to be valid, A `Tuple` is a `Record` with label `tuple` and zero or more fields, denoting an anonymous tuple of values. -The 0-ary tuple, `(tuple)`, denotes the empty tuple, sometimes called +The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called "unit" or "void" (but *not* e.g. JavaScript's "undefined" value). ### Null and Undefined Tony Hoare's "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)" -can be represented with the 0-ary `Record` `(null)`. An "undefined" -value can be represented as `(undefined)`. +can be represented with the 0-ary `Record` `null()`. An "undefined" +value can be represented as `undefined()`. ### Dates and Times @@ -741,6 +889,8 @@ or `date-time` productions of ## Security Considerations +TODO: Lots of whitespace is just like lots of empty chunks + **Empty chunks.** Streamed (format C) `String`s, `ByteString`s and `Symbol`s may include chunks of zero length. This opens up a possibility for denial-of-service: an attacker may begin streaming a @@ -751,9 +901,9 @@ chunks that may appear in a stream, and may even supply an optional mode that rejects empty chunks entirely. 
**Canonical form for cryptographic hashing and signing.** As -specified, the encoding rules for `Value`s do not force canonical -serializations for `Set` or `Dictionary` values. Two serializations of -the same `Value` may yield different binary `Repr`s. +specified, neither the textual nor the compact binary encoding rules +for `Value`s force canonical serializations. Two serializations of the +same `Value` may yield different binary `Repr`s. ## Appendix. Table of lead byte values