Split up spec!

2022-06-18 19:11:08 +02:00 · 2022-06-18 19:11:08 +02:00 · 7d3789e371
parent 1f495eef1e
commit 7d3789e371
6 changed files with 631 additions and 570 deletions
--- a/README.md
+++ b/README.md
@@ -6,22 +6,24 @@ no_site_title: true
 ---

 This [repository]({{page.projectpages}}) contains a
-[proposal](preserves.html) and various implementations of *Preserves*,
-a new data model and serialization format in many ways comparable to
-JSON, XML, S-expressions, CBOR, ASN.1 BER, and so on.
+[proposal](preserves.html) and various implementations of *Preserves*, a
+new data model, with associated serialization formats, in many ways
+comparable to JSON, XML, S-expressions, CBOR, ASN.1 BER, and so on.

 ## Core documents

 ### Preserves data model and serialization formats

 Preserves is defined in terms of a syntax-neutral
-[data model and semantics](preserves.html#starting-with-semantics)
+[data model and semantics](preserves.html#semantics)
 which all transfer syntaxes share. This allows trivial, completely
 automatic, perfect-fidelity conversion between syntaxes.

+ - [Preserves specification](preserves.html):
+    - [Preserves semantics and data model](preserves.html#semantics),
+    - [Preserves textual syntax](preserves-text.html), and
+    - [Preserves machine-oriented binary syntax](preserves-binary.html)
 - [Preserves tutorial](TUTORIAL.html)
- - [Preserves specification](preserves.html), including semantics,
-   data model, textual syntax, and compact binary syntax
 - [Canonical Form for Binary Syntax](canonical-binary.html)
 - [Syrup](https://github.com/ocapn/syrup#pseudo-specification), a
   hybrid binary/human-readable syntax for the Preserves data model
--- a/_config.yml
+++ b/_config.yml
@@ -13,3 +13,5 @@ defaults:
      layout: page

 title: "Preserves"
+version_date: "June 2022"
+version: "0.6.3"
--- a/canonical-binary.md
+++ b/canonical-binary.md
@@ -17,8 +17,8 @@ their *syntax* for equivalence gives the same result as comparing them
 That is, canonical forms are equal if and only if the encoded `Value`s
 are equal.

-This document specifies canonical form for the Preserves compact
-binary syntax.
+This document specifies canonical form for the Preserves [machine-oriented
+binary syntax](preserves-binary.html).

 **Annotations.**
 Annotations *MUST NOT* be present.
--- a/preserves-binary.md
+++ b/preserves-binary.md
@@ -0,0 +1,260 @@
+---
+no_site_title: true
+title: "Preserves: Binary Syntax"
+---
+
+Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
+{{ site.version_date }}. Version {{ site.version }}.
+
+  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
+  [spki]: http://world.std.com/~cme/html/spki.html
+  [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
+  [LEB128]: https://en.wikipedia.org/wiki/LEB128
+  [erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
+  [abnf]: https://tools.ietf.org/html/rfc7405
+  [canonical]: canonical-binary.html
+
+*Preserves* is a data model, with associated serialization formats. This
+document defines one of those formats: a binary syntax for `Value`s from
+the [Preserves data model](preserves.html) that is easy for computer
+software to read and write. An [equivalent human-readable text
+syntax](preserves-text.html) also exists.
+
+## Machine-Oriented Binary Syntax
+
+A `Repr` is a binary-syntax encoding, or representation, of a `Value`.
+For a value `v`, we write `«v»` for the `Repr` of `v`.
+
+### Type and Length representation.
+
+Each `Repr` starts with a tag byte, describing the kind of information
+represented. Depending on the tag, a length indicator, further encoded
+information, and/or an ending tag may follow.
+
+    tag                          (simple atomic data and small integers)
+    tag ++ binarydata            (most integers)
+    tag ++ length ++ binarydata  (large integers, strings, symbols, and binary)
+    tag ++ repr ++ ... ++ endtag (compound data)
+
+The unique end tag is byte value `0x84`.
+
+If present after a tag, the length of a following piece of binary data
+is formatted as a [base 128 varint][varint].[^see-also-leb128] We
+write `varint(m)` for the varint-encoding of `m`. Quoting the
+[Google Protocol Buffers][varint] definition,
+
+  [^see-also-leb128]: Also known as [LEB128][] encoding, for unsigned
+    integers. Varints and LEB128-encoded integers differ only for
+    signed integers, which are not used in Preserves.
+
+> Each byte in a varint, except the last byte, has the most
+> significant bit (msb) set – this indicates that there are further
+> bytes to come. The lower 7 bits of each byte are used to store the
+> two's complement representation of the number in groups of 7 bits,
+> least significant group first.
+
+The following table illustrates varint-encoding.
+
+| Number, `m` | `m` in binary, grouped into 7-bit chunks  | `varint(m)` bytes |
+| ------      | -------------------                       | ------------      |
+| 15          | `0001111`                                 | 15                |
+| 300         | `0000010 0101100`                         | 172 2             |
+| 1000000000  | `0000011 1011100 1101011 0010100 0000000` | 128 148 235 220 3 |
+
+It is an error for a varint-encoded `m` in a `Repr` to be anything
+other than the unique shortest encoding for that `m`. That is, a
+varint-encoding of `m` *MUST NOT* end in `0` unless `m`=0.
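The varint rules above can be sketched in Python as follows. This is an illustrative sketch, not part of the specification: the function names `varint` and `read_varint` are hypothetical, and the decoder enforces the shortest-encoding rule by rejecting any multi-byte varint whose final byte is zero.

```python
def varint(m: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, least significant group first."""
    if m < 0:
        raise ValueError("varints in Preserves are unsigned")
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)  # more bytes follow, so set the msb
        m >>= 7
    out.append(m)  # the final byte has the msb clear
    return bytes(out)


def read_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at data[pos]; return (value, next position)."""
    value = 0
    shift = 0
    start = pos
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    # A multi-byte varint ending in a zero byte is not the shortest encoding.
    if byte == 0 and pos - start > 1:
        raise ValueError("varint is not the shortest encoding")
    return value, pos
```

The three rows of the table above serve as a quick check: `varint(300)` yields the bytes 172 2.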
+
+### Records, Sequences, Sets and Dictionaries.
+
+          «<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
+            «[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
+           «#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
+    «{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
+
+There is *no* ordering requirement on the `E_i` elements or
+`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
+order. However, the `E_i` and `K_i` *MUST* be pairwise distinct. In
+addition, implementations *SHOULD* default to writing set elements and
+dictionary key/value pairs in order sorted lexicographically by their
+`Repr`s[^not-sorted-semantically], and *MAY* offer the option of
+serializing in some other implementation-defined order.
+
+  [^no-sorting-rationale]: In the BitTorrent encoding format,
+    [bencoding](http://www.bittorrent.org/beps/bep_0003.html#bencoding),
+    dictionary key/value pairs must be sorted by key. This is a
+    necessary step for ensuring serialization of `Value`s is
+    canonical. We do not require that key/value pairs (or set
+    elements) be in sorted order for serialized `Value`s; however, a
+    [canonical form][canonical] for `Repr`s does exist where a sorted
+    ordering is required.
+
+  [^not-sorted-semantically]: It's important to note that the sort
+    ordering for writing out set elements and dictionary key/value
+    pairs is *not* the same as the sort ordering implied by the
+    semantic ordering of those elements or keys. For example, the
+    `Repr` of a negative number very far from zero will start with
+    a byte that is *greater* than the byte which starts the `Repr` of
+    zero, making it sort lexicographically later by `Repr`, despite
+    being semantically *less than* zero.
+
+    **Rationale**. This is for ease-of-implementation reasons: not all
+    languages can easily represent sorted sets or sorted dictionaries,
+    but encoding and then sorting byte strings is much more likely to
+    be within easy reach.
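A minimal Python sketch of the recommended default ordering, assuming the elements have already been encoded to their `Repr`s. The names `encode_set` and `encode_dictionary` are hypothetical. Comparing `Repr` bytes only catches duplicates that happen to be encoded identically, so a real implementation would need to check distinctness on the `Value`s themselves.

```python
def encode_set(element_reprs: list[bytes]) -> bytes:
    """Assemble a Set Repr from already-encoded elements, sorted by their bytes."""
    if len(set(element_reprs)) != len(element_reprs):
        raise ValueError("set elements must be pairwise distinct")
    # Python compares bytes objects lexicographically by unsigned byte value.
    return b"\xB6" + b"".join(sorted(element_reprs)) + b"\x84"


def encode_dictionary(entry_reprs: list[tuple[bytes, bytes]]) -> bytes:
    """Assemble a Dictionary Repr from (key Repr, value Repr) pairs, sorted by key Repr."""
    keys = [key for key, _ in entry_reprs]
    if len(set(keys)) != len(keys):
        raise ValueError("dictionary keys must be pairwise distinct")
    ordered = sorted(entry_reprs, key=lambda entry: entry[0])
    return b"\xB7" + b"".join(key + value for key, value in ordered) + b"\x84"
```

For the set containing -4 and 0, the `Repr` of 0 (`90`) sorts before the `Repr` of -4 (`A0 FC`), which illustrates the point made in the footnote: byte order differs from numeric order.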
+
+### SignedIntegers.
+
+    «x» when x ∈ SignedInteger = [0xB0] ++ varint(m) ++ intbytes(x)  if ¬(-3≤x≤12) ∧ m>16
+                                 ([0xA0] + m - 1) ++ intbytes(x)     if ¬(-3≤x≤12) ∧ m≤16
+                                 ([0xA0] + x)                        if  (-3≤x≤-1)
+                                 ([0x90] + x)                        if  ( 0≤x≤12)
+                               where m =        |intbytes(x)|
+
+Integers in the range [-3,12] are compactly represented with tags
+between `0x90` and `0x9F` because they are so frequently used.
+Integers up to 16 bytes long are represented with a single-byte tag
+encoding the length of the integer. Larger integers are represented
+with an explicit varint length. Every `SignedInteger` *MUST* be
+represented with its shortest possible encoding.
+
+The function `intbytes(x)` gives the big-endian two's-complement
+binary representation of `x`, taking exactly as many whole bytes as
+needed to unambiguously identify the value and its sign, and `m =
+|intbytes(x)|`. The most-significant bit in the first byte in
+`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes] For
+example,
+
+      «87112285931760246646623899502532662132736»
+        = B0 12 01 00 00 00 00 00 00 00
+                00 00 00 00 00 00 00 00
+                00 00
+
+      «-257» = A1 FE FF        «-3» = 9D          «128» = A1 00 80
+      «-256» = A1 FF 00        «-2» = 9E          «255» = A1 00 FF
+      «-255» = A1 FF 01        «-1» = 9F          «256» = A1 01 00
+      «-254» = A1 FF 02         «0» = 90        «32767» = A1 7F FF
+      «-129» = A1 FF 7F         «1» = 91        «32768» = A2 00 80 00
+      «-128» = A0 80           «12» = 9C        «65535» = A2 00 FF FF
+      «-127» = A0 81           «13» = A0 0D     «65536» = A2 01 00 00
+        «-4» = A0 FC          «127» = A0 7F    «131072» = A2 02 00 00
+
+  [^zero-intbytes]: The value 0 needs zero bytes to identify the
+    value, so `intbytes(0)` is the empty byte string. Non-zero values
+    need at least one byte.
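The case analysis for `SignedInteger` can be sketched in Python, whose integers are already arbitrary-precision. The helper names below are hypothetical and the code is an illustration of the rules above rather than a reference implementation.

```python
def varint(m: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first."""
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement form of x; empty for zero."""
    if x == 0:
        return b""
    magnitude = x if x > 0 else ~x  # ~x == -x - 1, which is non-negative here
    # Value bits plus one sign bit, rounded up to whole bytes.
    length = magnitude.bit_length() // 8 + 1
    return x.to_bytes(length, "big", signed=True)


def encode_signed_integer(x: int) -> bytes:
    if -3 <= x <= -1:
        return bytes([0xA0 + x])  # tags 0x9D..0x9F
    if 0 <= x <= 12:
        return bytes([0x90 + x])  # tags 0x90..0x9C
    body = intbytes(x)
    m = len(body)
    if m <= 16:
        return bytes([0xA0 + m - 1]) + body  # length folded into the tag
    return b"\xB0" + varint(m) + body  # explicit varint length
```

The worked examples in the table above make convenient test vectors, for instance `encode_signed_integer(-257)` against `A1 FE FF`.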
+
+### Strings, ByteStrings and Symbols.
+
+Syntax for these three types varies only in the tag used. For `String`
+and `Symbol`, the data following the tag is a UTF-8 encoding of the
+`Value`'s code points, while for `ByteString` it is the raw data
+contained within the `Value` unmodified.
+
+    «S» = [0xB1] ++ varint(|utf8(S)|) ++ utf8(S)  if S ∈ String
+          [0xB2] ++ varint(|S|) ++ S              if S ∈ ByteString
+          [0xB3] ++ varint(|utf8(S)|) ++ utf8(S)  if S ∈ Symbol
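As a sketch under the definitions above, the three encoders differ only in the tag byte and in whether UTF-8 encoding is applied first. The function names are hypothetical. Note that the length is the number of bytes, not the number of code points.

```python
def varint(m: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first."""
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return b"\xB1" + varint(len(data)) + data  # length counts UTF-8 bytes


def encode_bytestring(b: bytes) -> bytes:
    return b"\xB2" + varint(len(b)) + b  # raw data, unmodified


def encode_symbol(s: str) -> bytes:
    data = s.encode("utf-8")
    return b"\xB3" + varint(len(data)) + data
```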
+
+### Booleans.
+
+    «#f» = [0x80]
+    «#t» = [0x81]
+
+### Floats and Doubles.
+
+    «F» when F ∈ Float  = [0x82] ++ binary32(F)
+    «D» when D ∈ Double = [0x83] ++ binary64(D)
+
+The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
+8-byte IEEE 754 binary representations of `F` and `D`, respectively.
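In Python, the standard `struct` module produces the big-endian IEEE 754 forms directly. The sketch below is illustrative and the function names are hypothetical. Python's own `float` is a double, so `encode_float` rounds to single precision, and `struct.pack` raises `OverflowError` for finite values too large for binary32.

```python
import struct


def encode_float(f: float) -> bytes:
    return b"\x82" + struct.pack(">f", f)  # 4 bytes, big-endian binary32


def encode_double(d: float) -> bytes:
    return b"\x83" + struct.pack(">d", d)  # 8 bytes, big-endian binary64
```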
+
+### Embeddeds.
+
+The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
+represent the denoted object, prefixed with `[0x86]`.
+
+    «#!V» = [0x86] ++ «V»
+
+### Annotations.
+
+To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
+`[0x85] ++ «v»`. For example, the `Repr` corresponding to textual
+syntax `@a@b[]`, i.e. an empty sequence annotated with two symbols,
+`a` and `b`, is
+
+    «@a @b []»
+      = [0x85] ++ «a» ++ [0x85] ++ «b» ++ «[]»
+      = [0x85, 0xB3, 0x01, 0x61, 0x85, 0xB3, 0x01, 0x62, 0xB5, 0x84]
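The annotation example above can be reproduced with a small Python sketch that works on already-encoded `Repr`s. The name `annotate` is hypothetical.

```python
def annotate(repr_: bytes, annotation_reprs: list[bytes]) -> bytes:
    """Prefix a Repr with one [0x85] ++ annotation Repr per annotation, in order."""
    prefix = b"".join(b"\x85" + annotation for annotation in annotation_reprs)
    return prefix + repr_


# The empty sequence annotated with the symbols a and b, as in the example above.
empty_sequence = bytes([0xB5, 0x84])
symbol_a = bytes([0xB3, 0x01, 0x61])
symbol_b = bytes([0xB3, 0x01, 0x62])
annotated = annotate(empty_sequence, [symbol_a, symbol_b])
```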
+
+## Security Considerations
+
+**Annotations.** In modes where a `Value` is being read while
+annotations are skipped, an endless sequence of annotations may give an
+illusion of progress.
+
+**Canonical form for cryptographic hashing and signing.** No canonical
+textual encoding of a `Value` is specified. A
+[canonical form][canonical] exists for binary encoded `Value`s, and
+implementations *SHOULD* produce canonical binary encodings by
+default; however, an implementation *MAY* permit two serializations of
+the same `Value` to yield different binary `Repr`s.
+
+## Appendix. Autodetection of textual or binary syntax
+
+Every tag byte in a binary Preserves `Document` falls within the range
+[`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
+bytes*, and will never occur as the first byte of a UTF-8 encoded code
+point. This means no binary-encoded document can be misinterpreted as
+valid UTF-8.
+
+Conversely, a UTF-8 document must start with a valid codepoint,
+meaning in particular that it must not start with a byte in the range
+[`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
+Preserves document can be misinterpreted as a binary-syntax document.
+
+Examination of the top two bits of the first byte of a document gives
+its syntax: if the top two bits are `10`, it should be interpreted as
+a binary-syntax document; otherwise, it should be interpreted as text.
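The top-two-bits test can be written as a single mask comparison. A minimal Python sketch, with a hypothetical function name, assuming the whole document (or at least its first byte) is available:

```python
def detect_syntax(document: bytes) -> str:
    """Return "binary" if the first byte has top bits 10, otherwise "text"."""
    if not document:
        raise ValueError("empty document")
    return "binary" if (document[0] & 0xC0) == 0x80 else "text"
```

Every tag from `0x80` to `0xBF` satisfies the mask, while ASCII bytes and UTF-8 lead bytes do not.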
+
+## Appendix. Table of tag values
+
+     80 - False
+     81 - True
+     82 - Float
+     83 - Double
+     84 - End marker
+     85 - Annotation
+     86 - Embedded
+    (8x)  RESERVED 87-8F
+
+     9x - Small integers 0..12,-3..-1
+     An - Medium integers, (n+1) bytes long
+     B0 - Large integers, variable length
+     B1 - String
+     B2 - ByteString
+     B3 - Symbol
+
+     B4 - Record
+     B5 - Sequence
+     B6 - Set
+     B7 - Dictionary
+
+## Appendix. Binary SignedInteger representation
+
+Languages that provide fixed-width machine word types may find the
+following table useful in encoding and decoding binary `SignedInteger`
+values.
+
+| Integer range                              | Bytes required | Encoding (hex)                               |
+| ---                                        | ---            | ---                                          |
+| -3 ≤ n ≤ 12                                | 1              | `9X`                                         |
+| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8)    | 2              | `A0` `XX`                                    |
+| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3              | `A1` `XX` `XX`                               |
+| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4              | `A2` `XX` `XX` `XX`                          |
+| -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5              | `A3` `XX` `XX` `XX` `XX`                     |
+| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6              | `A4` `XX` `XX` `XX` `XX` `XX`                |
+| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7              | `A5` `XX` `XX` `XX` `XX` `XX` `XX`           |
+| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8              | `A6` `XX` `XX` `XX` `XX` `XX` `XX` `XX`      |
+| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9              | `A7` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
+
+<!-- Heading to visually offset the footnotes from the main document: -->
+## Notes
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -0,0 +1,302 @@
+---
+no_site_title: true
+title: "Preserves: Text Syntax"
+---
+
+Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
+{{ site.version_date }}. Version {{ site.version }}.
+
+  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
+  [spki]: http://world.std.com/~cme/html/spki.html
+  [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
+  [LEB128]: https://en.wikipedia.org/wiki/LEB128
+  [erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
+  [abnf]: https://tools.ietf.org/html/rfc7405
+  [canonical]: canonical-binary.html
+
+*Preserves* is a data model, with associated serialization formats. This
+document defines one of those formats: a textual syntax for `Value`s
+from the [Preserves data model](preserves.html) that is easy for people
+to read and write. An [equivalent machine-oriented binary
+syntax](preserves-binary.html) also exists.
+
+## Preliminaries
+
+The definition uses [case-sensitive ABNF][abnf].
+
+ABNF allows easy definition of US-ASCII-based languages. However,
+Preserves is a Unicode-based language. Therefore, we reinterpret ABNF as
+a grammar for recognising sequences of Unicode code points.
+
+**Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using
+UTF-8 where possible.
+
+**Whitespace.** Whitespace is defined as any number of spaces, tabs,
+carriage returns, line feeds, or commas.
+
+                ws = *(%x20 / %x09 / newline / ",")
+           newline = CR / LF
+
+## Grammar
+
+Standalone documents may have trailing whitespace.
+
+          Document = Value ws
+
+Any `Value` may be preceded by whitespace.
+
+             Value = ws (Record / Collection / Atom / Embedded / Machine)
+        Collection = Sequence / Dictionary / Set
+              Atom = Boolean / Float / Double / SignedInteger /
+                     String / ByteString / Symbol
+
+Each `Record` is an angle-bracket enclosed grouping of its
+label-`Value` followed by its field-`Value`s.
+
+            Record = "<" Value *Value ws ">"
+
+`Sequence`s are enclosed in square brackets. `Dictionary` values are
+curly-brace-enclosed colon-separated pairs of values. `Set`s are
+written as values enclosed by the tokens `#{` and
+`}`.[^printing-collections] It is an error for a set to contain
+duplicate elements or for a dictionary to contain duplicate keys.
+
+          Sequence = "[" *Value ws "]"
+        Dictionary = "{" *(Value ws ":" Value) ws "}"
+               Set = "#{" *Value ws "}"
+
+  [^printing-collections]: **Implementation note.** When implementing
+    printing of `Value`s using the textual syntax, consider supporting
+    (a) optional pretty-printing with indentation, (b) optional
+    JSON-compatible print mode for that subset of `Value` that is
+    compatible with JSON, and (c) optional submodes for no commas,
+    commas separating, and commas terminating elements or key/value
+    pairs within a collection.
+
+`Boolean`s are the simple literal strings `#t` and `#f` for true and
+false, respectively.
+
+           Boolean = %s"#t" / %s"#f"
+
+Numeric data follow the
+[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
+the addition of a trailing “f” distinguishing `Float` from `Double`
+values. `Float`s and `Double`s always have a fractional part, an
+exponent part, or both, whereas `SignedInteger`s never have
+either.[^reading-and-writing-floats-accurately]
+[^arbitrary-precision-signedinteger]
+
+             Float = flt %i"f"
+            Double = flt
+     SignedInteger = int
+
+          digit1-9 = %x31-39
+               nat = %x30 / ( digit1-9 *DIGIT )
+               int = ["-"] nat
+              frac = "." 1*DIGIT
+               exp = %i"e" ["-"/"+"] 1*DIGIT
+               flt = int (frac exp / frac / exp)
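The numeric productions translate almost directly into regular expressions. The following Python sketch classifies a complete token. The name `classify_number` is hypothetical, and the sketch deliberately uses `[0-9]` rather than `\d`, since ABNF `DIGIT` covers ASCII digits only.

```python
import re
from typing import Optional

_INT = r"-?(?:0|[1-9][0-9]*)"  # int = ["-"] nat
_FRAC = r"\.[0-9]+"            # frac = "." 1*DIGIT
_EXP = r"[eE][-+]?[0-9]+"      # exp = %i"e" ["-"/"+"] 1*DIGIT
_FLT = rf"{_INT}(?:{_FRAC}{_EXP}|{_FRAC}|{_EXP})"


def classify_number(token: str) -> Optional[str]:
    """Return "Float", "Double" or "SignedInteger", or None if not numeric."""
    if re.fullmatch(_FLT + r"[fF]", token):
        return "Float"
    if re.fullmatch(_FLT, token):
        return "Double"
    if re.fullmatch(_INT, token):
        return "SignedInteger"
    return None
```

Under this grammar `1f` is not a `Float`, because `flt` requires a fractional part or an exponent.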
+
+  [^reading-and-writing-floats-accurately]: **Implementation note.**
+    Your language's standard library likely has a good routine for
+    converting between decimal notation and IEEE 754 floating-point.
+    However, if not, or if you are interested in the challenges of
+    accurately reading and writing floating point numbers, see the
+    excellent matched pair of 1990 papers by Clinger and Steele &
+    White, and a recent follow-up by Jaffer:
+
+    Clinger, William D. ‘How to Read Floating Point Numbers
+    Accurately’. In Proc. PLDI. White Plains, New York, 1990.
+    <https://doi.org/10.1145/93542.93557>.
+
+    Steele, Guy L., Jr., and Jon L. White. ‘How to Print
+    Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
+    New York, 1990. <https://doi.org/10.1145/93542.93559>.
+
+    Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
+    Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
+    <http://arxiv.org/abs/1310.8121>.
+
+  [^arbitrary-precision-signedinteger]: **Implementation note.** Be
+    aware when implementing reading and writing of `SignedInteger`s
+    that the data model *requires* arbitrary-precision integers. Your
+    implementation may (but, ideally, should not) truncate precision
+    when reading or writing a `SignedInteger`; however, if it does so,
+    it should (a) signal its client that truncation has occurred, and
+    (b) make it clear to the client that comparing such truncated
+    values for equality or ordering will not yield results that match
+    the expected semantics of the data model.
+
+`String`s are,
+[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
+escaped text surrounded by double quotes. The escaping rules are the
+same as for JSON.[^string-json-correspondence] [^escaping-surrogate-pairs]
+
+            String = %x22 *char %x22
+              char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
+         unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
+            escape = %x5C              ; \
+           escaped = ( %x5C /          ; \    reverse solidus U+005C
+                       %x2F /          ; /    solidus         U+002F
+                       %x62 /          ; b    backspace       U+0008
+                       %x66 /          ; f    form feed       U+000C
+                       %x6E /          ; n    line feed       U+000A
+                       %x72 /          ; r    carriage return U+000D
+                       %x74 )          ; t    tab             U+0009
+
+  [^string-json-correspondence]: The grammar for `String` has the same
+    effect as the
+    [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
+    `string`. Some auxiliary definitions (e.g. `escaped`) are lifted
+    largely unmodified from the text of RFC 8259.
+
+  [^escaping-surrogate-pairs]: In particular, note JSON's rules around
+    the use of surrogate pairs for code points not in the Basic
+    Multilingual Plane. We encourage implementations to avoid using
+    `\u` escapes when producing output, and instead to rely on the
+    UTF-8 encoding of the entire document to handle non-ASCII
+    codepoints correctly.
+
+A `ByteString` may be written in any of three different forms.
+
+The first is similar to a `String`, but prepended with a hash sign
+`#`. In addition, only Unicode code points overlapping with printable
+7-bit ASCII are permitted unescaped inside such a `ByteString`; other
+byte values must be escaped by prepending a two-digit hexadecimal
+value with `\x`.
+
+        ByteString = "#" %x22 *binchar %x22
+           binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
+      binunescaped = %x20-21 / %x23-5B / %x5D-7E
+
+The second is as a sequence of pairs of hexadecimal digits interleaved
+with whitespace and surrounded by `#x"` and `"`.
+
+       ByteString =/ %s"#x" %x22 *(ws / 2HEXDIG) ws %x22
+
+The third is as a sequence of
+[Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved
+with whitespace and surrounded by `#[` and `]`. Plain and URL-safe
+Base64 characters are allowed.
+
+       ByteString =/ "#[" *(ws / base64char) ws "]"
+        base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
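A sketch of decoding the second and third `ByteString` forms in Python, given a complete token as a string. The function names are hypothetical. One assumption is labelled in the code: the grammar permits `=` but does not say whether padding is required, so this sketch tolerates omitted padding by stripping and re-adding it.

```python
import base64
import re

_WS = re.compile(r"[ \t\r\n,]+")  # ws: space, tab, CR, LF and comma


def decode_hex_bytestring(token: str) -> bytes:
    """Decode the #x"..." form; whitespace may separate digit pairs but not split them."""
    if len(token) < 4 or not token.startswith('#x"') or not token.endswith('"'):
        raise ValueError("not a hex ByteString")
    chunks = _WS.split(token[3:-1])
    for chunk in chunks:
        if not re.fullmatch(r"(?:[0-9A-Fa-f]{2})*", chunk):
            raise ValueError("hex digits must come in pairs")
    return b"".join(bytes.fromhex(chunk) for chunk in chunks)


def decode_base64_bytestring(token: str) -> bytes:
    """Decode the #[...] form, accepting plain and URL-safe alphabets."""
    if len(token) < 3 or not token.startswith("#[") or not token.endswith("]"):
        raise ValueError("not a Base64 ByteString")
    body = _WS.sub("", token[2:-1])
    if not re.fullmatch(r"[A-Za-z0-9+/\-_=]*", body):
        raise ValueError("invalid Base64 character")
    # Assumption: padding is optional, so normalise it before decoding.
    body = body.rstrip("=").replace("-", "+").replace("_", "/")
    body += "=" * (-len(body) % 4)
    return base64.b64decode(body, validate=True)
```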
+
+A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
+it conforms to certain restrictions on the characters appearing in the
+symbol. Alternatively, it may be written in a quoted form. The quoted
+form is much the same as the syntax for `String`s, including embedded
+escape syntax, except using a bar or pipe character (`|`) instead of a
+double quote mark.
+
+            Symbol = symstart *symcont / "|" *symchar "|"
+          symstart = ALPHA / sympunct / symustart
+           symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
+          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
+                     "?" / "_" / "=" / "+" / "/" / "."
+           symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
+         symustart = <any code point greater than 127 whose Unicode
+                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
+                      Pc, Po, Sc, Sm, Sk, So, or Co>
+          symucont = <any code point greater than 127 whose Unicode
+                      category is Nd, Nl, No, or Pd>
+
+  [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
+    definition of “token representation”, and with the
+    [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
+
+An `Embedded` is written as a `Value` chosen to represent the denoted
+object, prefixed with `#!`.
+
+           Embedded = "#!" Value
+
+Finally, any `Value` may be represented by escaping from the textual
+syntax to the [machine-oriented binary syntax](preserves-binary.html)
+by prefixing a `ByteString` containing the binary representation of the
+`Value` with `#=`.[^rationale-switch-to-binary]
+[^no-literal-binary-in-text] [^machine-value-annotations]
+
+           Machine = "#=" ws ByteString
+
+  [^rationale-switch-to-binary]: **Rationale.** The textual syntax
+    cannot express every `Value`: specifically, it cannot express the
+    several million floating-point NaNs, or the two floating-point
+    Infinities. Since the machine-oriented binary format for `Value`s
+    expresses each `Value` with precision, embedding binary `Value`s
+    solves the problem.
+
+  [^no-literal-binary-in-text]: Every text is ultimately physically
+    stored as bytes; therefore, it might seem possible to escape to the
+    raw form of binary encoding from within a piece of textual syntax.
+    However, while bytes must be involved in any *representation* of
+    text, the text *itself* is logically a sequence of *code points* and
+    is not *intrinsically* a binary structure at all. It would be
+    incoherent to expect to be able to access the representation of the
+    text from within the text itself.
+
+  [^machine-value-annotations]: Any text-syntax annotations preceding
+    the `#` are prepended to any binary-syntax annotations yielded by
+    decoding the `ByteString`.
+
+## Annotations
+
+When written down, a `Value` may have an associated sequence of
+*annotations* carrying “out-of-band” contextual metadata about the
+value. Each annotation is, in turn, a `Value`, and may itself have
+annotations. The ordering of annotations attached to a `Value` is
+significant.
+
+            Value =/ ws "@" Value Value
+
+Each annotation is preceded by `@`; the underlying annotated value
+follows its annotations. Here we extend only the syntactic nonterminal
+named “`Value`” without altering the semantic class of `Value`s.
+
+**Comments.** Strings annotating a `Value` are conventionally
+interpreted as comments associated with that value. Comments are
+sufficiently common that special syntax exists for them.
+
+            Value =/ ws
+                     ";" *(%x00-09 / %x0B-0C / %x0E-10FFFF) newline
+                     Value
+
+When written this way, everything between the `;` and the newline is
+included in the string annotating the `Value`.
+
+**Equivalence.** Annotations appear within syntax denoting a `Value`;
+however, the annotations are not part of the denoted value. They are
+only part of the syntax. Annotations do not play a part in
+equivalences and orderings of `Value`s.
+
+Reflective tools such as debuggers, user interfaces, and message
+routers and relays---tools which process `Value`s generically---may
+use annotated inputs to tailor their operation, or may insert
+annotations in their outputs. By contrast, in ordinary programs, as a
+rule of thumb, the presence, absence or content of an annotation
+should not change the control flow or output of the program.
+Annotations are data *describing* `Value`s, and are not in the domain
+of any specific application of `Value`s. That is, an annotation will
+almost never cause a non-reflective program to do anything observably
+different.
+
+## Security Considerations
+
+**Whitespace.** The textual format allows arbitrary whitespace in many
+positions. Consider optional restrictions on the amount of consecutive
+whitespace that may appear.
+
+**Annotations.** Similarly, in modes where a `Value` is being read
+while annotations are skipped, an endless sequence of annotations may
+give an illusion of progress.
+
+## Acknowledgements
+
+The treatment of commas as whitespace in the text syntax is inspired
+by the same feature of [EDN](https://github.com/edn-format/edn).
+
+The text syntax for `Boolean`s, `Symbol`s, and `ByteString`s is
+directly inspired by [Racket](https://racket-lang.org/)'s lexical
+syntax.
+
+<!-- Heading to visually offset the footnotes from the main document: -->
+## Notes
--- a/preserves.md
+++ b/preserves.md
@@ -4,7 +4,7 @@ title: "Preserves: an Expressive Data Language"
 ---

 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-January 2022. Version 0.6.2.
+{{ site.version_date }}. Version {{ site.version }}.

  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
  [spki]: http://world.std.com/~cme/html/spki.html
@@ -14,29 +14,35 @@ January 2022. Version 0.6.2.
  [abnf]: https://tools.ietf.org/html/rfc7405
  [canonical]: canonical-binary.html

-This document proposes a data model and serialization format called
-*Preserves*.
+*Preserves* is a data model, with associated serialization formats.

-Preserves supports *records* with user-defined *labels*, embedded
-*references*, and the usual suite of atomic and compound data types,
-including *binary* data as a distinct type from text strings. Its
-*annotations* allow separation of data from metadata such as
-[comments](conventions.html#comments), trace information, and
-provenance information.
+It supports *records* with user-defined *labels*, embedded *references*,
+and the usual suite of atomic and compound data types, including
+*binary* data as a distinct type from text strings. Its *annotations*
+allow separation of data from metadata such as
+[comments](conventions.html#comments), trace information, and provenance
+information.

 Preserves departs from many other data languages in defining how to
 *compare* two values. Comparison is based on the data model, not on
 syntax or on data structures of any particular implementation
 language.

-## Starting with Semantics
+This document defines the core semantics and data model of Preserves and
+presents a handful of examples. Two other core documents define

-Taking inspiration from functional programming, we start with a
-definition of the *values* that we want to work with and give them
-meaning independent of their syntax.
+ - a [human-readable text syntax](preserves-text.html), and
+ - a [machine-oriented binary syntax](preserves-binary.html)

-<a id="values"></a>
-Our `Value`s fall into two broad categories: *atomic* and *compound*
+for the Preserves data model.
+
+## <a id="semantics"></a><a id="starting-with-semantics"></a>Values
+
+Preserves *values* are given meaning independent of their syntax. We
+will write "`Value`" when we mean the set of all Preserves values or an
+element of that set.
+
+`Value`s fall into two broad categories: *atomic* and *compound*
 data. Every `Value` is finite and non-cyclic. Embedded values, called
 `Embedded`s, are a third, special-case category.

@@ -76,20 +82,23 @@ neither is less than the other according to the total order.

 ### Signed integers.

-A `SignedInteger` is a signed integer of arbitrary width.
+A `SignedInteger` is an arbitrarily-large signed integer.
 `SignedInteger`s are compared as mathematical integers.

 ### Unicode strings.

 A `String` is a sequence of Unicode
-[code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
-are compared lexicographically, code-point by
+[code-point](http://www.unicode.org/glossary/#code_point)s.[^nul-permitted]
+`String`s are compared lexicographically, code-point by
 code-point.[^utf8-is-awesome]

  [^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
    gives the same result as a lexicographic byte-by-byte comparison
    of the UTF-8 encoding of a string!

+  [^nul-permitted]: All Unicode code-points are permitted, including NUL
+    (code point zero).
+
 ### Binary data.

 A `ByteString` is a sequence of octets. `ByteString`s are compared
@@ -111,11 +120,11 @@ less-than the “true” value.

 `Float`s and `Double`s are single- and double-precision IEEE 754
 floating-point values, respectively. `Float`s, `Double`s and
-`SignedInteger`s are disjoint; by the rules [above](#total-order),
-every `Float` is less than every `Double`, and every `SignedInteger`
-is greater than both. Two `Float`s or two `Double`s are to be ordered
-by the `totalOrder` predicate defined in section 5.10 of
-[IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
+`SignedInteger`s are disjoint; by the rules [above](#total-order), every
+`Float` is less than every `Double`, and every `SignedInteger` is
+greater than both. Two `Float`s or two `Double`s are to be ordered by
+the `totalOrder` predicate defined in section 5.10 of [IEEE Std
+754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).

 ### Records.

@ -200,457 +209,13 @@ URL, compared according to
 usually be represented as ordinary `Value`s, in which case the
 ordinary rules for comparing `Value`s will apply.

-## Textual Syntax
-
-Now we have discussed `Value`s and their meanings, we may turn to
-techniques for *representing* `Value`s for communication or storage.
-
-In this section, we use [case-sensitive ABNF][abnf] to define a
-textual syntax that is easy for people to read and
-write.[^json-superset] Most of the examples in this document are
-written using this syntax. In the following section, we will define an
-equivalent compact machine-readable syntax.
-
-  [^json-superset]: The grammar of the textual syntax is a superset of
-    JSON, with the slightly unusual feature that `true`, `false`, and
-    `null` are all read as `Symbol`s, and that `SignedInteger`s are
-    never read as `Double`s.
-
-    The following [schema](./preserves-schema.html) definitions match
-    exactly the JSON subset of a Preserves input:
-
-        version 1 .
-        JSON = @string string / @integer int / @double double / @boolean JSONBoolean / @null =null
-             / @array [JSON ...] / @object { string: JSON ...:... } .
-        JSONBoolean = =true / =false .
-
-### Character set.
-
-[ABNF][abnf] allows easy definition of US-ASCII-based languages.
-However, Preserves is a Unicode-based language. Therefore, we
-reinterpret ABNF as a grammar for recognising sequences of Unicode
-code points.
-
-Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where
-possible.
-
-### Whitespace.
-
-Whitespace is defined as any number of spaces, tabs, carriage returns,
-line feeds, or commas.
-
-                ws = *(%x20 / %x09 / newline / ",")
-           newline = CR / LF
-
-### Grammar.
-
-Standalone documents may have trailing whitespace.
-
-          Document = Value ws
-
-Any `Value` may be preceded by whitespace.
-
-             Value = ws (Record / Collection / Atom / Embedded / Compact)
-        Collection = Sequence / Dictionary / Set
-              Atom = Boolean / Float / Double / SignedInteger /
-                     String / ByteString / Symbol
-
-Each `Record` is an angle-bracket enclosed grouping of its
-label-`Value` followed by its field-`Value`s.
-
-            Record = "<" Value *Value ws ">"
-
-`Sequence`s are enclosed in square brackets. `Dictionary` values are
-curly-brace-enclosed colon-separated pairs of values. `Set`s are
-written as values enclosed by the tokens `#{` and
-`}`.[^printing-collections] It is an error for a set to contain
-duplicate elements or for a dictionary to contain duplicate keys.
-
-          Sequence = "[" *Value ws "]"
-        Dictionary = "{" *(Value ws ":" Value) ws "}"
-               Set = "#{" *Value ws "}"
-
-  [^printing-collections]: **Implementation note.** When implementing
-    printing of `Value`s using the textual syntax, consider supporting
-    (a) optional pretty-printing with indentation, (b) optional
-    JSON-compatible print mode for that subset of `Value` that is
-    compatible with JSON, and (c) optional submodes for no commas,
-    commas separating, and commas terminating elements or key/value
-    pairs within a collection.
-
-`Boolean`s are the simple literal strings `#t` and `#f` for true and
-false, respectively.
-
-           Boolean = %s"#t" / %s"#f"
-
-Numeric data follow the
-[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
-the addition of a trailing “f” distinguishing `Float` from `Double`
-values. `Float`s and `Double`s always have either a fractional part or
-an exponent part, where `SignedInteger`s never have
-either.[^reading-and-writing-floats-accurately]
-[^arbitrary-precision-signedinteger]
-
-             Float = flt %i"f"
-            Double = flt
-     SignedInteger = int
-
-          digit1-9 = %x31-39
-               nat = %x30 / ( digit1-9 *DIGIT )
-               int = ["-"] nat
-              frac = "." 1*DIGIT
-               exp = %i"e" ["-"/"+"] 1*DIGIT
-               flt = int (frac exp / frac / exp)
-
-  [^reading-and-writing-floats-accurately]: **Implementation note.**
-    Your language's standard library likely has a good routine for
-    converting between decimal notation and IEEE 754 floating-point.
-    However, if not, or if you are interested in the challenges of
-    accurately reading and writing floating point numbers, see the
-    excellent matched pair of 1990 papers by Clinger and Steele &
-    White, and a recent follow-up by Jaffer:
-
-    Clinger, William D. ‘How to Read Floating Point Numbers
-    Accurately’. In Proc. PLDI. White Plains, New York, 1990.
-    <https://doi.org/10.1145/93542.93557>.
-
-    Steele, Guy L., Jr., and Jon L. White. ‘How to Print
-    Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
-    New York, 1990. <https://doi.org/10.1145/93542.93559>.
-
-    Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
-    Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
-    <http://arxiv.org/abs/1310.8121>.
-
-  [^arbitrary-precision-signedinteger]: **Implementation note.** Be
-    aware when implementing reading and writing of `SignedInteger`s
-    that the data model *requires* arbitrary-precision integers. Your
-    implementation may (but, ideally, should not) truncate precision
-    when reading or writing a `SignedInteger`; however, if it does so,
-    it should (a) signal its client that truncation has occurred, and
-    (b) make it clear to the client that comparing such truncated
-    values for equality or ordering will not yield results that match
-    the expected semantics of the data model.
-
-`String`s are,
-[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
-escaped text surrounded by double quotes. The escaping rules are the
-same as for JSON.[^string-json-correspondence] [^escaping-surrogate-pairs]
-
-            String = %x22 *char %x22
-              char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
-         unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
-            escape = %x5C              ; \
-           escaped = ( %x5C /          ; \    reverse solidus U+005C
-                       %x2F /          ; /    solidus         U+002F
-                       %x62 /          ; b    backspace       U+0008
-                       %x66 /          ; f    form feed       U+000C
-                       %x6E /          ; n    line feed       U+000A
-                       %x72 /          ; r    carriage return U+000D
-                       %x74 )          ; t    tab             U+0009
-
-  [^string-json-correspondence]: The grammar for `String` has the same
-    effect as the
-    [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
-    `string`. Some auxiliary definitions (e.g. `escaped`) are lifted
-    largely unmodified from the text of RFC 8259.
-
-  [^escaping-surrogate-pairs]: In particular, note JSON's rules around
-    the use of surrogate pairs for code points not in the Basic
-    Multilingual Plane. We encourage implementations to avoid using
-    `\u` escapes when producing output, and instead to rely on the
-    UTF-8 encoding of the entire document to handle non-ASCII
-    codepoints correctly.
-
-A `ByteString` may be written in any of three different forms.
-
-The first is similar to a `String`, but prepended with a hash sign
-`#`. In addition, only Unicode code points overlapping with printable
-7-bit ASCII are permitted unescaped inside such a `ByteString`; other
-byte values must be escaped by prepending a two-digit hexadecimal
-value with `\x`.
-
-        ByteString = "#" %x22 *binchar %x22
-           binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
-      binunescaped = %x20-21 / %x23-5B / %x5D-7E
-
-The second is as a sequence of pairs of hexadecimal digits interleaved
-with whitespace and surrounded by `#x"` and `"`.
-
-       ByteString =/ %s"#x" %x22 *(ws / 2HEXDIG) ws %x22
-
-The third is as a sequence of
-[Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved
-with whitespace and surrounded by `#[` and `]`. Plain and URL-safe
-Base64 characters are allowed.
-
-       ByteString =/ "#[" *(ws / base64char) ws "]"
-        base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
-
-A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
-it conforms to certain restrictions on the characters appearing in the
-symbol. Alternatively, it may be written in a quoted form. The quoted
-form is much the same as the syntax for `String`s, including embedded
-escape syntax, except using a bar or pipe character (`|`) instead of a
-double quote mark.
-
-            Symbol = symstart *symcont / "|" *symchar "|"
-          symstart = ALPHA / sympunct / symustart
-           symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
-          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
-                     "?" / "_" / "=" / "+" / "/" / "."
-           symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
-         symustart = <any code point greater than 127 whose Unicode
-                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
-                      Pc, Po, Sc, Sm, Sk, So, or Co>
-          symucont = <any code point greater than 127 whose Unicode
-                      category is Nd, Nl, No, or Pd>
-
-  [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
-    definition of “token representation”, and with the
-    [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
-
-An `Embedded` is written as a `Value` chosen to represent the denoted
-object, prefixed with `#!`.
-
-           Embedded = "#!" Value
-
-Finally, any `Value` may be represented by escaping from the textual
-syntax to the [compact binary syntax](#compact-binary-syntax) by
-prefixing a `ByteString` containing the binary representation of the
-`Value` with `#=`.[^rationale-switch-to-binary]
-[^no-literal-binary-in-text] [^compact-value-annotations]
-
-           Compact = "#=" ws ByteString
-
-  [^rationale-switch-to-binary]: **Rationale.** The textual syntax
-    cannot express every `Value`: specifically, it cannot express the
-    several million floating-point NaNs, or the two floating-point
-    Infinities. Since the compact binary format for `Value`s expresses
-    each `Value` with precision, embedding binary `Value`s solves the
-    problem.
-
-  [^no-literal-binary-in-text]: Every text is ultimately physically
-    stored as bytes; therefore, it might seem possible to escape to
-    the raw binary form of compact binary encoding from within a
-    piece of textual syntax. However, while bytes must be involved in
-    any *representation* of text, the text *itself* is logically a
-    sequence of *code points* and is not *intrinsically* a binary
-    structure at all. It would be incoherent to expect to be able to
-    access the representation of the text from within the text itself.
-
-  [^compact-value-annotations]: Any text-syntax annotations preceding
-    the `#` are prepended to any binary-syntax annotations yielded by
-    decoding the `ByteString`.
-
-### Annotations.
-
-**Syntax.** When written down, a `Value` may have an associated
-sequence of *annotations* carrying “out-of-band” contextual metadata
-about the value. Each annotation is, in turn, a `Value`, and may
-itself have annotations. The ordering of annotations attached to a
-`Value` is significant.
-
-            Value =/ ws "@" Value Value
-
-Each annotation is preceded by `@`; the underlying annotated value
-follows its annotations. Here we extend only the syntactic nonterminal
-named “`Value`” without altering the semantic class of `Value`s.
-
-**Comments.** Strings annotating a `Value` are conventionally
-interpreted as comments associated with that value. Comments are
-sufficiently common that special syntax exists for them.
-
-            Value =/ ws
-                     ";" *(%x00-09 / %x0B-0C / %x0E-10FFFF) newline
-                     Value
-
-When written this way, everything between the `;` and the newline is
-included in the string annotating the `Value`.
-
-**Equivalence.** Annotations appear within syntax denoting a `Value`;
-however, the annotations are not part of the denoted value. They are
-only part of the syntax. Annotations do not play a part in
-equivalences and orderings of `Value`s.
-
-Reflective tools such as debuggers, user interfaces, and message
-routers and relays---tools which process `Value`s generically---may
-use annotated inputs to tailor their operation, or may insert
-annotations in their outputs. By contrast, in ordinary programs, as a
-rule of thumb, the presence, absence or content of an annotation
-should not change the control flow or output of the program.
-Annotations are data *describing* `Value`s, and are not in the domain
-of any specific application of `Value`s. That is, an annotation will
-almost never cause a non-reflective program to do anything observably
-different.
-
-## Compact Binary Syntax
-
-A `Repr` is a binary-syntax encoding, or representation, of a `Value`.
-For a value `v`, we write `«v»` for the `Repr` of `v`.
-
-### Type and Length representation.
-
-Each `Repr` starts with a tag byte, describing the kind of information
-represented. Depending on the tag, a length indicator, further encoded
-information, and/or an ending tag may follow.
-
-    tag                          (simple atomic data and small integers)
-    tag ++ binarydata            (most integers)
-    tag ++ length ++ binarydata  (large integers, strings, symbols, and binary)
-    tag ++ repr ++ ... ++ endtag (compound data)
-
-The unique end tag is byte value `0x84`.
-
-If present after a tag, the length of a following piece of binary data
-is formatted as a [base 128 varint][varint].[^see-also-leb128] We
-write `varint(m)` for the varint-encoding of `m`. Quoting the
-[Google Protocol Buffers][varint] definition,
-
-  [^see-also-leb128]: Also known as [LEB128][] encoding, for unsigned
-    integers. Varints and LEB128-encoded integers differ only for
-    signed integers, which are not used in Preserves.
-
-> Each byte in a varint, except the last byte, has the most
-> significant bit (msb) set – this indicates that there are further
-> bytes to come. The lower 7 bits of each byte are used to store the
-> two's complement representation of the number in groups of 7 bits,
-> least significant group first.
-
-The following table illustrates varint-encoding.
-
-| Number, `m` | `m` in binary, grouped into 7-bit chunks  | `varint(m)` bytes |
-| ------      | -------------------                       | ------------      |
-| 15          | `0001111`                                 | 15                |
-| 300         | `0000010 0101100`                         | 172 2             |
-| 1000000000  | `0000011 1011100 1101011 0010100 0000000` | 128 148 235 220 3 |
-
-It is an error for a varint-encoded `m` in a `Repr` to be anything
-other than the unique shortest encoding for that `m`. That is, a
-varint-encoding of `m` *MUST NOT* end in `0` unless `m`=0.
-
-### Records, Sequences, Sets and Dictionaries.
-
-          «<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
-            «[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
-           «#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
-    «{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
-
-There is *no* ordering requirement on the `E_i` elements or
-`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
-order. However, the `E_i` and `K_i` *MUST* be pairwise distinct. In
-addition, implementations *SHOULD* default to writing set elements and
-dictionary key/value pairs in order sorted lexicographically by their
-`Repr`s[^not-sorted-semantically], and *MAY* offer the option of
-serializing in some other implementation-defined order.
-
-  [^no-sorting-rationale]: In the BitTorrent encoding format,
-    [bencoding](http://www.bittorrent.org/beps/bep_0003.html#bencoding),
-    dictionary key/value pairs must be sorted by key. This is a
-    necessary step for ensuring serialization of `Value`s is
-    canonical. We do not require that key/value pairs (or set
-    elements) be in sorted order for serialized `Value`s; however, a
-    [canonical form][canonical] for `Repr`s does exist where a sorted
-    ordering is required.
-
-  [^not-sorted-semantically]: It's important to note that the sort
-    ordering for writing out set elements and dictionary key/value
-    pairs is *not* the same as the sort ordering implied by the
-    semantic ordering of those elements or keys. For example, the
-    `Repr` of a negative number very far from zero will start with
-    a byte that is *greater* than the byte which starts the `Repr` of
-    zero, making it sort lexicographically later by `Repr`, despite
-    being semantically *less than* zero.
-
-    **Rationale**. This is for ease-of-implementation reasons: not all
-    languages can easily represent sorted sets or sorted dictionaries,
-    but encoding and then sorting byte strings is much more likely to
-    be within easy reach.
-
-### SignedIntegers.
-
-    «x» when x ∈ SignedInteger = [0xB0] ++ varint(m) ++ intbytes(x)  if ¬(-3≤x≤12) ∧ m>16
-                                 ([0xA0] + m - 1) ++ intbytes(x)     if ¬(-3≤x≤12) ∧ m≤16
-                                 ([0xA0] + x)                        if  (-3≤x≤-1)
-                                 ([0x90] + x)                        if  ( 0≤x≤12)
-                               where m =        |intbytes(x)|
-
-Integers in the range [-3,12] are compactly represented with tags
-between `0x90` and `0x9F` because they are so frequently used.
-Integers up to 16 bytes long are represented with a single-byte tag
-encoding the length of the integer. Larger integers are represented
-with an explicit varint length. Every `SignedInteger` *MUST* be
-represented with its shortest possible encoding.
-
-The function `intbytes(x)` gives the big-endian two's-complement
-binary representation of `x`, taking exactly as many whole bytes as
-needed to unambiguously identify the value and its sign, and `m =
-|intbytes(x)|`. The most-significant bit in the first byte in
-`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes] For
-example,
-
-      «87112285931760246646623899502532662132736»
-        = B0 12 01 00 00 00 00 00 00 00
-                00 00 00 00 00 00 00 00
-                00 00
-
-      «-257» = A1 FE FF        «-3» = 9D          «128» = A1 00 80
-      «-256» = A1 FF 00        «-2» = 9E          «255» = A1 00 FF
-      «-255» = A1 FF 01        «-1» = 9F          «256» = A1 01 00
-      «-254» = A1 FF 02         «0» = 90        «32767» = A1 7F FF
-      «-129» = A1 FF 7F         «1» = 91        «32768» = A2 00 80 00
-      «-128» = A0 80           «12» = 9C        «65535» = A2 00 FF FF
-      «-127» = A0 81           «13» = A0 0D     «65536» = A2 01 00 00
-        «-4» = A0 FC          «127» = A0 7F    «131072» = A2 02 00 00
-
-  [^zero-intbytes]: The value 0 needs zero bytes to identify the
-    value, so `intbytes(0)` is the empty byte string. Non-zero values
-    need at least one byte.
-
-### Strings, ByteStrings and Symbols.
-
-Syntax for these three types varies only in the tag used. For `String`
-and `Symbol`, the data following the tag is a UTF-8 encoding of the
-`Value`'s code points, while for `ByteString` it is the raw data
-contained within the `Value` unmodified.
-
-    «S» = [0xB1] ++ varint(|utf8(S)|) ++ utf8(S)  if S ∈ String
-          [0xB2] ++ varint(|S|) ++ S              if S ∈ ByteString
-          [0xB3] ++ varint(|utf8(S)|) ++ utf8(S)  if S ∈ Symbol
-
-### Booleans.
-
-    «#f» = [0x80]
-    «#t» = [0x81]
-
-### Floats and Doubles.
-
-    «F» when F ∈ Float  = [0x82] ++ binary32(F)
-    «D» when D ∈ Double = [0x83] ++ binary64(D)
-
-The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
-8-byte IEEE 754 binary representations of `F` and `D`, respectively.
-
-### Embeddeds.
-
-The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
-represent the denoted object, prefixed with `[0x86]`.
-
-    «#!V» = [0x86] ++ «V»
-
-### Annotations.
-
-To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
-`[0x85] ++ «v»`. For example, the `Repr` corresponding to textual
-syntax `@a@b[]`, i.e. an empty sequence annotated with two symbols,
-`a` and `b`, is
-
-    «@a @b []»
-      = [0x85] ++ «a» ++ [0x85] ++ «b» ++ «[]»
-      = [0x85, 0xB3, 0x01, 0x61, 0x85, 0xB3, 0x01, 0x62, 0xB5, 0x84]
-
 ## Examples

+The definitions above are independent of any particular concrete syntax.
+The examples of `Value`s that follow are written using [the Preserves
+text syntax](preserves-text.html), and the example encoded byte
+sequences use [the Preserves binary encoding](preserves-binary.html).
+
 ### Ordering.

 The total ordering specified [above](#total-order) means that the following statements are true:
@ -720,10 +285,23 @@ encodes to

 ### JSON examples.

-The examples from
-[RFC 8259](https://tools.ietf.org/html/rfc8259#section-13) read as
-valid Preserves, though the JSON literals `true`, `false` and `null`
-read as `Symbol`s. The first example:
+Preserves text syntax is a superset of JSON, so the examples from [RFC
+8259](https://tools.ietf.org/html/rfc8259#section-13) read as valid
+Preserves.
+
+The JSON literals `true`, `false` and `null` all read as `Symbol`s, and
+JSON numbers read (unambiguously) either as `SignedInteger`s or as
+`Double`s.[^json-superset]
+
+  [^json-superset]: The following [schema](./preserves-schema.html)
+    definitions match exactly the JSON subset of a Preserves input:
+
+        version 1 .
+        JSON = @string string / @integer int / @double double / @boolean JSONBoolean / @null =null
+             / @array [JSON ...] / @object { string: JSON ...:... } .
+        JSONBoolean = =true / =false .
+
+The first RFC 8259 example:

    {
      "Image": {
@ -740,7 +318,8 @@ read as `Symbol`s. The first example:
        }
    }

-encodes to binary as follows:
+when read using the Preserves text syntax encodes via the binary syntax
+as follows:

    B7
      B1 05 "Image"
@ -764,7 +343,7 @@ encodes to binary as follows:
      84
    84

-and the second example:
+The second RFC 8259 example:

    [
      {
@ -814,89 +393,5 @@ encodes to binary as follows:
      84
    84

-## Security Considerations
-
-**Whitespace.** The textual format allows arbitrary whitespace in many
-positions. Consider optional restrictions on the amount of consecutive
-whitespace that may appear.
-
-**Annotations.** Similarly, in modes where a `Value` is being read
-while annotations are skipped, an endless sequence of annotations may
-give an illusion of progress.
-
-**Canonical form for cryptographic hashing and signing.** No canonical
-textual encoding of a `Value` is specified. A
-[canonical form][canonical] exists for binary encoded `Value`s, and
-implementations *SHOULD* produce canonical binary encodings by
-default; however, an implementation *MAY* permit two serializations of
-the same `Value` to yield different binary `Repr`s.
-
-## Acknowledgements
-
-The treatment of commas as whitespace in the text syntax is inspired
-by the same feature of [EDN](https://github.com/edn-format/edn).
-
-The text syntax for `Boolean`s, `Symbol`s, and `ByteString`s is
-directly inspired by [Racket](https://racket-lang.org/)'s lexical
-syntax.
-
-## Appendix. Autodetection of textual or binary syntax
-
-Every tag byte in a binary Preserves `Document` falls within the range
-[`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
-bytes*, and will never occur as the first byte of a UTF-8 encoded code
-point. This means no binary-encoded document can be misinterpreted as
-valid UTF-8.
-
-Conversely, a UTF-8 document must start with a valid codepoint,
-meaning in particular that it must not start with a byte in the range
-[`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
-Preserves document can be misinterpreted as a binary-syntax document.
-
-Examination of the top two bits of the first byte of a document gives
-its syntax: if the top two bits are `10`, it should be interpreted as
-a binary-syntax document; otherwise, it should be interpreted as text.
-
-## Appendix. Table of tag values
-
-     80 - False
-     81 - True
-     82 - Float
-     83 - Double
-     84 - End marker
-     85 - Annotation
-     86 - Embedded
-    (8x)  RESERVED 87-8F
-
-     9x - Small integers 0..12,-3..-1
-     An - Medium integers, (n+1) bytes long
-     B0 - Large integers, variable length
-     B1 - String
-     B2 - ByteString
-     B3 - Symbol
-
-     B4 - Record
-     B5 - Sequence
-     B6 - Set
-     B7 - Dictionary
-
-## Appendix. Binary SignedInteger representation
-
-Languages that provide fixed-width machine word types may find the
-following table useful in encoding and decoding binary `SignedInteger`
-values.
-
-| Integer range                              | Bytes required | Encoding (hex)                               |
-| ---                                        | ---            | ---                                          |
-| -3 ≤ n ≤ 12                                | 1              | `9X`                                         |
-| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8)    | 2              | `A0` `XX`                                    |
-| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3              | `A1` `XX` `XX`                               |
-| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4              | `A2` `XX` `XX` `XX`                          |
-| -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5              | `A3` `XX` `XX` `XX` `XX`                     |
-| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6              | `A4` `XX` `XX` `XX` `XX` `XX`                |
-| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7              | `A5` `XX` `XX` `XX` `XX` `XX` `XX`           |
-| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8              | `A6` `XX` `XX` `XX` `XX` `XX` `XX` `XX`      |
-| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9              | `A7` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
-
 <!-- Heading to visually offset the footnotes from the main document: -->
 ## Notes