New "blue jelly" machine-oriented binary syntax, inspired by argdata

2022-06-10 17:33:52 +02:00 · 2022-06-10 17:33:52 +02:00 · 7055a6467c
parent 4528100248
commit 7055a6467c
5 changed files with 220 additions and 204 deletions
--- a/_config.yml
+++ b/_config.yml
@@ -14,4 +14,4 @@ defaults:

 title: "Preserves"
 version_date: "June 2022"
-version: "0.6.3"
+version: "0.7.0"
--- a/preserves-binary.md
+++ b/preserves-binary.md
@@ -6,9 +6,11 @@ title: "Preserves: Binary Syntax"
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
 {{ site.version_date }}. Version {{ site.version }}.

-  [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
  [LEB128]: https://en.wikipedia.org/wiki/LEB128
+  [argdata]: https://github.com/NuxiNL/argdata
  [canonical]: canonical-binary.html
+  [google-varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
+  [vlq]: https://en.wikipedia.org/wiki/Variable-length_quantity

 *Preserves* is a data model, with associated serialization formats. This
 document defines one of those formats: a binary syntax for `Value`s from
@@ -24,49 +26,52 @@ For a value `v`, we write `«v»` for the `Repr` of v.
 ### Type and Length representation.

 Each `Repr` starts with a tag byte, describing the kind of information
-represented. Depending on the tag, a length indicator, further encoded
-information, and/or an ending tag may follow.
+represented.

-    tag                          (simple atomic data and small integers)
-    tag ++ binarydata            (most integers)
-    tag ++ length ++ binarydata  (large integers, strings, symbols, and binary)
-    tag ++ repr ++ ... ++ endtag (compound data)
+However, inspired by [argdata][], a `Repr` does *not* describe its own
+length. Instead, the surrounding context must supply the length of the
+`Repr`.

-The unique end tag is byte value `0x84`.
+As a consequence, `Repr`s for `Compound` values store the lengths of
+their contained values. Each contained `Value` is represented as a
+length in bytes followed by its own `Repr`.

-If present after a tag, the length of a following piece of binary data
-is formatted as a [base 128 varint][varint].[^see-also-leb128] We
-write `varint(m)` for the varint-encoding of `m`. Quoting the
-[Google Protocol Buffers][varint] definition,
+<a id="varint"></a> Each length is stored as an [argdata][]-compatible
+big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
+stores seven bits of the length. All bytes have a clear upper bit,
+except the final byte, which has the upper bit set. We write
+`len(m)` for the varint-encoding of a non-negative integer `m`,
+defined recursively as follows:

-  [^see-also-leb128]: Also known as [LEB128][] encoding, for unsigned
-    integers. Varints and LEB128-encoded integers differ only for
-    signed integers, which are not used in Preserves.
+    len(m) = e(m, 128)
+           where e(v, d) = [v + d]                             if v < 128
+                           e(⌊v / 128⌋, 0) ++ [(v % 128) + d]  if v ≥ 128

-> Each byte in a varint, except the last byte, has the most
-> significant bit (msb) set – this indicates that there are further
-> bytes to come. The lower 7 bits of each byte are used to store the
-> two's complement representation of the number in groups of 7 bits,
-> least significant group first.
+  [^see-also-leb128]: Argdata's length representation is very close to
+    [Variable-length quantity (VLQ)][VLQ] encoding, differing only in
+    the flipped interpretation of the high bit of each byte. It is
+    big-endian, unlike [LEB128][] encoding ([as used by
+    Google][google-varint] in protobufs).

 The following table illustrates varint-encoding.

-| Number, `m` | `m` in binary, grouped into 7-bit chunks  | `varint(m)` bytes |
-| ------      | -------------------                       | ------------      |
-| 15          | `0001111`                                 | 15                |
-| 300         | `0000010 0101100`                         | 172 2             |
-| 1000000000  | `0000011 1011100 1101011 0010100 0000000` | 128 148 235 220 3 |
+| Number, `m` | `m` in binary, grouped into 7-bit chunks  | `len(m)` bytes  |
+|-------------|-------------------------------------------|-----------------|
+| 15          | `0001111`                                 | 143             |
+| 300         | `0000010 0101100`                         | 2 172           |
+| 1000000000  | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |

-It is an error for a varint-encoded `m` in a `Repr` to be anything
-other than the unique shortest encoding for that `m`. That is, a
-varint-encoding of `m` *MUST NOT* end in `0` unless `m`=0.
+It is an error for a varint-encoded `m` in a `Repr` to be anything other
+than the unique shortest encoding for that `m`. That is, a
+varint-encoding of `m` *MUST NOT* start with `0`.
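Review note: the recursive definition of `len(m)` above can be sketched iteratively. This is a minimal illustration, not code from the commit; the names `encode_len` and `decode_len` are hypothetical, and Python 3.9+ is assumed.

```python
from __future__ import annotations


def encode_len(m: int) -> bytes:
    """len(m): big-endian base-128 varint; only the final byte has its top bit set."""
    if m < 0:
        raise ValueError("lengths are non-negative")
    out = [(m % 128) + 128]      # final byte: low seven bits, top bit set
    m //= 128
    while m > 0:
        out.append(m % 128)      # earlier bytes: top bit clear
        m //= 128
    return bytes(reversed(out))  # most significant group first


def decode_len(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next position); reject encodings that start with 0."""
    if pos < len(data) and data[pos] == 0:
        raise ValueError("varint must not start with 0 (not the shortest form)")
    value = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value = value * 128 + (byte & 0x7F)
        if byte & 0x80:          # top bit set marks the final byte
            return value, pos
```

The three rows of the table correspond to `encode_len(15)`, `encode_len(300)` and `encode_len(1000000000)`.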

 ### Records, Sequences, Sets and Dictionaries.

-          «<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
-            «[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
-           «#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
-    «{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
+          «<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
+            «[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
+           «#{E_1...E_m}» = [0xA9] ++ seq(«E_1», ..., «E_m»)
+    «{K_1:V_1...K_m:V_m}» = [0xAA] ++ seq(«K_1», «V_1», ..., «K_m», «V_m»)
+                          where seq(R_1, ... R_m) = len(R_1) ++ R_1 ++...++ len(R_m) ++ R_m
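Review note: a rough sketch of `seq` and the four compound encodings, assuming each contained `Repr` is already available as a byte string. The function names are hypothetical and chosen only for illustration.

```python
def encode_len(m: int) -> bytes:
    """len(m): big-endian base-128 varint, top bit set on the final byte only."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def seq(*reprs: bytes) -> bytes:
    """seq(R_1, ..., R_m): each Repr preceded by its length in bytes."""
    return b"".join(encode_len(len(r)) + r for r in reprs)


def record(label: bytes, *fields: bytes) -> bytes:
    return b"\xA7" + seq(label, *fields)


def sequence(*items: bytes) -> bytes:
    return b"\xA8" + seq(*items)


def set_of(*elements: bytes) -> bytes:
    return b"\xA9" + seq(*elements)


def dictionary(*pairs: tuple) -> bytes:
    # Keys and values alternate: K_1, V_1, ..., K_m, V_m.
    flat = [r for pair in pairs for r in pair]
    return b"\xAA" + seq(*flat)
```

For instance, `sequence(b"\xA3\x01", b"\xA3\x02", b"\xA3\x03", b"\xA3\x04")` gives the bytes listed for `[1 2 3 4]` in the examples table of `preserves.md`. Note that no end marker is needed: the enclosing context supplies the total length.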

 There is *no* ordering requirement on the `E_i` elements or
 `K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
@@ -89,7 +94,7 @@ serializing in some other implementation-defined order.
    ordering for writing out set elements and dictionary key/value
    pairs is *not* the same as the sort ordering implied by the
    semantic ordering of those elements or keys. For example, the
-    `Repr` of a negative number very far from zero will start with
+    `Repr` of a negative number very far from zero will start with a
    byte that is *greater* than the byte which starts the `Repr` of
    zero, making it sort lexicographically later by `Repr`, despite
    being semantically *less than* zero.
@@ -101,39 +106,31 @@ serializing in some other implementation-defined order.

 ### SignedIntegers.

-    «x» when x ∈ SignedInteger = [0xB0] ++ varint(m) ++ intbytes(x)  if ¬(-3≤x≤12) ∧ m>16
-                                 ([0xA0] + m - 1) ++ intbytes(x)     if ¬(-3≤x≤12) ∧ m≤16
-                                 ([0xA0] + x)                        if  (-3≤x≤-1)
-                                 ([0x90] + x)                        if  ( 0≤x≤12)
-                               where m =        |intbytes(x)|
+    «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)

-Integers in the range [-3,12] are compactly represented with tags
-between `0x90` and `0x9F` because they are so frequently used.
-Integers up to 16 bytes long are represented with a single-byte tag
-encoding the length of the integer. Larger integers are represented
-with an explicit varint length. Every `SignedInteger` *MUST* be
-represented with its shortest possible encoding.
+The function `intbytes(x)` gives the big-endian two's-complement binary
+representation of `x`, taking exactly as many whole bytes as needed to
+unambiguously identify the value and its sign. As a special case,
+`intbytes(0)` is the empty byte sequence. The most-significant bit in
+the first byte in `intbytes(x)` (for `x`≠0) is the sign
+bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
+its shortest possible encoding.

-The function `intbytes(x)` gives the big-endian two's-complement
-binary representation of `x`, taking exactly as many whole bytes as
-needed to unambiguously identify the value and its sign, and `m =
-|intbytes(x)|`. The most-significant bit in the first byte in
-`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes] For
-example,
+For example,

      «87112285931760246646623899502532662132736»
-        = B0 12 01 00 00 00 00 00 00 00
-                00 00 00 00 00 00 00 00
-                00 00
+        = A3 01 00 00 00 00 00 00 00
+             00 00 00 00 00 00 00 00
+             00 00

-      «-257» = A1 FE FF        «-3» = 9D          «128» = A1 00 80
-      «-256» = A1 FF 00        «-2» = 9E          «255» = A1 00 FF
-      «-255» = A1 FF 01        «-1» = 9F          «256» = A1 01 00
-      «-254» = A1 FF 02         «0» = 90        «32767» = A1 7F FF
-      «-129» = A1 FF 7F         «1» = 91        «32768» = A2 00 80 00
-      «-128» = A0 80           «12» = 9C        «65535» = A2 00 FF FF
-      «-127» = A0 81           «13» = A0 0D     «65536» = A2 01 00 00
-        «-4» = A0 FC          «127» = A0 7F    «131072» = A2 02 00 00
+      «-257» = A3 FE FF        «-3» = A3 FD       «128» = A3 00 80
+      «-256» = A3 FF 00        «-2» = A3 FE       «255» = A3 00 FF
+      «-255» = A3 FF 01        «-1» = A3 FF       «256» = A3 01 00
+      «-254» = A3 FF 02         «0» = A3        «32767» = A3 7F FF
+      «-129» = A3 FF 7F         «1» = A3 01     «32768» = A3 00 80 00
+      «-128» = A3 80           «12» = A3 0C     «65535» = A3 00 FF FF
+      «-127» = A3 81           «13» = A3 0D     «65536» = A3 01 00 00
+        «-4» = A3 FC          «127» = A3 7F    «131072» = A3 02 00 00
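Review note: `intbytes` maps fairly directly onto Python's arbitrary-precision integers. The sketch below is one possible reading of the definition, with hypothetical function names; it is not taken from any Preserves implementation.

```python
def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement form of x; empty for zero."""
    if x == 0:
        return b""
    # Magnitude bits plus one sign bit; ~x handles negatives such as -128.
    bits = (x if x >= 0 else ~x).bit_length() + 1
    return x.to_bytes((bits + 7) // 8, "big", signed=True)


def encode_integer(x: int) -> bytes:
    return b"\xA3" + intbytes(x)


def decode_integer(repr_: bytes) -> int:
    if repr_[:1] != b"\xA3":
        raise ValueError("not a SignedInteger Repr")
    # int.from_bytes returns 0 for an empty byte string.
    return int.from_bytes(repr_[1:], "big", signed=True)
```

A strict decoder would additionally reject bodies that are not the shortest encoding (for example `00 7F`); that check is omitted here for brevity.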

  [^zero-intbytes]: The value 0 needs zero bytes to identify the
    value, so `intbytes(0)` is the empty byte string. Non-zero values
@@ -146,19 +143,19 @@ and `Symbol`, the data following the tag is a UTF-8 encoding of the
 `Value`'s code points, while for `ByteString` it is the raw data
 contained within the `Value` unmodified.

-    «S» = [0xB1] ++ varint(|utf8(S)|) ++ utf8(S)  if S ∈ String
-          [0xB2] ++ varint(|S|) ++ S              if S ∈ ByteString
-          [0xB3] ++ varint(|utf8(S)|) ++ utf8(S)  if S ∈ Symbol
+    «S» = [0xA4] ++ utf8(S)  if S ∈ String
+          [0xA5] ++ S        if S ∈ ByteString
+          [0xA6] ++ utf8(S)  if S ∈ Symbol
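Review note: with the length prefix gone, the three string-like encodings reduce to a tag followed by the payload. A small sketch with hypothetical helper names; the decoder assumes the caller passes exactly one complete `Repr`, since the length now comes from the surrounding context.

```python
def encode_string(s: str) -> bytes:
    return b"\xA4" + s.encode("utf-8")


def encode_bytestring(data: bytes) -> bytes:
    return b"\xA5" + data


def encode_symbol(name: str) -> bytes:
    return b"\xA6" + name.encode("utf-8")


def decode_stringlike(repr_: bytes):
    """Return (kind, payload) for a String, ByteString or Symbol Repr."""
    if not repr_:
        raise ValueError("empty Repr")
    tag, body = repr_[0], repr_[1:]
    if tag == 0xA4:
        return ("String", body.decode("utf-8"))
    if tag == 0xA5:
        return ("ByteString", body)
    if tag == 0xA6:
        return ("Symbol", body.decode("utf-8"))
    raise ValueError("not a string-like Repr")
```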

 ### Booleans.

-    «#f» = [0x80]
-    «#t» = [0x81]
+    «#f» = [0xA0]
+    «#t» = [0xA1]

 ### Floats and Doubles.

-    «F» when F ∈ Float  = [0x82] ++ binary32(F)
-    «D» when D ∈ Double = [0x83] ++ binary64(D)
+    «F» when F ∈ Float  = [0xA2] ++ binary32(F)
+    «D» when D ∈ Double = [0xA2] ++ binary64(D)

 The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 8-byte IEEE 754 binary representations of `F` and `D`, respectively.
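Review note: because `Float` and `Double` now share tag `0xA2`, a reader has to look at the body length to tell them apart. A hedged sketch using Python's standard `struct` module; the function names are illustrative only.

```python
import struct


def encode_float(f: float) -> bytes:
    return b"\xA2" + struct.pack(">f", f)   # big-endian IEEE 754 binary32


def encode_double(d: float) -> bytes:
    return b"\xA2" + struct.pack(">d", d)   # big-endian IEEE 754 binary64


def decode_float_or_double(repr_: bytes) -> float:
    """The length of the body (4 or 8 bytes) selects Float or Double."""
    if repr_[:1] != b"\xA2":
        raise ValueError("not a Float/Double Repr")
    body = repr_[1:]
    if len(body) == 4:
        return struct.unpack(">f", body)[0]
    if len(body) == 8:
        return struct.unpack(">d", body)[0]
    raise ValueError("Float/Double body must be 4 or 8 bytes")
```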
@@ -166,20 +163,25 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 ### Embeddeds.

 The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
-represent the denoted object, prefixed with `[0x86]`.
+represent the denoted object, prefixed with `[0xBF]`.

-    «#!V» = [0x86] ++ «V»
+    «#!V» = [0xBF] ++ «V»

 ### Annotations.

-To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
-`[0x85] ++ «v»`. For example, the `Repr` corresponding to textual
-syntax `@a@b[]`, i.e. an empty sequence annotated with two symbols,
-`a` and `b`, is
+To annotate a `Repr` `r` with some sequence of `Value`s `[v_1, ...,
+v_m]`, surround `r` as follows:
+
+    [0xBE] ++ len(r) ++ r ++ len(«v_1») ++ «v_1» ++...++ len(«v_m») ++ «v_m»
+
+The `Repr` `r` *MUST NOT* already have annotations; that is, it must not begin with `0xBE`.
+
+For example, the `Repr` corresponding to textual syntax `@a@b[]`, i.e.
+an empty sequence annotated with two symbols, `a` and `b`, is

    «@a @b []»
-      = [0x85] ++ «a» ++ [0x85] ++ «b» ++ «[]»
-      = [0x85, 0xB3, 0x01, 0x61, 0x85, 0xB3, 0x01, 0x62, 0xB5, 0x84]
+      = [0xBE] ++ len(«[]») ++ «[]» ++ len(«a») ++ «a» ++ len(«b») ++ «b»
+      = [0xBE, 0x81, 0xA8, 0x82, 0xA6, 0x61, 0x82, 0xA6, 0x62]
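Review note: the new annotation layout is the annotated `Repr` followed by its annotations, each length-prefixed. The sketch below assumes the annotations are supplied as already-encoded `Repr`s; `annotate` is a hypothetical name.

```python
def encode_len(m: int) -> bytes:
    """len(m): big-endian base-128 varint, top bit set on the final byte only."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def annotate(r: bytes, *annotation_reprs: bytes) -> bytes:
    """Wrap Repr r together with the Reprs of its annotations."""
    if r[:1] == b"\xBE":
        raise ValueError("r must not already carry annotations")
    parts = (r,) + annotation_reprs
    return b"\xBE" + b"".join(encode_len(len(p)) + p for p in parts)
```

Calling `annotate(b"\xA8", b"\xA6" + b"a", b"\xA6" + b"b")` reproduces the nine bytes of the `@a @b []` example.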

 ## Security Considerations

@@ -194,45 +196,67 @@ implementations *SHOULD* produce canonical binary encodings by
 default; however, an implementation *MAY* permit two serializations of
 the same `Value` to yield different binary `Repr`s.

+## Acknowledgements
+
+The exclusion of lengths from `Repr`s, placing lengths instead ahead of
+contained values in sequences, is inspired by [argdata][].
+
 ## Appendix. Autodetection of textual or binary syntax

-Every tag byte in a binary Preserves `Document` falls within the range
+Every tag byte in a binary Preserves `Repr` falls within the range
 [`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
 bytes*, and will never occur as the first byte of a UTF-8 encoded code
-point. This means no binary-encoded document can be misinterpreted as
+point. This means no binary-encoded `Repr` can be misinterpreted as
 valid UTF-8.

-Conversely, a UTF-8 document must start with a valid codepoint,
+Conversely, a UTF-8 `Document` must start with a valid codepoint,
 meaning in particular that it must not start with a byte in the range
 [`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
-Preserves document can be misinterpreted as a binary-syntax document.
+Preserves `Document` can be misinterpreted as a binary-syntax `Repr`.

-Examination of the top two bits of the first byte of a document gives
-its syntax: if the top two bits are `10`, it should be interpreted as
-a binary-syntax document; otherwise, it should be interpreted as text.
+Examination of the top two bits of the first byte of an encoded `Value`
+gives its syntax: if the top two bits are `10`, it should be interpreted
+as a binary-syntax `Repr`; otherwise, it should be interpreted as text.
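Review note: the top-two-bits test is a single mask and compare. A minimal sketch, with `detect_syntax` as a hypothetical name:

```python
def detect_syntax(data: bytes) -> str:
    """Classify an encoded Value as binary or text from its first byte."""
    if not data:
        raise ValueError("empty input")
    # Top two bits 10 means the byte lies in 0x80..0xBF: a binary tag byte,
    # which can never begin a UTF-8 encoded code point.
    return "binary" if (data[0] & 0xC0) == 0x80 else "text"
```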
+
+**Streaming.** Autodetection is still possible when streaming an
+undetermined number of `Value`s across, say, a TCP/IP connection:
+
+ - If the text syntax is to be used for the connection, simply start
+   writing each `Document` one after the other. Documents for `Atom`s
+   *MUST* be separated from their neighbours by whitespace; in general,
+   whitespace *SHOULD* be used to separate adjacent documents.
+   Specifically, whitespace separating adjacent documents *SHOULD* be
+   ASCII newline (10).
+
+ - If the binary syntax is to be used for the connection, start the
+   connection with byte `0xA8` (sequence). After the initial byte, send
+   each value `v` as `len(«v») ++ «v»`. A side effect of this approach
+   is that the entire stream, when complete, is a valid `Sequence`
+   `Repr`.
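Review note: the binary streaming rule can be sketched over any file-like byte stream. The helper names below are hypothetical, and the reader treats end-of-stream on a value boundary as a clean finish, which is an assumption rather than something the text specifies. Python 3.8+ is assumed.

```python
import io


def encode_len(m: int) -> bytes:
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def start_stream(out) -> None:
    out.write(b"\xA8")                      # Sequence tag opens the stream


def send_value(out, repr_: bytes) -> None:
    out.write(encode_len(len(repr_)) + repr_)


def read_len(stream):
    """Read one varint; return None if the stream ends on a value boundary."""
    value, first = 0, True
    while True:
        b = stream.read(1)
        if not b:
            if first:
                return None
            raise ValueError("truncated length")
        value = value * 128 + (b[0] & 0x7F)
        if b[0] & 0x80:
            return value
        first = False


def read_values(stream):
    """Yield each Repr from a stream produced by start_stream/send_value."""
    if stream.read(1) != b"\xA8":
        raise ValueError("binary stream must start with 0xA8")
    while (n := read_len(stream)) is not None:
        body = stream.read(n)
        if len(body) != n:
            raise ValueError("truncated value")
        yield body
```

Writing into an `io.BytesIO`, sending a few values, and then reading the buffer back with `read_values` round-trips the individual `Repr`s; the buffer as a whole is also a `Sequence` `Repr`.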

 ## Appendix. Table of tag values

-     80 - False
-     81 - True
-     82 - Float
-     83 - Double
-     84 - End marker
-     85 - Annotation
-     86 - Embedded
-    (8x)  RESERVED 87-8F
+    (8x)  RESERVED 80-8F
+    (9x)  RESERVED 90-9F

-     9x - Small integers 0..12,-3..-1
-     An - Medium integers, (n+1) bytes long
-     B0 - Large integers, variable length
-     B1 - String
-     B2 - ByteString
-     B3 - Symbol
+     A0 - False
+     A1 - True
+     A2 - Float or Double (length disambiguates)
+     A3 - SignedIntegers (0 is encoded with no bytes at all)
+     A4 - String (no trailing NUL is added)
+     A5 - ByteString
+     A6 - Symbol

-     B4 - Record
-     B5 - Sequence
-     B6 - Set
-     B7 - Dictionary
+     A7 - Record
+     A8 - Sequence
+     A9 - Set
+     AA - Dictionary
+
+    (Ax)  RESERVED AB-AF
+
+    (Bx)  RESERVED B0-BD
+     BE - Annotations. {BE Lval val Lann0 ann0 Lann1 ann1 ...}
+     BF - Embedded

 ## Appendix. Binary SignedInteger representation

@@ -242,15 +266,15 @@ values.

 | Integer range                              | Bytes required | Encoding (hex)                               |
 | ---                                        | ---            | ---                                          |
-| -3 ≤ n ≤ 12                                | 1              | `9X`                                         |
-| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8)    | 2              | `A0` `XX`                                    |
-| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3              | `A1` `XX` `XX`                               |
-| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4              | `A2` `XX` `XX` `XX`                          |
+| 0                                          | 1              | `A3`                                         |
+| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8)    | 2              | `A3` `XX`                                    |
+| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3              | `A3` `XX` `XX`                               |
+| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4              | `A3` `XX` `XX` `XX`                          |
 | -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5              | `A3` `XX` `XX` `XX` `XX`                     |
-| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6              | `A4` `XX` `XX` `XX` `XX` `XX`                |
-| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7              | `A5` `XX` `XX` `XX` `XX` `XX` `XX`           |
-| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8              | `A6` `XX` `XX` `XX` `XX` `XX` `XX` `XX`      |
-| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9              | `A7` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
+| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6              | `A3` `XX` `XX` `XX` `XX` `XX`                |
+| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7              | `A3` `XX` `XX` `XX` `XX` `XX` `XX`           |
+| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8              | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX`      |
+| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9              | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |

 <!-- Heading to visually offset the footnotes from the main document: -->
 ## Notes
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -206,8 +206,8 @@ object, prefixed with `#!`.
           Embedded = "#!" Value

 Finally, any `Value` may be represented by escaping from the textual
-syntax to the [machine-oriented binary syntax](preserves-binary.html)
-by prefixing a `ByteString` containing the binary representation of the
+syntax to the [machine-oriented binary syntax](preserves-binary.html) by
+prefixing a `ByteString` containing the binary representation of the
 `Value` with `#=`.[^rationale-switch-to-binary]
 [^no-literal-binary-in-text] [^machine-value-annotations]

@@ -216,18 +216,18 @@ by prefixing a `ByteString` containing the binary representation of the
  [^rationale-switch-to-binary]: **Rationale.** The textual syntax
    cannot express every `Value`: specifically, it cannot express the
    several million floating-point NaNs, or the two floating-point
-    Infinities. Since the machine-oriented binary format for `Value`s
-    expresses each `Value` with precision, embedding binary `Value`s
-    solves the problem.
+    Infinities. Since the machine-oriented binary format for `Value`s expresses
+    each `Value` with precision, embedding binary `Value`s solves the
+    problem.

  [^no-literal-binary-in-text]: Every text is ultimately physically
-    stored as bytes; therefore, it might seem possible to escape to the
-    raw form of binary encoding from within a piece of textual syntax.
-    However, while bytes must be involved in any *representation* of
-    text, the text *itself* is logically a sequence of *code points* and
-    is not *intrinsically* a binary structure at all. It would be
-    incoherent to expect to be able to access the representation of the
-    text from within the text itself.
+    stored as bytes; therefore, it might seem possible to escape to
+    the raw binary encoding from within a
+    piece of textual syntax. However, while bytes must be involved in
+    any *representation* of text, the text *itself* is logically a
+    sequence of *code points* and is not *intrinsically* a binary
+    structure at all. It would be incoherent to expect to be able to
+    access the representation of the text from within the text itself.

  [^machine-value-annotations]: Any text-syntax annotations preceding
    the `#` are prepended to any binary-syntax annotations yielded by
@@ -235,11 +235,11 @@ by prefixing a `ByteString` containing the binary representation of the

 ## Annotations

-When written down, a `Value` may have an associated sequence of
-*annotations* carrying “out-of-band” contextual metadata about the
-value. Each annotation is, in turn, a `Value`, and may itself have
-annotations. The ordering of annotations attached to a `Value` is
-significant.
+When written down, a `Value` may have an associated
+sequence of *annotations* carrying “out-of-band” contextual metadata
+about the value. Each annotation is, in turn, a `Value`, and may
+itself have annotations. The ordering of annotations attached to a
+`Value` is significant.

            Value =/ ws "@" Value Value

@@ -276,7 +276,7 @@ different.

 ## Security Considerations

-**Whitespace.** The textual format allows arbitrary whitespace in many
+**Whitespace.** The text syntax allows arbitrary whitespace in many
 positions. Consider optional restrictions on the amount of consecutive
 whitespace that may appear.

--- a/preserves.md
+++ b/preserves.md
@@ -220,21 +220,21 @@ The total ordering specified [above](#total-order) means that the following stat
 <!-- TODO: Give some examples of large and small Preserves, perhaps -->
 <!-- translated from various JSON blobs floating around the internet. -->

-| Value                       | Encoded byte sequence                                                           |
-|-----------------------------|---------------------------------------------------------------------------------|
-| `<capture <discard>>`       | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
-| `[1 2 3 4]`                 | B5 91 92 93 94 84                                                               |
-| `[-2 -1 0 1]`               | B5 9E 9F 90 91 84                                                               |
-| `"hello"` (format B)        | B1 05 'h' 'e' 'l' 'l' 'o'                                                       |
-| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84                           |
-| `-257`                      | A1 FE FF                                                                        |
-| `-1`                        | 9F                                                                              |
-| `0`                         | 90                                                                              |
-| `1`                         | 91                                                                              |
-| `255`                       | A1 00 FF                                                                        |
-| `1.0f`                      | 82 3F 80 00 00                                                                  |
-| `1.0`                       | 83 3F F0 00 00 00 00 00 00                                                      |
-| `-1.202e300`                | 83 FE 3C B7 B7 59 BF 04 26                                                      |
+| Value                       | Encoded byte sequence                                                        |
+|-----------------------------|------------------------------------------------------------------------------|
+| `<capture <discard>>`       | A7 88 A6 'c' 'a' 'p' 't' 'u' 'r' 'e' 8A A7 88 A6 'd' 'i' 's' 'c' 'a' 'r' 'd' |
+| `[1 2 3 4]`                 | A8 82 A3 01 82 A3 02 82 A3 03 82 A3 04                                       |
+| `[-2 -1 0 1]`               | A8 82 A3 FE 82 A3 FF 81 A3 82 A3 01                                          |
+| `"hello"`                   | A4 'h' 'e' 'l' 'l' 'o'                                                       |
+| `["a" b #"c" [] #{} #t #f]` | A8 82 A4 'a' 82 A6 'b' 82 A5 'c' 81 A8 81 A9 81 A1 81 A0                     |
+| `-257`                      | A3 FE FF                                                                     |
+| `-1`                        | A3 FF                                                                        |
+| `0`                         | A3                                                                           |
+| `1`                         | A3 01                                                                        |
+| `255`                       | A3 00 FF                                                                     |
+| `1.0f`                      | A2 3F 80 00 00                                                               |
+| `1.0`                       | A2 3F F0 00 00 00 00 00 00                                                   |
+| `-1.202e300`                | A2 FE 3C B7 B7 59 BF 04 26                                                   |

 The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`

@@ -242,24 +242,21 @@ The next example uses a non-`Symbol` label for a record.[^extensibility2] The `R

 encodes to

-    B4                                ;; Record
-      B5                                ;; Sequence
-        B3 06 74 69 74 6C 65 64           ;; Symbol, "titled"
-        B3 06 70 65 72 73 6F 6E           ;; Symbol, "person"
-        92                                ;; SignedInteger, "2"
-        B3 05 74 68 69 6E 67              ;; Symbol, "thing"
-        91                                ;; SignedInteger, "1"
-      84                                ;; End (sequence)
-      A0 65                             ;; SignedInteger, "101"
-      B1 09 42 6C 61 63 6B 77 65 6C 6C  ;; String, "Blackwell"
-      B4                                ;; Record
-        B3 04 64 61 74 65                 ;; Symbol, "date"
-        A1 07 1D                          ;; SignedInteger, "1821"
-        92                                ;; SignedInteger, "2"
-        93                                ;; SignedInteger, "3"
-      84                                ;; End (record)
-      B1 02 44 72                       ;; String, "Dr"
-    84                                ;; End (record)
+    A7                                ;; Record
+      9E A8                             ;; Length 30, Sequence
+        87 A6 74 69 74 6C 65 64           ;; Length 7, Symbol, "titled"
+        87 A6 70 65 72 73 6F 6E           ;; Length 7, Symbol, "person"
+        82 A3 02                          ;; Length 2, SignedInteger, "2"
+        86 A6 74 68 69 6E 67              ;; Length 6, Symbol, "thing"
+        82 A3 01                          ;; Length 2, SignedInteger, "1"
+      82 A3 65                          ;; Length 2, SignedInteger, "101"
+      8A A4 42 6C 61 63 6B 77 65 6C 6C  ;; Length 10, String, "Blackwell"
+      91 A7                             ;; Length 17, Record
+        85 A6 64 61 74 65                 ;; Length 5, Symbol, "date"
+        83 A3 07 1D                       ;; Length 3, SignedInteger, "1821"
+        82 A3 02                          ;; Length 2, SignedInteger, "2"
+        82 A3 03                          ;; Length 2, SignedInteger, "3"
+      83 A4 44 72                       ;; Length 3, String, "Dr"
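Review note: the length annotations in the new listing can be checked mechanically by splitting a compound `Repr` into its length-prefixed children. This is an illustrative sketch only; `children` and `EXAMPLE` are hypothetical names, and the hex bytes are copied from the listing above. Python 3.9+ is assumed.

```python
from __future__ import annotations

# The record encoding from the listing, top to bottom.
EXAMPLE = bytes.fromhex("""
    A7
      9E A8
        87 A6 74 69 74 6C 65 64
        87 A6 70 65 72 73 6F 6E
        82 A3 02
        86 A6 74 68 69 6E 67
        82 A3 01
      82 A3 65
      8A A4 42 6C 61 63 6B 77 65 6C 6C
      91 A7
        85 A6 64 61 74 65
        83 A3 07 1D
        82 A3 02
        82 A3 03
      83 A4 44 72
""")


def children(repr_: bytes) -> list[bytes]:
    """Split a Record/Sequence/Set/Dictionary Repr into its child Reprs."""
    if not repr_ or repr_[0] not in (0xA7, 0xA8, 0xA9, 0xAA):
        raise ValueError("not a compound Repr")
    pos, out = 1, []
    while pos < len(repr_):
        n = 0
        while True:                       # read one varint length
            if pos >= len(repr_):
                raise ValueError("truncated length")
            byte = repr_[pos]
            pos += 1
            n = n * 128 + (byte & 0x7F)
            if byte & 0x80:
                break
        if pos + n > len(repr_):
            raise ValueError("child overruns its parent")
        out.append(repr_[pos:pos + n])
        pos += n
    return out
```

Here `[len(k) for k in children(EXAMPLE)]` yields the five lengths noted in the listing's comments (30, 2, 10, 17, 3), and applying `children` again to the first and fourth results walks the nested sequence and record.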

  [^extensibility2]: It happens to line up with Racket's
    representation of a record label for an inheritance hierarchy
@@ -311,27 +308,23 @@ The first RFC 8259 example:
 when read using the Preserves text syntax encodes via the binary syntax
 as follows:

-    B7
-      B1 05 "Image"
-      B7
-        B1 03 "IDs"      B5
-                           A0 74
-                           A1 03 AF
-                           A1 00 EA
-                           A2 00 97 89
-                         84
-        B1 05 "Title"    B1 14 "View from 15th Floor"
-        B1 05 "Width"    A1 03 20
-        B1 06 "Height"   A1 02 58
-        B1 08 "Animated" B3 05 "false"
-        B1 09 "Thumbnail"
-          B7
-            B1 03 "Url"    B1 26 "http://www.example.com/image/481989943"
-            B1 05 "Width"  A0 64
-            B1 06 "Height" A0 7D
-          84
-      84
-    84
+    AA
+      86 A4 "Image"
+      01 AC AA
+        89 A4 "Animated" 86 A6 "false"
+        87 A4 "Height"   83 A3 02 58
+        84 A4 "IDs"      91 A8
+                           82 A3 74
+                           83 A3 03 AF
+                           83 A3 00 EA
+                           84 A3 00 97 89
+        8A A4 "Thumbnail"
+          C3 AA
+            87 A4 "Height" 82 A3 7D
+            84 A4 "Url"    A7 A4 "http://www.example.com/image/481989943"
+            86 A4 "Width"  82 A3 64
+        86 A4 "Title"    95 A4 "View from 15th Floor"
+        86 A4 "Width"    83 A3 03 20

 The second RFC 8259 example:

@@ -360,28 +353,25 @@ The second RFC 8259 example:

 encodes to binary as follows:

-    B5
-      B7
-        B1 03 "Zip"        B1 05 "94107"
-        B1 04 "City"       B1 0D "SAN FRANCISCO"
-        B1 05 "State"      B1 02 "CA"
-        B1 07 "Address"    B1 00
-        B1 07 "Country"    B1 02 "US"
-        B1 08 "Latitude"   83 40 42 E2 26 80 9D 49 52
-        B1 09 "Longitude"  83 C0 5E 99 56 6C F4 1F 21
-        B1 09 "precision"  B1 03 "zip"
-      84
-      B7
-        B1 03 "Zip"        B1 05 "94085"
-        B1 04 "City"       B1 09 "SUNNYVALE"
-        B1 05 "State"      B1 02 "CA"
-        B1 07 "Address"    B1 00
-        B1 07 "Country"    B1 02 "US"
-        B1 08 "Latitude"   83 40 42 AF 9D 66 AD B4 03
-        B1 09 "Longitude"  83 C0 5E 81 AA 4F CA 42 AF
-        B1 09 "precision"  B1 03 "zip"
-      84
-    84
+    A8
+      FE AA
+        88 A4 "Address"    81 A4
+        85 A4 "City"       8E A4 "SAN FRANCISCO"
+        88 A4 "Country"    83 A4 "US"
+        89 A4 "Latitude"   89 A2 40 42 E2 26 80 9D 49 52
+        8A A4 "Longitude"  89 A2 C0 5E 99 56 6C F4 1F 21
+        86 A4 "State"      83 A4 "CA"
+        84 A4 "Zip"        86 A4 "94107"
+        8A A4 "precision"  84 A4 "zip"
+      FA AA
+        88 A4 "Address"    81 A4
+        85 A4 "City"       8A A4 "SUNNYVALE"
+        88 A4 "Country"    83 A4 "US"
+        89 A4 "Latitude"   89 A2 40 42 AF 9D 66 AD B4 03
+        8A A4 "Longitude"  89 A2 C0 5E 81 AA 4F CA 42 AF
+        86 A4 "State"      83 A4 "CA"
+        84 A4 "Zip"        86 A4 "94085"
+        8A A4 "precision"  84 A4 "zip"

 <!-- Heading to visually offset the footnotes from the main document: -->
 ## Notes
--- a/representations.md
+++ b/representations.md
@@ -2,6 +2,8 @@
 title: "Representing Values in Programming Languages"
 ---

+  [erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
+
 **NOT YET READY**

 We have given a definition of `Value` and its semantics, and proposed