codec.md draft

2018-06-05 10:32:00 +01:00 · 2018-06-05 10:32:00 +01:00 · 738fad360a
parent c23285781c
commit 738fad360a
1 changed files with 662 additions and 0 deletions
--- a/syndicate/codec.md
+++ b/syndicate/codec.md
@@ -0,0 +1,662 @@
+---
+---
+<style>
+body { font-size: 120%; margin-left: 2rem; }
+h1, h2, h3, h4, h5, h6 { margin-left: -1rem; }
+h2 { border-bottom: solid black 1px; }
+</style>
+
+2018-06-04 20:31:48 TODO: cwebber's email comments
+2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN
+2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2
+
+# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags
+
+Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
+Christopher Lemmer Webber <cwebber@dustycloud.org>  
+May 2018  
+Version 0.0.1
+
+                     ________________
+                    /                \
+       /\__/\      /  boo!            \
+      /      \     \   i'm very spki!  \
+    \=\_^__^_/= ___/\__________________/
+     |/      \
+     \\ |  | /
+      <_|--|_>
+
+## Introduction
+
+This document proposes a language-neutral JSON-like *data type*, along
+with a robust equivalence relation ("semantics") and a total ordering
+over inhabitants of the type.[^tjson]
+
+  [^tjson]: [TJSON](https://www.tjson.org/) has a similar aim:
+    “different on-the-wire representations of an object correspond to
+    the same typed data object”
+    ([source](https://news.ycombinator.com/item?id=12860143)); “TJSON
+    is defined as a serialization format on top of a JSON-like data
+    model” ([source](https://news.ycombinator.com/item?id=12860401)).
+
+It then suggests conventions for encoding common data formats in terms
+of the proposed data type.
+
+Finally, it proposes concrete *syntax* for the data type, offering a
+language-neutral transfer syntax (based on
+[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and
+suggesting possible language-specific representations for the data
+type's inhabitants.
+
+  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
+  [spki]: http://world.std.com/~cme/html/spki.html
+
+### Why Not Just Use JSON?
+
+<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
+
+JSON offers *syntax* for numbers, strings, booleans, null, arrays and
+string-keyed maps. However, it offers no *semantics* for the syntax:
+it is left to each implementation to determine how to treat each JSON
+term. This causes
+[interoperability](http://seriot.ch/parsing_json.php) and even
+[security](http://seriot.ch/parsing_json.php) issues.
+
+Specifically, JSON does not:
+
+ - assign any meaning to numbers,[^meaning-ieee-double]
+ - determine how strings are to be compared,[^string-key-comparison]
+ - determine whether object key ordering is significant, or
+ - determine whether duplicate object keys are permitted, what it
+   would mean if they were, or how to determine a duplicate in the
+   first place.
+
+In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
+
+  [^meaning-ieee-double]:
+    [Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6)
+    does go so far as to indicate “good interoperability can be
+    achieved” by imagining that parsers are able reliably to
+    understand the syntax of numbers as denoting an IEEE 754
+    double-precision floating-point value.
+
+  [^string-key-comparison]:
+    [Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3)
+    suggests that *if* an implementation compares strings used as
+    object keys “code unit by code unit”, then it will interoperate
+    with *other such implementations*, but neither requires this
+    behaviour nor discusses comparisons of strings used in other
+    contexts.
+
+  [^xml-infoset]: The XML world has the concept of the
+    [XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
+    speaking, the XML infoset of a document is its *denotation*, that
+    is, the *meaning* of the document.
+
+  [^other-formats]: Most other recent data languages are like JSON in
+    specifying only a syntax with no associated semantics. While some
+    do make a sketch of a semantics, the result is often
+    underspecified (e.g. in terms of how strings are to be compared),
+    overly machine-oriented (e.g. treating 32-bit integers as
+    fundamentally distinct from 64-bit integers and from
+    floating-point numbers), overly fine (e.g. giving visibility to
+    the order in which map entries are written), or all three.
+
+Some examples:
+
+ - are the JSON values `1`, `1.0`, and `1e0` the same or different?
+ - are the JSON values `1.0` and `1.0000000000000001` the same or different?
+ - are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
+   (UTF-8 `7061cc88726f6e`) the same or different?
+ - are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
+   or different?
+ - which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
+   same? Are all three legal?
+ - are `{"päron":1}` and `{"päron":1}` the same or different?
+
+Different JSON implementations give different answers to these
+questions, and the JSON specifications do not settle them.
+
+There are other minor problems with JSON having to do with its syntax.
+Examples include its relative verbosity and its lack of support for
+binary data.
+
+## Starting with Semantics
+
+Taking inspiration from functional programming, we start with a
+definition of the *values* that we want to work with and give them
+meaning independent of their syntax. We will treat syntax separately,
+later in this document.
+
+We will want our data type to accommodate *atoms* (numbers and text),
+*products* (both tuples and sequences), and *labelled
+sums*.[^zephyr-asdl] It should also include *keyed maps*. We should
+avoid unnecessary restrictions such as machine-oriented fixed-width
+integer or floating-point values where possible.
+
+  [^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t
+    [Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
+    which doesn't offer much in the way of atoms, but offers
+    general-purpose labelled sums and products. See D. C. Wang, A. W.
+    Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
+    Description Language,” in USENIX Conference on Domain-Specific
+    Languages, 1997, pp. 213–228.
+    [PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)
+
+### Values
+
+A `Value` is one of:
+
+ - a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale]
+ - a `Number` for integers and rational numbers,
+ - a `List` for general-purpose variable-length sequences,
+ - a `Map` for general-purpose variable-size key/value maps, or
+ - a `Record` for tagging a tuple of values with an intended interpretation.
+
+We define a total order over `Value`s: Every `ByteString` is less than
+the other kinds of `Value`; every `Number` is less than any `List`,
+`Map` or `Record`, but greater than any `ByteString`; and so on.
+
+That is, `ByteString < Number < List < Map < Record`.
+
+Two values of the same kind are compared using kind-specific rules,
+given below.
+
+Two `Value`s are equal if neither is less than the other according to
+the total order.
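+
+The cross-kind part of this order can be sketched directly. The
+JavaScript below assumes a hypothetical representation in which every
+`Value` is an object carrying a `kind` tag. The caller supplies the
+kind-specific rules as `sameKind`. All names are illustrative only.
+
+```javascript
+// Rank of each kind: ByteString < Number < List < Map < Record.
+const KIND_RANK = { ByteString: 0, Number: 1, List: 2, Map: 3, Record: 4 };
+
+// Compare two Values, returning -1, 0 or 1. Values of different kinds are
+// ordered by kind alone; `sameKind` handles two Values of the same kind.
+function compareValues(a, b, sameKind) {
+  const ra = KIND_RANK[a.kind];
+  const rb = KIND_RANK[b.kind];
+  if (ra !== rb) return ra < rb ? -1 : 1;
+  return sameKind(a, b);
+}
+
+// Equality is derived from the order: neither Value is less than the other.
+function valuesEqual(a, b, sameKind) {
+  return compareValues(a, b, sameKind) === 0;
+}
+```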
+
+  [^byte-string-rationale]: Why include `ByteString`, when we could
+    instead use a reserved `Record` along with a `List` of `Number`s?
+    ((TODO: Actually decide about this! Similarly, why include `Map`
+    rather than a restricted form of `List` with a `Record`? I think
+    the answer has to do with the arbitrariness of the label we'd
+    pick: unless *extremely* carefully chosen (i.e. number 0 (ideally
+    even `-Inf`!) for byte strings, number 1 for map, and have the
+    order go `Number < Record < List`), they would mess up the
+    prettiness of the ordering. Though... we could ultimately reduce
+    this to `Number` and `Record`, and have a family of `#"list"` and
+    `#"map"` `Record`s...))
+
+### Byte strings
+
+A `ByteString` is an ordered sequence of zero or more integers in the
+inclusive range [0..255].
+
+`ByteString`s are compared lexicographically.
+
+We will write examples of `ByteString`s that contain only ASCII
+characters using “`#"`” as an opening quote mark and “`"`” as a
+closing quote mark.
+
+**Examples.** The `ByteString` containing the three ASCII characters
+`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is
+written as `#""`. **N.B.** Despite appearances, these are *binary*
+data.
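+
+Here is a sketch of that lexicographic comparison in JavaScript. It
+assumes that `ByteString`s are held as `Uint8Array`s, as the JavaScript
+mapping later in this document suggests. The name `compareBytes` is
+illustrative.
+
+```javascript
+// Compare two ByteStrings byte by byte, returning -1, 0 or 1.
+// If one is a prefix of the other, the shorter one is the lesser.
+function compareBytes(a, b) {
+  const n = Math.min(a.length, b.length);
+  for (let i = 0; i < n; i++) {
+    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
+  }
+  return Math.sign(a.length - b.length);
+}
+```
+
+Under Node.js, `Buffer.compare` performs the same byte-wise comparison.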
+
+### Numbers
+
+A `Number` is a signed rational number of finite precision whose
+magnitude can be exactly represented in base two with a finite number
+of digits. This includes integers of arbitrary width as well as (for
+example) the non-infinite non-NaN IEEE 754 floating-point values.
+
+`Number`s are compared as mathematical numbers.
+
+We will write examples of `Number`s using standard mathematical
+notation.
+
+**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567.
+
+**Non-examples.** NaN (the clue is in the name!), ∞ (not finite),
+0.2 (cannot be exactly represented with a finite number of binary
+digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise).
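+
+Put another way, a `Number` is a rational whose denominator, in lowest
+terms, is a power of two. Below is a minimal JavaScript sketch of this
+validity check and of the numeric comparison. It assumes a hypothetical
+representation of rationals as `{num, den}` pairs of `BigInt`s with a
+positive denominator.
+
+```javascript
+// Hypothetical representation: a rational number as {num, den}, two BigInts
+// with den > 0n.
+function gcd(a, b) {
+  while (b !== 0n) {
+    [a, b] = [b, a % b];
+  }
+  return a;
+}
+
+// A rational is a valid Number iff, in lowest terms, its denominator is a
+// power of two (so its binary expansion terminates).
+function isDyadic({ num, den }) {
+  if (den <= 0n) return false;
+  let d = den / gcd(num < 0n ? -num : num, den);
+  while (d % 2n === 0n) d /= 2n;
+  return d === 1n;
+}
+
+// Compare two rationals as mathematical numbers by cross-multiplying; this is
+// exact, and the order is preserved because both denominators are positive.
+function compareRationals(x, y) {
+  const l = x.num * y.den;
+  const r = y.num * x.den;
+  return l < r ? -1 : l > r ? 1 : 0;
+}
+```
+
+Cross-multiplication involves no rounding. The comparison therefore
+stays exact even for values such as -1.202E4567, which no IEEE 754
+format can hold.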
+
+### Lists
+
+A `List` is an ordered sequence of zero or more `Value`s.
+
+`List`s are compared lexicographically, appealing to the ordering on
+`Value`s for comparisons at each position in the `List`s.
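+
+The lexicographic rule can be written once, parameterized over the
+comparator used for elements. Here is a sketch in JavaScript. For
+illustration it assumes that `List`s are held as arrays and that `cmp`
+is some comparator on `Value`s returning -1, 0 or 1.
+
+```javascript
+// Lexicographic comparison of two Lists, given an element comparator `cmp`.
+function compareLists(xs, ys, cmp) {
+  const n = Math.min(xs.length, ys.length);
+  for (let i = 0; i < n; i++) {
+    const c = cmp(xs[i], ys[i]);
+    if (c !== 0) return c;
+  }
+  // All shared positions are equal: the shorter List is the lesser.
+  return Math.sign(xs.length - ys.length);
+}
+```
+
+`Record`s are compared as tuples (see below), so the same helper would
+serve for them if each `Record` were viewed as the array
+`[label, ...fields]`.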
+
+### Maps
+
+A `Map` is an *unordered* collection of zero or more pairs of
+`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map`
+must be pairwise distinct.
+
+Instances of `Map` are compared by lexicographic comparison of the
+sequences resulting from ordering each `Map`'s pairs in ascending
+order by key. ((TODO: Is this a good idea? Is it clearly-enough
+written? An alternative approach is to compare first by the *count* of
+pairs, and only if the count is the same, start comparing the pairs
+themselves.))
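+
+The rule as currently written can be sketched as follows. The sketch
+assumes a hypothetical representation of a `Map` as an array of
+`[key, value]` pairs with distinct keys, and a `Value` comparator `cmp`
+returning -1, 0 or 1.
+
+```javascript
+// Compare two Maps by sorting each one's pairs by key, then comparing the
+// sorted pair sequences lexicographically (key first, then value).
+function compareMaps(m1, m2, cmp) {
+  const byKey = (p, q) => cmp(p[0], q[0]);
+  const a = [...m1].sort(byKey);
+  const b = [...m2].sort(byKey);
+  for (let i = 0; i < Math.min(a.length, b.length); i++) {
+    const c = cmp(a[i][0], b[i][0]) || cmp(a[i][1], b[i][1]);
+    if (c !== 0) return c;
+  }
+  return Math.sign(a.length - b.length);
+}
+```
+
+Under this rule, a `Map` with the single entry `#"a"` → 2 is greater
+than a `Map` with entries `#"a"` → 1 and `#"b"` → 2, because the first
+pair decides. The count-first alternative mentioned in the TODO would
+order them the other way.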
+
+### Records
+
+A `Record` is a tuple of one or more `Value`s. The first in the tuple
+is called the *label* of the `Record`, and the other elements of the
+tuple are called its *fields*.
+
+`Record` labels are *usually* `ByteString`s, but can be any kind of
+`Value`.[^iri-labels]
+
+  [^iri-labels]: It is occasionally (though seldom) necessary to
+    interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a
+    label can be read as a relative IRI, it is notionally interpreted
+    with respect to the IRI `http://spki-cat.org/` ((TODO:
+    placeholder)); where a label can be read as an absolute IRI, it
+    stands for that IRI; and otherwise, it cannot be read as an IRI at
+    all, and so the label simply stands for itself (that is, for its own
+    `Value`).
+
+`Record`s are compared lexicographically as if they were just tuples;
+that is, first by their labels, and then by the remainder of their
+fields.
+
+We will write examples of `Record`s with `ByteString` labels entirely
+composed of ASCII characters as their label followed by their
+parenthesised, comma-separated fields.
+
+**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3
+is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no
+fields is written `#"void"()`.
+
+## Conventions for Common Data Types
+
+The `Value` data type is essentially an abstract S-Expression, able to
+represent semi-structured data over `ByteString` and `Number` atoms.
+
+However, users need a wide variety of data types for representing
+domain-specific values such as text, calendrical values, machine
+words, IEEE 754 floating-point values, booleans, and so on.
+
+We use appropriately-labelled `Record`s to denote these
+domain-specific data types.
+
+All of these conventions are optional. They form a layer atop the core
+`Value` structure. Non-domain-specific tools do not in general need to
+treat them specially.
+
+**Validity.** Many of the labels we will describe in this section come
+  with side-conditions on the contents of labelled `Record`s. It is
+  possible to construct an instance of `Value` that violates these
+  side-conditions without ceasing to be a `Value` or becoming
+  unrepresentable. However, we say that such a `Value` is *invalid*
+  because it fails to honour the necessary side-conditions.
+  Implementations *SHOULD* allow two modes of working: one which
+  treats all `Value`s identically, without regard for side-conditions,
+  and one which enforces validity (i.e. side-conditions) when reading,
+  writing, or constructing `Value`s.
+
+### Text
+
+A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and
+having a single field that is also a `ByteString`. The field *MUST* be
+valid UTF-8.
+
+We will write examples of `Text`s that contain Unicode text using
+“`"`” as both an opening and closing quote mark.
+
+**Examples.** The `Text` containing the three Unicode code points `z`
+(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`.
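+
+In JavaScript, the UTF-8 side-condition can be checked with a
+`TextDecoder` in `fatal` mode, which throws on malformed input. The
+sketch below assumes a hypothetical `{label, fields}` object model for
+`Record`s, with `ByteString`s as `Uint8Array`s. `makeText` and
+`isValidText` are illustrative names.
+
+```javascript
+const encoder = new TextEncoder();
+const UTF8_LABEL = encoder.encode("utf-8");
+
+// Byte-wise equality of two Uint8Arrays.
+function bytesEqual(a, b) {
+  return a.length === b.length && a.every((x, i) => x === b[i]);
+}
+
+// Build the Text Record #"utf-8"(<UTF-8 bytes of str>).
+function makeText(str) {
+  return { label: UTF8_LABEL, fields: [encoder.encode(str)] };
+}
+
+// A Text is valid if it has the right label, exactly one ByteString field,
+// and that field decodes as well-formed UTF-8.
+function isValidText(v) {
+  if (!(v.label instanceof Uint8Array) || !bytesEqual(v.label, UTF8_LABEL)) return false;
+  if (v.fields.length !== 1 || !(v.fields[0] instanceof Uint8Array)) return false;
+  try {
+    new TextDecoder("utf-8", { fatal: true }).decode(v.fields[0]);
+    return true;
+  } catch (e) {
+    return false;
+  }
+}
+```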
+
+**Normalization forms.** Unicode defines multiple
+[normalization forms](http://unicode.org/reports/tr15/) for text. The
+ordering and equivalence relations defined for `Value`s mean that, for
+Unicode text, the UTF-8 encoded byte-level form of a text is used in
+comparisons.[^utf8-is-awesome] In order for users to unambiguously
+signal or require a particular normalization form, we define a
+`NormalizedText`, which is a `Record` labelled with
+`#"unicode-normalization"` and having two fields, the first of which
+is a `Text` specifying the normalization form used (e.g. `"nfc"`,
+`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text`
+whose underlying representation *MUST* be normalized according to the
+named normalization form.
+
+  [^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
+    gives the same result as a lexicographic code-point-by-code-point
+    comparison!
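+
+JavaScript's `String.prototype.normalize` accepts the same four forms
+(as `"NFC"`, `"NFD"`, `"NFKC"` and `"NFKD"`), so the side-condition on
+the second field of a `NormalizedText` is easy to sketch. The helper
+`isNormalized` is a hypothetical name. The two strings are the two
+spellings of "päron" from the examples in the introduction.
+
+```javascript
+// Map the form names used in NormalizedText to JavaScript's names.
+const FORMS = { nfc: "NFC", nfd: "NFD", nfkc: "NFKC", nfkd: "NFKD" };
+
+// True if `str` is already in the named normalization form, i.e. it would
+// be acceptable as the second field of a NormalizedText with that form.
+function isNormalized(form, str) {
+  const jsForm = FORMS[form];
+  if (jsForm === undefined) throw new Error(`unknown normalization form: ${form}`);
+  return str.normalize(jsForm) === str;
+}
+
+const composed = "p\u00e4ron";    // UTF-8 70 c3 a4 72 6f 6e
+const decomposed = "pa\u0308ron"; // UTF-8 70 61 cc 88 72 6f 6e
+```
+
+`composed` and `decomposed` are different `Text`s because their UTF-8
+bytes differ, yet `decomposed.normalize("NFC")` equals `composed`.
+Wrapping either one in a `NormalizedText` makes the choice of form
+explicit.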
+
+**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled
+with `#"iri"` and having one field, a `Text` which is the IRI itself
+and which *MUST* be a valid absolute or relative IRI.
+
+**Symbols.** Programming languages like Lisp and Prolog frequently use
+string-like values called *symbols*. A `Symbol` is a `Record` labelled
+with `#"symbol"` and having one field, a `Text`.
+
+### Numbers
+
+The definition of `Number` captures all integers and all
+finitely-representable floating-point values. However, in certain
+circumstances it can be valuable to assert that a number inhabits a
+particular range, such as a fixed-width machine word or an IEEE 754
+floating-point value.
+
+**Fixed-width machine words.** (16-, 32- and 64-bit) The labels
+`i`*n* and `u`*n*, for *n* ∈ {16,32,64}, denote *n*-bit-wide signed
+and unsigned range restrictions, respectively. Records with these
+labels *MUST* have one field, a `Number`, which *MUST* fall within the
+appropriate range. That is, to be valid,
+ - in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*.
+ - in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*.
+ - in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*.
+ - in `#"u32"(`*x*`)`, 0 <= *x* <= 4294967295, and ⌊*x*⌋ = *x*.
+ - in `#"i64"(`*x*`)`, -9223372036854775808 <= *x* <= 9223372036854775807, and ⌊*x*⌋ = *x*.
+ - in `#"u64"(`*x*`)`, 0 <= *x* <= 18446744073709551615, and ⌊*x*⌋ = *x*.
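+
+Here is a sketch of this range check in JavaScript. It uses `BigInt`
+so that the 64-bit bounds are exact. It assumes that the label has been
+decoded to a plain string such as `"i16"`, and that the field has been
+decoded to a hypothetical `{num, den}` pair of `BigInt`s with
+`den > 0n`. `rangeFor` and `isValidWord` are illustrative names.
+
+```javascript
+// Bounds [lo, hi] for a label such as "i16" or "u64", or null if the label is
+// not one of the six fixed-width labels.
+function rangeFor(label) {
+  const m = /^([iu])(16|32|64)$/.exec(label);
+  if (m === null) return null;
+  const bits = BigInt(m[2]);
+  return m[1] === "i"
+    ? [-(2n ** (bits - 1n)), 2n ** (bits - 1n) - 1n]
+    : [0n, 2n ** bits - 1n];
+}
+
+// Validity of #"<label>"(x), where x = num/den is given as BigInts (den > 0n).
+function isValidWord(label, { num, den }) {
+  const range = rangeFor(label);
+  if (range === null) return false;
+  if (num % den !== 0n) return false; // not an integer: ⌊x⌋ ≠ x
+  const x = num / den;
+  return range[0] <= x && x <= range[1];
+}
+```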
+
+**IEEE 754 floating-point.** (single- and double-precision) The labels
+`f32` and `f64` denote single- and double-precision IEEE 754
+floating-point values, respectively. Records with these labels *MUST*
+have one field. This field *MUST* either be a `Number`, which *MUST*
+fall within the appropriate representable range, or one of the records
+`#"nan"()`, `#"+inf"()` or `#"-inf"()`.
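+
+Here is one way to check the `#"f32"` side-condition in JavaScript. It
+assumes that the field has already been decoded either to a JavaScript
+number (itself an IEEE 754 double) or to one of the strings `"nan"`,
+`"+inf"` and `"-inf"`, standing in for the three special `Record`s. It
+also reads "representable range" as "exactly representable", which is
+an interpretation. `isValidF32Field` is a hypothetical name.
+
+```javascript
+// Hypothetical stand-ins for the records #"nan"(), #"+inf"() and #"-inf"().
+const SPECIALS = new Set(["nan", "+inf", "-inf"]);
+
+function isValidF32Field(x) {
+  if (SPECIALS.has(x)) return true;
+  // A finite double is exactly representable in single precision iff
+  // rounding it to the nearest float32 (Math.fround) leaves it unchanged.
+  return typeof x === "number" && Number.isFinite(x) && Math.fround(x) === x;
+}
+```
+
+For `#"f64"`, a decoder would instead need to confirm that the exact
+`Number` it read converts to a double without rounding before producing
+a JavaScript number.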
+
+### Anonymous Tuples and Unit
+
+A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields,
+denoting an anonymous tuple of values.
+
+The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes
+called "unit" or "void" (but *not* e.g. JavaScript's "undefined"
+value).
+
+### Booleans, Null and Undefined
+
+The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true"
+and "false" Boolean values, respectively.
+
+Tony Hoare's
+"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
+can be represented with the 0-ary `Record` `#"null"()`. An "undefined"
+value can be represented as `#"undefined"()`.
+
+### Dates and Times
+
+Dates, times, moments, and timestamps can be represented with a
+`Record` with label `#"rfc3339"` having a single field, a `Text`,
+which *MUST* conform to one of the `full-date`, `partial-time`,
+`full-time`, or `date-time` productions of
+[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
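+
+Here is a syntactic sketch, in JavaScript, of matching a `Text`
+against these four productions. The regular expressions transcribe the
+ABNF of RFC 3339 section 5.6, accepting lower-case `t` and `z` as the
+RFC's note permits. They do not check that months, days, hours and so
+on are in range. `rfc3339Production` is a hypothetical name.
+
+```javascript
+// Building blocks transcribed from the ABNF in RFC 3339, section 5.6.
+const FULL_DATE = "\\d{4}-\\d{2}-\\d{2}";
+const PARTIAL_TIME = "\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?";
+const TIME_OFFSET = "(?:Z|[+-]\\d{2}:\\d{2})";
+const FULL_TIME = PARTIAL_TIME + TIME_OFFSET;
+
+// The four productions a #"rfc3339" field may match; "i" also allows t and z.
+const PRODUCTIONS = [
+  ["full-date", new RegExp(`^${FULL_DATE}$`, "i")],
+  ["partial-time", new RegExp(`^${PARTIAL_TIME}$`, "i")],
+  ["full-time", new RegExp(`^${FULL_TIME}$`, "i")],
+  ["date-time", new RegExp(`^${FULL_DATE}T${FULL_TIME}$`, "i")],
+];
+
+// Return the name of the production `text` matches, or null if none does.
+function rfc3339Production(text) {
+  for (const [name, re] of PRODUCTIONS) {
+    if (re.test(text)) return name;
+  }
+  return null;
+}
+```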
+
+## Syntax
+
+Now that we have discussed `Value`s and their meanings, we may turn to
+techniques for *representing* `Value`s for communication or storage.
+
+The syntax we have used for the examples so far is inadequate in many
+ways, not least because it cannot represent every `Value`.
+
+Separation of the meaning of a piece of syntax from the syntax itself
+opens the door to domain-specific syntaxes, all equivalent and
+interconvertible.[^asn1] With a robust semantic foundation,
+connections to other data languages can also be made.
+
+  [^asn1]: Those who remember
+    [ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
+    will recall BER, DER, PER, CER, XER and so on, each appropriate to
+    a different setting. Similarly,
+    [Rivest's S-Expression design][sexp.txt] offers a human-friendly
+    syntax, a syntax robust to network-induced message corruption, and
+    an unambiguous, simple and easily-parsed machine-friendly syntax
+    for the same underlying values.
+
+### Transfer syntax: S-Expressions
+
+For now, we limit our attention to an easily-parsed, easily-produced
+machine-readable syntax by mapping our `Value`s to the canonical form
+of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps]
+
+  [^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
+    they are? While they include binary data and sequences, and an
+    obvious equivalence for them exists, they lack numbers *per se* as
+    well as any kind of unordered structure such as sets or maps. In
+    addition, while "display hints" allow labelling of binary data
+    with an intended interpretation, they cannot be attached to any
+    other kind of structure, and the "hint" itself can only be a
+    binary blob.
+
+#### Byte strings
+
+`ByteString`s map to byte-string S-Expressions.
+
+**Examples.**
+ - What we have been writing above as `#"ABC"` would be represented as
+   the S-Expression `3:ABC`.
+ - The empty `ByteString` is represented by the S-Expression `0:`.
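+
+In JavaScript, with `ByteString`s held as `Uint8Array`s, producing
+this canonical form takes only a few lines. `encodeAtom` is an
+illustrative name.
+
+```javascript
+// Encode a ByteString (Uint8Array) as a canonical S-Expression atom:
+// its length in decimal ASCII digits, a colon, then the raw bytes.
+function encodeAtom(bytes) {
+  const prefix = new TextEncoder().encode(`${bytes.length}:`);
+  const out = new Uint8Array(prefix.length + bytes.length);
+  out.set(prefix, 0);
+  out.set(bytes, prefix.length);
+  return out;
+}
+```
+
+The length prefix says exactly how many octets follow, so arbitrary
+binary content such as the byte `0xFF` needs no escaping.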
+
+#### Numbers
+
+Numbers are the most complicated values to represent as an
+S-Expression.
+
+((TODO: Consider cutting complexity by e.g. representing a `Number` as
+a sign bit, a little-endian blob of the integer part of the number,
+and a little-endian blob of the fractional part of the number. Lots of
+trailing/leading zeros for very large/small numbers!))
+
+We represent `Number`s using a sign-magnitude format, where the
+magnitude is written as an unsigned binary
+[*significand*](https://en.wikipedia.org/wiki/Significand) (with the
+sign bit appended as its least significant bit), encoded little-endian,
+together with a (signed) *shift amount*.
+
+In essence, we use a generalized, variable-width form of binary IEEE
+floating-point representation.
+
+Let `N` be the `Number` to represent as an S-Expression.
+
+The sign bit is 0 when `N` is zero or positive, and 1 when `N` is
+negative.
+
+The magnitude of `N` can be viewed as an infinite sequence of bits
+with a fraction-separator mark placed somewhere in the sequence,
+
+```
+···00.000 b_0 b_1 ··· b_{k-1}   b_k ··· b_{n-1} 000000···
+···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000···
+···000000 b_0 b_1 ··· b_{k-1}   b_k ··· b_{n-1} 000.00···
+```
+
+where `b_0` is the leftmost (most significant) and `b_{n-1}` the
+rightmost (least significant) non-zero bit.
+
+Let `k`, the position of the fraction-separator mark, be `i` when it
+is immediately to the left of `b_i` for some `i`, generalizing to
+negative values when it is to the left of `b_0` and values greater
+than `n-1` when it is to the right of `b_{n-1}`.
+
+For example, `k` will be:
+ - 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`;
+ - -3 (as in the first example above) when it is three bits left of `b_0`;
+ - `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`;
+ - `n`+3 when it is three bits to the right of `b_{n-1}`.
+
+The unpadded significand is `b_0 b_1 ··· b_{n-1}`.
+
+When `k` < `n`, the shift `z`=`k-n` and the significand is:
+ - the unpadded significand,
+ - with the sign bit appended to it on the right, and then
+ - padded on the left with zeroes until it is a whole number of octets wide.
+
+When `k` ≥ `n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is:
+ - the unpadded significand,
+ - padded on the right with `(k-n) mod 8` zeroes,
+ - with the sign bit then appended on the right, and then
+ - padded on the left with zeroes until it is a whole number of octets wide.
+
+Now, let `s`=2`z` if `z` is zero or positive, or `s`=2|`z`|+1 if `z`
+is negative.
+
+Finally, the S-Expression form of `N` is:
+ - `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or
+ - `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or
+ - `(4:*num)`, if `s`=0 and the significand contains no non-zero bits;
+
+where
+ - `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and
+ - `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`.
+
+**Examples.** (Shown using the hexadecimal representation of
+byte-strings from
+[section 4.4 of Rivest's S-Expression specification][sexp.txt] in
+places.)
+ - `N`=0 → `(4:*num)`
+ - `N`=1 → `(4:*num#02#)`
+ - `N`=-1 → `(4:*num#03#)`
+ - `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)`
+ - `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)`
+ - `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)`
+ - `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)`
+ - `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)`
+ - `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)`
+ - `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=13 → `(4:*num#16##0D#)`
+ - `N`=-1.202E4567=-1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)`
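+
+The steps above can be sketched in JavaScript using `BigInt`
+arithmetic. The sketch assumes that the `Number` arrives as a `BigInt`
+`m` and an integer exponent `e` with `N` = `m`×2^`e` (every `Number`
+can be written this way). It prints byte-strings in the hexadecimal
+form used in the examples rather than as raw octets. `leBytes`,
+`hexAtom` and `encodeNumber` are hypothetical names, and this is a
+sketch rather than a reference implementation.
+
+```javascript
+// Little-endian octets of a non-negative BigInt (no octets for 0n).
+function leBytes(x) {
+  const out = [];
+  while (x > 0n) {
+    out.push(Number(x & 0xffn));
+    x >>= 8n;
+  }
+  return out;
+}
+
+// Rivest-style hexadecimal byte-string, e.g. [0x0d] -> "#0D#".
+function hexAtom(bytes) {
+  const hex = bytes.map((b) => b.toString(16).padStart(2, "0")).join("");
+  return "#" + hex.toUpperCase() + "#";
+}
+
+// Encode N = m * 2**e, where m is a BigInt and e an integer Number.
+function encodeNumber(m, e) {
+  if (m === 0n) return "(4:*num)";
+  const sign = m < 0n ? 1n : 0n;
+  let mag = m < 0n ? -m : m;
+  // Drop trailing zero bits: mag becomes b_0 ... b_{n-1} and e becomes k - n.
+  while ((mag & 1n) === 0n) {
+    mag >>= 1n;
+    e += 1;
+  }
+  let z;
+  let sig;
+  if (e < 0) {
+    // k < n: z = k - n; append the sign bit.
+    z = e;
+    sig = (mag << 1n) | sign;
+  } else {
+    // k >= n: shift by whole octets, pad with (k - n) mod 8 zeroes, append sign.
+    z = 8 * Math.floor(e / 8);
+    sig = ((mag << BigInt(e % 8)) << 1n) | sign;
+  }
+  const s = z >= 0 ? 2 * z : 2 * -z + 1;
+  // The significand is never zero here, so only the shift may be omitted.
+  const shift = s !== 0 ? hexAtom(leBytes(BigInt(s))) : "";
+  return "(4:*num" + hexAtom(leBytes(sig)) + shift + ")";
+}
+```
+
+For instance, `encodeNumber(2560n, 0)` yields `(4:*num#14##10#)`, and
+`encodeNumber(-3n, -1)` yields `(4:*num#07##03#)`. Both match the
+examples above.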
+
+((TODO: figure out what this algorithm would actually look like in,
+say, C, Python and Racket.))
+
+#### Lists
+
+A `List` maps to an S-Expression list of representations of its
+elements, with the byte-string S-Expression `5:*list` prepended.
+
+**Examples.**
+ - The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"`
+   would be represented as the S-Expression `(5:*list1:a1:b1:c)`.
+ - The empty `List` is represented by the S-Expression `(5:*list)`.
+
+#### Maps
+
+A `Map` is represented by an S-Expression list of representations of
+the `Map`'s key-value pairs, with the byte-string `4:*map` prepended.
+
+Each key-value pair is represented by a two-element S-Expression list
+containing representations of the key and the value, in that order.
+
+The key-value pairs *MUST* be ordered by `Value`-order of their keys.
+
+**Examples.**
+ - The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to
+   `#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`.
+ - The `Map` containing an entry mapping the empty list to a "true"
+   Boolean value is represented by `(4:*map((5:*list)(4:true)))`.
+ - The empty `Map` is represented by `(4:*map)`.
+
+**Non-examples.**
+ - The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its
+   key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`.
+ - The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its
+   key-value pairs appear "flattened" in the outer list, rather than
+   each appearing in a two-element list of its own.
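+
+Here is a JavaScript sketch of this encoding for the special case in
+which every key and value is a `ByteString`. To keep it short, it
+models `ByteString`s as JavaScript strings whose characters all lie in
+U+0000..U+00FF, one character per byte. For such strings, `<` compares
+character codes and so agrees with the byte-wise `Value` order. This
+model and the name `encodeByteStringMap` are assumptions for
+illustration.
+
+```javascript
+// Canonical byte-string S-Expression for a one-char-per-byte string.
+const atom = (s) => `${s.length}:${s}`;
+
+// Encode a Map given as [key, value] pairs of ByteStrings.
+function encodeByteStringMap(entries) {
+  // Keys MUST appear in ascending Value order.
+  const sorted = [...entries].sort(([k1], [k2]) => (k1 < k2 ? -1 : k1 > k2 ? 1 : 0));
+  // Keys MUST also be pairwise distinct; after sorting, duplicates are adjacent.
+  for (let i = 1; i < sorted.length; i++) {
+    if (sorted[i - 1][0] === sorted[i][0]) throw new Error(`duplicate key: ${sorted[i][0]}`);
+  }
+  // Each pair becomes its own two-element list.
+  return "(4:*map" + sorted.map(([k, v]) => `(${atom(k)}${atom(v)})`).join("") + ")";
+}
+```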
+
+#### Records
+
+A `Record` is represented by an S-Expression list of its fields,
+prepended by:
+
+ - the representation of its label, if its label is a `ByteString` and
+   does not begin with byte 42 (ASCII "`*`"); or
+ - the S-Expression `1:*` followed by the representation of the
+   `Record`'s label, otherwise.
+
+**Examples.**
+ - The `Text` `"hello-world"` is represented by the S-Expression
+   `(5:utf-811:hello-world)`.
+ - The `IRI` denoting `http://www.w3.org/` is represented by the
+   S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`.
+ - The `Record` `#"*"()` is represented by the S-Expression
+   `(1:*1:*)`.
+ - The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression
+   `(1:*4:*foo4:*bar)`.
+ - The `Record` with the empty list as its label and no fields is
+   represented by the S-Expression `(1:*(5:*list))`.
+ - `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that
+   is a `Record` with `#"rfc3339"` as its label, and a single `Text`
+   field. While it is a perfectly reasonable `Value`, it does *not*
+   represent a valid date or time, since the `Text` `"foo"` does not
+   conform to any of the RFC 3339 productions enumerated above.
+
+**Non-examples.** 
+ - `((5:*list))` is not a representation of the `Record` with the
+   empty list as its label and no fields, because that `Record` has a
+   non-`ByteString` as its label, mandating a `1:*` prefix on its
+   S-Expression representation.
+ - `(4:*foo4:*bar)` does not represent the `Record`
+   `#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`",
+   mandating a `1:*` prefix on the `Record`'s S-Expression
+   representation.
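+
+The `Record` rule and the `List` rule above can be sketched together
+as a small recursive encoder. It covers only `ByteString`s, `List`s and
+`Record`s. It assumes a hypothetical model: `ByteString`s are
+JavaScript strings of characters U+0000..U+00FF (one per byte), `List`s
+are arrays, and `Record`s are `{label, fields}` objects. `encodeValue`
+is an illustrative name.
+
+```javascript
+// Canonical byte-string S-Expression for a one-char-per-byte string.
+const atom = (s) => `${s.length}:${s}`;
+
+function encodeValue(v) {
+  // ByteString: length-prefixed atom.
+  if (typeof v === "string") return atom(v);
+  // List: "5:*list" followed by the encoded elements.
+  if (Array.isArray(v)) return "(5:*list" + v.map(encodeValue).join("") + ")";
+  // Record: a ByteString label not starting with "*" is written as-is;
+  // any other label is preceded by the escape atom "1:*".
+  const plain = typeof v.label === "string" && !v.label.startsWith("*");
+  const head = (plain ? "" : "1:*") + encodeValue(v.label);
+  return "(" + head + v.fields.map(encodeValue).join("") + ")";
+}
+```
+
+Applied to the examples in this section, the sketch gives, for
+instance, `(1:*4:*foo4:*bar)` for `#"*foo"(#"*bar")` and
+`(1:*(5:*list))` for the `Record` whose label is the empty list.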
+
+## Examples
+
+((TODO: Give some examples of large and small SPKI-CAT documents,
+perhaps translated from various JSON blobs floating around the
+internet.))
+
+## Representing Values in Programming Languages
+
+We have given a definition of `Value` and its semantics, and proposed
+a concrete syntax for communicating and storing `Value`s. We now turn
+to **suggested** representations of `Value`s as *programming-language
+values* for various programming languages.
+
+### JavaScript
+
+ - `ByteString` ↔ `Uint8Array`
+ - `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
+ - `List` ↔ `Array`
+ - `Map` ↔ `Object`
+ - `Record` ↔ an instance of something like `Record` below, unless the label is...
+    - `#"utf-8"` ↔ `String`
+    - `#"true"` ↔ `true`
+    - `#"false"` ↔ `false`
+    - `#"null"` ↔ `null`
+    - `#"undefined"` ↔ the undefined value
+    - `#"rfc3339"` ↔ `Date`, if the `Record`'s field matches the `date-time` RFC 3339 production
+
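+```javascript
+// A generic Record: a label (usually a ByteString) plus zero or more fields.
+class Record {
+  constructor(label, ...fields) {
+    this.label = label;   // any Value; typically a Uint8Array
+    this.fields = fields; // an Array of Values
+  }
+}
+```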
+
+### Scheme/Racket
+
+ - `ByteString` ↔ byte vector (Racket: "Bytes")
+ - `Number` ↔ numbers
+ - `List` ↔ (where possible, immutable) list
+ - `Map` ↔ hash-table
+ - `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is...
+    - `#"utf-8"` ↔ a string
+    - `#"true"` ↔ `#t`
+    - `#"false"` ↔ `#f`
+    - `#"symbol"` ↔ a symbol
+
+### Java
+
+ - `ByteString` ↔ `byte[]`
+ - `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
+ - `List` ↔ `java.util.List`
+ - `Map` ↔ `java.util.Map`
+ - `Record` ↔ an instance of something like `Record` below, unless the label is...
+    - `#"utf-8"` ↔ `java.lang.String`
+    - `#"true"` ↔ `java.lang.Boolean.TRUE`
+    - `#"false"` ↔ `java.lang.Boolean.FALSE`
+    - `#"null"` ↔ a special singleton object, but *not* Java's `null`
+    - `#"rfc3339"` ↔ `java.sql.{Date,Time,Timestamp}`, according to which RFC 3339 production the `Record`'s field matches
+
+### Erlang
+
+ - `ByteString` ↔ a binary
+ - `Number` ↔ numbers, probably; TODO
+ - `List` ↔ a list
+ - `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP 17)
+ - `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is...
+    - `#"true"` ↔ `true`
+    - `#"false"` ↔ `false`
+    - `#"null"` ↔ `null`
+    - `#"undefined"` ↔ `undefined`
+    - `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if
+      some kind of "unsafe" mode is set on the decoder (because
+      Erlang atoms are not GC'd); otherwise like any other kind of
+      `Record`
+
+---