Split out inessential text from the spec

2019-08-18 17:51:26 +01:00 · 9064258dbc
parent 1bb7e1862e
commit 9064258dbc
5 changed files with 500 additions and 479 deletions
--- a/conventions.md
+++ b/conventions.md
@@ -0,0 +1,181 @@
+---
+---
+<title>Preserves: Conventions for Common Data Types</title>
+<link rel="stylesheet" href="preserves.css">
+
+# Preserves: Conventions for Common Data Types
+
+The `Value` data type is essentially an S-Expression, able to
+represent semi-structured data over `ByteString`, `String`,
+`SignedInteger` atoms and so on.[^why-not-spki-sexps]
+
+  [^why-not-spki-sexps]: Rivest's S-Expressions are in many ways
+    similar to Preserves. However, while they include binary data and
+    sequences, and an obvious equivalence for them exists, they lack
+    numbers *per se* as well as any kind of unordered structure such
+    as sets or maps. In addition, while “display hints” allow
+    labelling of binary data with an intended interpretation, they
+    cannot be attached to any other kind of structure, and the “hint”
+    itself can only be a binary blob.
+
+However, users need a wide variety of data types for representing
+domain-specific values such as various kinds of encoded and normalized
+text, calendrical values, machine words, and so on.
+
+Appropriately-labelled `Record`s denote these domain-specific data
+types.[^why-dictionaries]
+
+  [^why-dictionaries]: Given `Record`'s existence, it may seem odd
+    that `Dictionary`, `Set`, `Float`, etc. are given special
+    treatment. Preserves aims to offer a useful basic equivalence
+    predicate to programmers, and so if a data type demands a special
+    equivalence predicate, as `Dictionary`, `Set` and `Float` all do,
+    then the type should be included in the base language. Otherwise,
+    it can be represented as a `Record` and treated separately.
+    `Boolean`, `String` and `Symbol` are seeming exceptions. The first
+    two merit inclusion because of their cultural importance, while
+    `Symbol`s are included to allow their use as `Record` labels.
+    Primitive `Symbol` support avoids a bootstrapping issue.
+
+All of these conventions are optional. They form a layer atop the core
+`Value` structure. Non-domain-specific tools do not in general need to
+treat them specially.
+
+**Validity.** Many of the labels we will describe in this section come
+  with side-conditions on the contents of labelled `Record`s. It is
+  possible to construct an instance of `Value` that violates these
+  side-conditions without ceasing to be a `Value` or becoming
+  unrepresentable. However, we say that such a `Value` is *invalid*
+  because it fails to honour the necessary side-conditions.
+  Implementations *SHOULD* allow two modes of working: one which
+  treats all `Value`s identically, without regard for side-conditions,
+  and one which enforces validity (i.e. side-conditions) when reading,
+  writing, or constructing `Value`s.
+
+## IOLists.
+
+Inspired by Erlang's notions of
+[`iolist()` and `iodata()`](http://erlang.org/doc/reference_manual/typespec.html),
+an `IOList` is any tree constructed from `ByteString`s and
+`Sequence`s. Formally, an `IOList` is either a `ByteString` or a
+`Sequence` of `IOList`s.
+
+`IOList`s can be useful for
+[vectored I/O](https://en.wikipedia.org/wiki/Vectored_I/O).
+Additionally, the flexibility of `IOList` trees allows annotation of
+interior portions of a tree.
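The recursive definition above can be sketched in Python. This is only an illustration under stated assumptions: `bytes` stands in for `ByteString`, `list` or `tuple` for `Sequence`, and the names `iter_chunks` and `flatten` are hypothetical, not part of Preserves.

```python
def iter_chunks(iolist):
    """Yield the ByteString leaves of an IOList in left-to-right order."""
    if isinstance(iolist, (bytes, bytearray)):
        # Base case: an IOList may be a single ByteString.
        yield bytes(iolist)
    elif isinstance(iolist, (list, tuple)):
        # Recursive case: a Sequence whose elements are all IOLists.
        for item in iolist:
            yield from iter_chunks(item)
    else:
        raise TypeError(f"not an IOList: {iolist!r}")


def flatten(iolist):
    """Concatenate every leaf of the tree into one bytes object."""
    return b"".join(iter_chunks(iolist))
```

The leaves produced by `iter_chunks` can be handed to a vectored write call without first being copied into one buffer, whereas `flatten` gives the equivalent contiguous byte string.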
+
+## Comments.
+
+`String` values used as annotations are conventionally interpreted as
+comments.
+
+    @"I am a comment for the Dictionary"
+    {
+      @"I am a comment for the key"
+      key: @"I am a comment for the value"
+           value
+    }
+
+    @"I am a comment for this entire IOList"
+    [
+      #hex{00010203}
+      @"I am a comment for the middle half of the IOList"
+      @"A second comment for the same portion of the IOList"
+      [
+        @"I am a comment for the following ByteString"
+        #hex{04050607}
+        #hex{08090A0B}
+      ]
+      #hex{0C0D0E0F}
+    ]
+
+## MIME-type tagged binary data.
+
+Many internet protocols use
+[media types](https://tools.ietf.org/html/rfc6838) (a.k.a. MIME types)
+to indicate the format of some associated binary data. For this
+purpose, we define `MIMEData` to be a record labelled `mime` with two
+fields, the first being a `Symbol`, the media type, and the second
+being a `ByteString`, the binary data.
+
+While each media type may define its own rules for comparing
+documents, we define ordering among `MIMEData` *representations* of
+such media types following the general rules for ordering of
+`Record`s.
+
+**Examples.**
+
+| Value                                      | Encoded hexadecimal byte sequence                                                                                 |
+|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
+| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
+| `<mime text/plain #"ABC">`                 | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
+| `<mime application/xml #"<xhtml/>">`       | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
+| `<mime text/csv #"123,234,345">`           | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |
+
+Applications making heavy use of `mime` records may choose to use a
+placeholder number for the symbol `mime` as well as the symbols for
+individual media types. For example, if placeholder number 1 were
+chosen for `mime`, and placeholder number 7 for `text/plain`, the
+second example above, `<mime text/plain #"ABC">`, would be encoded as
+`83 11 17 63 41 42 43`.
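The byte sequences in the table can be reproduced with a short Python sketch. The layout used here is inferred from the table rows themselves and is an assumption, not a statement of the full binary syntax: a lead byte whose high nibble is the type tag (`7` for `Symbol`, `6` for `ByteString`) and whose low nibble is the length, with a low nibble of `F` meaning that the length follows in one extra byte. Placeholders and lengths of 128 or more are not handled, and all function names are hypothetical.

```python
def _header(tag_nibble, length):
    """Build the lead byte (plus an extra length byte when length >= 15)."""
    if length < 15:
        return bytes([(tag_nibble << 4) | length])
    if length < 128:
        # Matches the `7F 18` and `7F 0F` prefixes in the table.
        return bytes([(tag_nibble << 4) | 0x0F, length])
    raise ValueError("lengths of 128 or more are outside this sketch")


def encode_symbol(name):
    body = name.encode("utf-8")
    return _header(0x7, len(body)) + body


def encode_bytestring(data):
    return _header(0x6, len(data)) + data


def encode_mime(media_type, data):
    # 0x83 opens a record of three items: the label and the two fields.
    return (b"\x83" + encode_symbol("mime")
            + encode_symbol(media_type) + encode_bytestring(data))
```

Under these assumptions, `encode_mime("text/plain", b"ABC")` yields the same bytes as the second row of the table.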
+
+## Unicode normalization forms.
+
+Unicode defines multiple
+[normalization forms](http://unicode.org/reports/tr15/) for text.
+While no particular normalization form is required for `String`s,
+users may need to unambiguously signal or require a particular
+normalization form. A `NormalizedString` is a `Record` labelled with
+`unicode-normalization` and having two fields, the first of which is a
+`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
+`nfkc`, `nfkd`), and the second of which is a `String` whose
+underlying code point representation *MUST* be normalized according to
+the named normalization form.
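A validity check for the second field might look like the following Python sketch, which relies on the standard `unicodedata` module. The function name and the mapping from `Symbol` names to Python form names are assumptions made for illustration; only the four forms listed above are recognised.

```python
import unicodedata

# Symbol name used in the Record -> form name expected by unicodedata.
_FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


def is_normalized(form, text):
    """True if `text` is already in the normalization form named by `form`."""
    try:
        py_form = _FORMS[form]
    except KeyError:
        raise ValueError(f"unknown normalization form: {form!r}") from None
    # A string is in a given form exactly when normalizing leaves it unchanged.
    return unicodedata.normalize(py_form, text) == text
```

For instance, `"p\u00e4ron"` (precomposed ä) passes the `nfc` check, while `"pa\u0308ron"` (a followed by a combining diaeresis) passes only the `nfd` check.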
+
+## IRIs (URIs, URLs, URNs, etc.).
+
+An `IRI` is a `Record` labelled with `iri` and having one field, a
+`String` which is the IRI itself and which *MUST* be a valid absolute
+or relative IRI.
+
+## Machine words.
+
+The definition of `SignedInteger` captures all integers. However, in
+certain circumstances it can be valuable to assert that a number
+inhabits a particular range, such as a fixed-width machine word.
+
+A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denotes
+*n*-bit-wide signed and unsigned range restrictions, respectively.
+Records with these labels *MUST* have one field, a `SignedInteger`,
+which *MUST* fall within the appropriate range. That is, to be valid,
+ - in `<i8 `*x*`>`, -128 <= *x* <= 127.
+ - in `<u8 `*x*`>`, 0 <= *x* <= 255.
+ - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
+ - etc.
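The "etc." in the list above follows the usual two's-complement bounds, which the following Python sketch computes for every label in the family. The helper names are hypothetical, and a plain Python `int` stands in for `SignedInteger`.

```python
import re

_LABEL = re.compile(r"([iu])(8|16|32|64)")


def word_range(label):
    """Return the inclusive (low, high) bounds for a label such as 'i8'."""
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a machine-word label: {label!r}")
    bits = int(m.group(2))
    if m.group(1) == "i":
        # Signed: -2^(n-1) <= x <= 2^(n-1) - 1
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Unsigned: 0 <= x <= 2^n - 1
    return 0, (1 << bits) - 1


def is_valid_word(label, x):
    """True if integer x lies within the range named by label."""
    if isinstance(x, bool) or not isinstance(x, int):
        return False
    low, high = word_range(label)
    return low <= x <= high
```

`word_range("i16")` returns `(-32768, 32767)`, matching the third bullet, and `is_valid_word("u8", 256)` is `False`.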
+
+## Anonymous Tuples and Unit.
+
+A `Tuple` is a `Record` with label `tuple` and zero or more fields,
+denoting an anonymous tuple of values.
+
+The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
+“unit” or “void” (but *not* e.g. JavaScript's “undefined” value).
+
+## Null and Undefined.
+
+Tony Hoare's
+“[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)”
+can be represented with the 0-ary `Record` `<null>`. An “undefined”
+value can be represented as `<undefined>`.
+
+## Dates and Times.
+
+Dates, times, moments, and timestamps can be represented with a
+`Record` with label `rfc3339` having a single field, a `String`, which
+*MUST* conform to one of the `full-date`, `partial-time`, `full-time`,
+or `date-time` productions of
+[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
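As a rough illustration, the four productions can be approximated with regular expressions in Python. This sketch checks only the lexical shape given by the grammar in section 5.6; it does not reject out-of-range fields such as month 13, and the function name is hypothetical.

```python
import re

_DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
_PARTIAL_TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?"
# RFC 3339 permits lower-case "t" and "z" as alternatives.
_OFFSET = r"(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"
_FULL_TIME = _PARTIAL_TIME + _OFFSET

_PRODUCTIONS = {
    "full-date": _DATE,
    "partial-time": _PARTIAL_TIME,
    "full-time": _FULL_TIME,
    "date-time": _DATE + r"[Tt]" + _FULL_TIME,
}


def rfc3339_production(text):
    """Return the name of the matching production, or None if none match."""
    for name, pattern in _PRODUCTIONS.items():
        if re.fullmatch(pattern, text):
            return name
    return None
```

A validating implementation could treat `<rfc3339 s>` as invalid whenever `rfc3339_production(s)` returns `None`, then apply range checks on the individual fields as a second step.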
+
+<!-- Heading to visually offset the footnotes from the main document: -->
+## Notes
--- a/preserves.md
+++ b/preserves.md
@@ -26,7 +26,8 @@ programming languages.
 Preserves also supports the usual suite of atomic and compound data
 types, in particular including *binary* data as a distinct type from
 text strings. Its *annotations* allow separation of data from metadata
-such as comments, trace information, and provenance information.
+such as [comments](conventions.html#comments), trace information, and
+provenance information.

 Finally, Preserves defines precisely how to *compare* two values.
 Comparison is based on the data model, not on syntax or on data
@@ -873,180 +874,6 @@ encodes to binary as follows:
        53 "Zip"        55 "94085"
        57 "Country"    52 "US"

-## Conventions for Common Data Types
-
-The `Value` data type is essentially an S-Expression, able to
-represent semi-structured data over `ByteString`, `String`,
-`SignedInteger` atoms and so on.[^why-not-spki-sexps]
-
-  [^why-not-spki-sexps]: Rivest's S-Expressions are in many ways
-    similar to Preserves. However, while they include binary data and
-    sequences, and an obvious equivalence for them exists, they lack
-    numbers *per se* as well as any kind of unordered structure such
-    as sets or maps. In addition, while “display hints” allow
-    labelling of binary data with an intended interpretation, they
-    cannot be attached to any other kind of structure, and the “hint”
-    itself can only be a binary blob.
-
-However, users need a wide variety of data types for representing
-domain-specific values such as various kinds of encoded and normalized
-text, calendrical values, machine words, and so on.
-
-Appropriately-labelled `Record`s denote these domain-specific data
-types.[^why-dictionaries]
-
-  [^why-dictionaries]: Given `Record`'s existence, it may seem odd
-    that `Dictionary`, `Set`, `Float`, etc. are given special
-    treatment. Preserves aims to offer a useful basic equivalence
-    predicate to programmers, and so if a data type demands a special
-    equivalence predicate, as `Dictionary`, `Set` and `Float` all do,
-    then the type should be included in the base language. Otherwise,
-    it can be represented as a `Record` and treated separately.
-    `Boolean`, `String` and `Symbol` are seeming exceptions. The first
-    two merit inclusion because of their cultural importance, while
-    `Symbol`s are included to allow their use as `Record` labels.
-    Primitive `Symbol` support avoids a bootstrapping issue.
-
-All of these conventions are optional. They form a layer atop the core
-`Value` structure. Non-domain-specific tools do not in general need to
-treat them specially.
-
-**Validity.** Many of the labels we will describe in this section come
-  with side-conditions on the contents of labelled `Record`s. It is
-  possible to construct an instance of `Value` that violates these
-  side-conditions without ceasing to be a `Value` or becoming
-  unrepresentable. However, we say that such a `Value` is *invalid*
-  because it fails to honour the necessary side-conditions.
-  Implementations *SHOULD* allow two modes of working: one which
-  treats all `Value`s identically, without regard for side-conditions,
-  and one which enforces validity (i.e. side-conditions) when reading,
-  writing, or constructing `Value`s.
-
-### IOLists.
-
-Inspired by Erlang's notions of
-[`iolist()` and `iodata()`](http://erlang.org/doc/reference_manual/typespec.html),
-an `IOList` is any tree constructed from `ByteString`s and
-`Sequence`s. Formally, an `IOList` is either a `ByteString` or a
-`Sequence` of `IOList`s.
-
-`IOList`s can be useful for
-[vectored I/O](https://en.wikipedia.org/wiki/Vectored_I/O).
-Additionally, the flexibility of `IOList` trees allows annotation of
-interior portions of a tree.
-
-### Comments.
-
-`String` values used as annotations are conventionally interpreted as
-comments.
-
-    @"I am a comment for the Dictionary"
-    {
-      @"I am a comment for the key"
-      key: @"I am a comment for the value"
-           value
-    }
-
-    @"I am a comment for this entire IOList"
-    [
-      #hex{00010203}
-      @"I am a comment for the middle half of the IOList"
-      @"A second comment for the same portion of the IOList"
-      [
-        @"I am a comment for the following ByteString"
-        #hex{04050607}
-        #hex{08090A0B}
-      ]
-      #hex{0C0D0E0F}
-    ]
-
-### MIME-type tagged binary data.
-
-Many internet protocols use
-[media types](https://tools.ietf.org/html/rfc6838) (a.k.a MIME types)
-to indicate the format of some associated binary data. For this
-purpose, we define `MIMEData` to be a record labelled `mime` with two
-fields, the first being a `Symbol`, the media type, and the second
-being a `ByteString`, the binary data.
-
-While each media type may define its own rules for comparing
-documents, we define ordering among `MIMEData` *representations* of
-such media types following the general rules for ordering of
-`Record`s.
-
-**Examples.**
-
-| Value                                      | Encoded hexadecimal byte sequence                                                                                 |
-|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
-| `<mime text/plain #"ABC">`                 | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
-| `<mime application/xml #"<xhtml/>">`       | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
-| `<mime text/csv #"123,234,345">`           | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |
-
-Applications making heavy use of `mime` records may choose to use a
-placeholder number for the symbol `mime` as well as the symbols for
-individual media types. For example, if placeholder number 1 were
-chosen for `mime`, and placeholder number 7 for `text/plain`, the
-second example above, `<mime text/plain #"ABC">`, would be encoded as
-`83 11 17 63 41 42 43`.
-
-### Unicode normalization forms.
-
-Unicode defines multiple
-[normalization forms](http://unicode.org/reports/tr15/) for text.
-While no particular normalization form is required for `String`s,
-users may need to unambiguously signal or require a particular
-normalization form. A `NormalizedString` is a `Record` labelled with
-`unicode-normalization` and having two fields, the first of which is a
-`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
-`nfkc`, `nfkd`), and the second of which is a `String` whose
-underlying code point representation *MUST* be normalized according to
-the named normalization form.
-
-### IRIs (URIs, URLs, URNs, etc.).
-
-An `IRI` is a `Record` labelled with `iri` and having one field, a
-`String` which is the IRI itself and which *MUST* be a valid absolute
-or relative IRI.
-
-### Machine words.
-
-The definition of `SignedInteger` captures all integers. However, in
-certain circumstances it can be valuable to assert that a number
-inhabits a particular range, such as a fixed-width machine word.
-
-A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
-*n*-bit-wide signed and unsigned range restrictions, respectively.
-Records with these labels *MUST* have one field, a `SignedInteger`,
-which *MUST* fall within the appropriate range. That is, to be valid,
- - in `<i8 `*x*`>`, -128 <= *x* <= 127.
- - in `<u8 `*x*`>`, 0 <= *x* <= 255.
- - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
- - etc.
-
-### Anonymous Tuples and Unit.
-
-A `Tuple` is a `Record` with label `tuple` and zero or more fields,
-denoting an anonymous tuple of values.
-
-The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
-“unit” or “void” (but *not* e.g. JavaScript's “undefined” value).
-
-### Null and Undefined.
-
-Tony Hoare's
-“[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)”
-can be represented with the 0-ary `Record` `<null>`. An “undefined”
-value can be represented as `<undefined>`.
-
-### Dates and Times.
-
-Dates, times, moments, and timestamps can be represented with a
-`Record` with label `rfc3339` having a single field, a `String`, which
-*MUST* conform to one of the `full-date`, `partial-time`, `full-time`,
-or `date-time` productions of
-[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
-
 ## Security Considerations

 **Empty chunks.** Chunks of zero length are prohibited in streamed
@@ -1141,309 +968,5 @@ Then, if `ttnn`=`0001`, `l` is the placeholder number; otherwise, `l`
 is the length of the body that follows, counted in bytes for `tt`=`01`
 and in `Repr`s for `tt`=`10`.

-<!-- Not yet ready
-
-## Appendix. Representing Values in Programming Languages
-
-We have given a definition of `Value` and its semantics, and proposed
-a concrete syntax for communicating and storing `Value`s. We now turn
-to **suggested** representations of `Value`s as *programming-language
-values* for various programming languages.
-
-When designing a language mapping, an important consideration is
-roundtripping: serialization after deserialization, and vice versa,
-should both be identities.
-
-Also, the presence or absence of annotations on a `Value` should not
-affect comparisons of that `Value` to others in any way.
-
-### JavaScript.
-
- - `Boolean` ↔ `Boolean`
- - `Float` and `Double` ↔ numbers
- - `SignedInteger` ↔ numbers or `BigInt` (see [here](https://developers.google.com/web/updates/2018/05/bigint) and [here](https://github.com/tc39/proposal-bigint))
- - `String` ↔ strings
- - `ByteString` ↔ `Uint8Array`
- - `Symbol` ↔ `Symbol.for(...)`
- - `Record` ↔ `{ "_label": theLabel, "_fields": [field0, ..., fieldN] }`, plus convenience accessors
-    - `(undefined)` ↔ the undefined value
-    - `(rfc3339 F)` ↔ `Date`, if `F` matches the `date-time` RFC 3339 production
- - `Sequence` ↔ `Array`
- - `Set` ↔ `{ "_set": M }` where `M` is a `Map` from the elements of the set to `true`
- - `Dictionary` ↔ a `Map`
-
-### Scheme/Racket.
-
- - `Boolean` ↔ booleans
- - `Float` and `Double` ↔ inexact numbers (Racket: single- and double-precision floats)
- - `SignedInteger` ↔ exact numbers
- - `String` ↔ strings
- - `ByteString` ↔ byte vector (Racket: "Bytes")
- - `Symbol` ↔ symbols
- - `Record` ↔ structures (Racket: prefab struct)
- - `Sequence` ↔ lists
- - `Set` ↔ Racket: sets
- - `Dictionary` ↔ Racket: hash-table
-
-### Java.
-
- - `Boolean` ↔ `Boolean`
- - `Float` and `Double` ↔ `Float` and `Double`
- - `SignedInteger` ↔ `Integer`, `Long`, `BigInteger`
- - `String` ↔ `String`
- - `ByteString` ↔ `byte[]`
- - `Symbol` ↔ a simple data class wrapping a `String`
- - `Record` ↔ in a simple implementation, a generic `Record` class; else perhaps a bean mapping?
-    - `(mime T B)` ↔ an implementation of `javax.activation.DataSource`?
- - `Sequence` ↔ an implementation of `java.util.List`
- - `Set` ↔ an implementation of `java.util.Set`
- - `Dictionary` ↔ an implementation of `java.util.Map`
-
-### Erlang.
-
- - `Boolean` ↔ `true` and `false`
- - `Float` and `Double` ↔ floats (unsure how Erlang deals with single-precision)
- - `SignedInteger` ↔ integers
- - `String` ↔ pair of `utf8` and a binary
- - `ByteString` ↔ a binary
- - `Symbol` ↔ pair of `atom` and a binary
- - `Record` ↔ triple of `obj`, label, and field list
- - `Sequence` ↔ a list
- - `Set` ↔ a `sets` set
- - `Dictionary` ↔ a [map][erlang-map] (new in Erlang/OTP R17)
-
-This is a somewhat unsatisfactory mapping because: (a) Erlang doesn't
-garbage-collect its atoms, meaning that (a.1) representing `Symbol`s
-as atoms could lead to denial-of-service and (a.2) representing
-`Symbol`-labelled `Record`s as Erlang records must be rejected for the
-same reason; (b) even if it did, Erlang's boolean values are atoms,
-which would then clash with the `Symbol`s `true` and `false`; and (c)
-Erlang has no distinct string type, making for a trilemma where
-`String`s are in danger of clashing with `ByteString`s, `Sequence`s,
-or `Record`s.
-
-### Python.
-
- - `Boolean` ↔ `True` and `False`
- - `Float` ↔ a `Float` wrapper-class for a double-precision value
- - `Double` ↔ float
- - `SignedInteger` ↔ int and long
- - `String` ↔ `unicode`
- - `ByteString` ↔ `bytes`
- - `Symbol` ↔ a simple data class wrapping a `unicode`
- - `Record` ↔ something like `namedtuple`, but that doesn't care about class identity?
- - `Sequence` ↔ `tuple` (but accept `list` during encoding)
- - `Set` ↔ `frozenset` (but accept `set` during encoding)
- - `Dictionary` ↔ a hashable (immutable) dictionary-like thing (but accept `dict` during encoding)
-
-### Squeak Smalltalk.
-
- - `Boolean` ↔ `true` and `false`
- - `Float` ↔ perhaps a subclass of `Float`?
- - `Double` ↔ `Float`
- - `SignedInteger` ↔ `Integer`
- - `String` ↔ `WideString`
- - `ByteString` ↔ `ByteArray`
- - `Symbol` ↔ `WideSymbol`
- - `Record` ↔ a simple data class
- - `Sequence` ↔ `ArrayedCollection` (usually `OrderedCollection`)
- - `Set` ↔ `Set`
- - `Dictionary` ↔ `Dictionary`
-
-->
-
-## Appendix. Why not Just Use JSON?
-
-<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
-
-JSON offers *syntax* for numbers, strings, booleans, null, arrays and
-string-keyed maps. However, it suffers from two major problems. First,
-it offers no *semantics* for the syntax: it is left to each
-implementation to determine how to treat each JSON term. This causes
-[interoperability](http://seriot.ch/parsing_json.php) and even
-[security](http://web.archive.org/web/20180906202559/http://docs.couchdb.org/en/stable/cve/2017-12635.html)
-issues. Second, JSON's lack of support for type tags leads to awkward
-and incompatible *encodings* of type information in terms of the fixed
-suite of constructors on offer.
-
-There are other minor problems with JSON having to do with its syntax.
-Examples include its relative verbosity and its lack of support for
-binary data.
-
-### JSON syntax doesn't *mean* anything
-
-When are two JSON values the same? When are they different?
-<!-- When is one JSON value “less than” another? -->
-
-The specifications are largely silent on these questions. Different
-JSON implementations give different answers.
-
-Specifically, JSON does not:
-
- - assign any meaning to numbers,[^meaning-ieee-double]
- - determine how strings are to be compared,[^string-key-comparison]
- - determine whether object key ordering is significant,[^json-member-ordering] or
- - determine whether duplicate object keys are permitted, what it
-   would mean if they were, or how to determine a duplicate in the
-   first place.[^json-key-uniqueness]
-
-In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
-
-  [^meaning-ieee-double]:
-    [Section 6 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-6)
-    does go so far as to indicate “good interoperability can be
-    achieved” by imagining that parsers are able reliably to
-    understand the syntax of numbers as denoting an IEEE 754
-    double-precision floating-point value.
-
-  [^string-key-comparison]:
-    [Section 8.3 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-8.3)
-    suggests that *if* an implementation compares strings used as
-    object keys “code unit by code unit”, then it will interoperate
-    with *other such implementations*, but neither requires this
-    behaviour nor discusses comparisons of strings used in other
-    contexts.
-
-  [^json-member-ordering]:
-    [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4)
-    remarks that “[implementations] differ as to whether or not they
-    make the ordering of object members visible to calling software.”
-
-  [^json-key-uniqueness]:
-    [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4)
-    is the only place in the specification that mentions the issue. It
-    explicitly sanctions implementations supporting duplicate keys,
-    noting only that “when the names within an object are not unique,
-    the behavior of software that receives such an object is
-    unpredictable.” Implementations are free to choose any behaviour
-    at all in this situation, including signalling an error, or
-    discarding all but one of a set of duplicates.
-
-  [^xml-infoset]: The XML world has the concept of
-    [XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
-    speaking, XML infoset is the *denotation* of an XML document; the
-    *meaning* of the document.
-
-  [^other-formats]: Most other recent data languages are like JSON in
-    specifying only a syntax with no associated semantics. While some
-    do make a sketch of a semantics, the result is often
-    underspecified (e.g. in terms of how strings are to be compared),
-    overly machine-oriented (e.g. treating 32-bit integers as
-    fundamentally distinct from 64-bit integers and from
-    floating-point numbers), overly fine (e.g. giving visibility to
-    the order in which map entries are written), or all three.
-
-Some examples:
-
- - are the JSON values `1`, `1.0`, and `1e0` the same or different?
- - are the JSON values `1.0` and `1.0000000000000001` the same or different?
- - are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
-   (UTF-8 `7061cc88726f6e`) the same or different?
- - are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
-   or different?
- - which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
-   same? Are all three legal?
- - are `{"päron":1}` and `{"päron":1}` the same or different?
-
-### JSON can multiply nicely, but it can't add very well
-
-JSON includes a fixed set of types: numbers, strings, booleans, null,
-arrays and string-keyed maps. Domain-specific data must be *encoded*
-into these types. For example, dates and email addresses are often
-represented as strings with an implicit internal structure.
-
-There is no convention for *labelling* a value as belonging to a
-particular category. Instead, JSON-encoded data are often labelled in
-an ad-hoc way. Multiple incompatible approaches exist. For example, a
-“money” structure containing a `currency` field and an `amount` may be
-represented in any number of ways:
-
-    { "_type": "money", "currency": "EUR", "amount": 10 }
-    { "type": "money", "value": { "currency": "EUR", "amount": 10 } }
-    [ "money", { "currency": "EUR", "amount": 10 } ]
-    { "@money": { "currency": "EUR", "amount": 10 } }
-
-This causes particular problems when JSON is used to represent *sum*
-or *union* types, such as “either a value or an error, but not both”.
-Again, multiple incompatible approaches exist.
-
-For example, imagine an API for depositing money in an account. The
-response might be either a “success” response indicating the new
-balance, or one of a set of possible errors.
-
-Sometimes, a *pair* of values is used, with `null` marking the option
-not taken.[^interesting-failure-mode]
-
-    { "ok": { "balance": 210 }, "error": null }
-    { "ok": null, "error": "Unauthorized" }
-
-  [^interesting-failure-mode]: What is the meaning of a document where
-    both `ok` and `error` are non-null? What might happen when a
-    program is presented with such a document?
-
-The branch not chosen is sometimes present, sometimes omitted as if it
-were an optional field:
-
-    { "ok": { "balance": 210 } }
-    { "error": "Unauthorized" }
-
-Sometimes, an array of a label and a value is used:
-
-    [ "ok", { "balance": 210 } ]
-    [ "error", "Unauthorized" ]
-
-Sometimes, the shape of the data is sufficient to distinguish among
-the alternatives, and the label is left implicit:
-
-    { "balance": 210 }
-    "Unauthorized"
-
-JSON itself does not offer any guidance for which of these options to
-choose. In many real cases on the web, poor choices have led to
-encodings that are irrecoverably ambiguous.
-
-# Open questions
-
-Q. Should "symbols" instead be URIs? Relative, usually; relative to
-what? Some domain-specific base URI?
-
-Q. Literal small integers: are they pulling their weight? They're not
-absolutely necessary.
-
-Q. Should we go for trying to make the data ordering line up with the
-encoding ordering? We'd have to only use streaming forms, and avoid
-the small integer encoding, and not store record arities, and sort
-sets and dictionaries, and mask floats and doubles (perhaps
-[like this](https://stackoverflow.com/questions/43299299/sorting-floating-point-values-using-their-byte-representation)),
-and perhaps pick a specific `NaN`, and I don't know what to do about
-SignedIntegers. Perhaps make them more like float formats, with the
-byte count acting as a kind of exponent underneath the sign bit.
-
- - Perhaps define separate additional canonicalization restrictions?
-   Doesn't help the ordering, but does help the equivalence.
-
- - Canonicalization and early-bailout-equivalence-checking are in
-   tension with support for streaming values.
-
-Q. To remain compatible with JSON, portions of the text syntax have to
-remain case-insensitive (`%i"..."`). However, non-JSON extensions do
-not. There's only one (?) at the moment, the `%i"f"` in `Float`;
-should it be changed to case-sensitive?
-
-Q. Should `IOList`s be wrapped in an identifying unary record constructor?
-
-TODO: Examples of the ordering. `"bzz" < "c" < "caa"`; `#true < 3 < "3" < |3|`
-
-TODO: Probably should add a canonicalized subset. Consider adding
-explicit "I promise this is canonical" marker, like a BOM, which
-identifies a binary value as (first) binary and (second, optionally)
-as canonical. UTF-8 disallows byte `0xFF` from appearing anywhere in a
-text; this might be a good candidate for a marker sequence.
-((Actually, perhaps `0x10` would be good! It corresponds to DLE, "data
-link escape"; it is not a printable ASCII character, and is disallowed
-in the textual Preserves grammar; and it is also mnemonic for "version
-0", since it is the Preserves binary encoding of the small integer
-zero.))
-
 <!-- Heading to visually offset the footnotes from the main document: -->
 ## Notes
--- a/questions.md
+++ b/questions.md
@@ -0,0 +1,47 @@
+---
+---
+<title>Preserves: Open questions</title>
+<link rel="stylesheet" href="preserves.css">
+
+# Open questions
+
+Q. Should "symbols" instead be URIs? Relative, usually; relative to
+what? Some domain-specific base URI?
+
+Q. Literal small integers: are they pulling their weight? They're not
+absolutely necessary.
+
+Q. Should we go for trying to make the data ordering line up with the
+encoding ordering? We'd have to only use streaming forms, and avoid
+the small integer encoding, and not store record arities, and sort
+sets and dictionaries, and mask floats and doubles (perhaps
+[like this](https://stackoverflow.com/questions/43299299/sorting-floating-point-values-using-their-byte-representation)),
+and perhaps pick a specific `NaN`, and I don't know what to do about
+SignedIntegers. Perhaps make them more like float formats, with the
+byte count acting as a kind of exponent underneath the sign bit.
+
+ - Perhaps define separate additional canonicalization restrictions?
+   Doesn't help the ordering, but does help the equivalence.
+
+ - Canonicalization and early-bailout-equivalence-checking are in
+   tension with support for streaming values.
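
As a rough illustration of the "mask floats and doubles" idea in the question above, the following Python sketch maps an IEEE 754 double to eight bytes whose bytewise order matches numeric order. The function name `double_sort_key` is hypothetical and not part of Preserves; the sketch also shows why choosing a specific `NaN` matters, since every NaN bit pattern would otherwise get its own position in the ordering.

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = (1 << 64) - 1


def double_sort_key(x: float) -> bytes:
    """Return 8 bytes whose lexicographic order matches the numeric order of x.

    Hypothetical helper for illustration only; not part of any Preserves API.
    """
    # Reinterpret the big-endian IEEE 754 encoding as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & SIGN_BIT:
        # Negative: invert every bit, so larger magnitudes sort earlier.
        bits ^= ALL_BITS
    else:
        # Non-negative: set the sign bit, so these sort after all negatives.
        bits |= SIGN_BIT
    return struct.pack(">Q", bits)


values = [3.5, -0.25, 0.0, -1e300, 1e-300, float("inf"), float("-inf"), -2.0]
by_bytes = sorted(values, key=double_sort_key)
```

Under this transformation `-0.0` sorts immediately before `0.0`, and NaNs land wherever their bit patterns place them, so an ordering built this way would still need a rule for both cases.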
+
+Q. To remain compatible with JSON, portions of the text syntax have to
+remain case-insensitive (`%i"..."`). However, non-JSON extensions do
+not. There's only one (?) at the moment, the `%i"f"` in `Float`;
+should it be changed to case-sensitive?
+
+Q. Should `IOList`s be wrapped in an identifying unary record constructor?
+
+TODO: Examples of the ordering. `"bzz" < "c" < "caa"`; `#true < 3 < "3" < |3|`
+
+TODO: Probably should add a canonicalized subset. Consider adding
+explicit "I promise this is canonical" marker, like a BOM, which
+identifies a binary value as (first) binary and (second, optionally)
+as canonical. UTF-8 disallows byte `0xFF` from appearing anywhere in a
+text; this might be a good candidate for a marker sequence.
+((Actually, perhaps `0x10` would be good! It corresponds to DLE, "data
+link escape"; it is not a printable ASCII character, and is disallowed
+in the textual Preserves grammar; and it is also mnemonic for "version
+0", since it is the Preserves binary encoding of the small integer
+zero.))
--- a/representations.md
+++ b/representations.md
@ -0,0 +1,113 @@
+---
+---
+<title>Preserves: Representing Values in Programming Languages</title>
+<link rel="stylesheet" href="preserves.css">
+
+# Preserves: Representing Values in Programming Languages
+
+**NOT YET READY**
+
+We have given a definition of `Value` and its semantics, and proposed
+a concrete syntax for communicating and storing `Value`s. We now turn
+to **suggested** representations of `Value`s as *programming-language
+values* for various programming languages.
+
+When designing a language mapping, an important consideration is
+roundtripping: serialization after deserialization, and vice versa,
+should both be identities.
+
+Also, the presence or absence of annotations on a `Value` should not
+affect comparisons of that `Value` to others in any way.
+
+## JavaScript.
+
+ - `Boolean` ↔ `Boolean`
+ - `Float` and `Double` ↔ numbers
+ - `SignedInteger` ↔ numbers or `BigInt` (see [here](https://developers.google.com/web/updates/2018/05/bigint) and [here](https://github.com/tc39/proposal-bigint))
+ - `String` ↔ strings
+ - `ByteString` ↔ `Uint8Array`
+ - `Symbol` ↔ `Symbol.for(...)`
+ - `Record` ↔ `{ "_label": theLabel, "_fields": [field0, ..., fieldN] }`, plus convenience accessors
+    - `(undefined)` ↔ the undefined value
+    - `(rfc3339 F)` ↔ `Date`, if `F` matches the `date-time` RFC 3339 production
+ - `Sequence` ↔ `Array`
+ - `Set` ↔ `{ "_set": M }` where `M` is a `Map` from the elements of the set to `true`
+ - `Dictionary` ↔ a `Map`
+
+## Scheme/Racket.
+
+ - `Boolean` ↔ booleans
+ - `Float` and `Double` ↔ inexact numbers (Racket: single- and double-precision floats)
+ - `SignedInteger` ↔ exact numbers
+ - `String` ↔ strings
+ - `ByteString` ↔ byte vector (Racket: "Bytes")
+ - `Symbol` ↔ symbols
+ - `Record` ↔ structures (Racket: prefab struct)
+ - `Sequence` ↔ lists
+ - `Set` ↔ Racket: sets
+ - `Dictionary` ↔ Racket: hash-table
+
+## Java.
+
+ - `Boolean` ↔ `Boolean`
+ - `Float` and `Double` ↔ `Float` and `Double`
+ - `SignedInteger` ↔ `Integer`, `Long`, `BigInteger`
+ - `String` ↔ `String`
+ - `ByteString` ↔ `byte[]`
+ - `Symbol` ↔ a simple data class wrapping a `String`
+ - `Record` ↔ in a simple implementation, a generic `Record` class; else perhaps a bean mapping?
+    - `(mime T B)` ↔ an implementation of `javax.activation.DataSource`?
+ - `Sequence` ↔ an implementation of `java.util.List`
+ - `Set` ↔ an implementation of `java.util.Set`
+ - `Dictionary` ↔ an implementation of `java.util.Map`
+
+## Erlang.
+
+ - `Boolean` ↔ `true` and `false`
+ - `Float` and `Double` ↔ floats (unsure how Erlang deals with single-precision)
+ - `SignedInteger` ↔ integers
+ - `String` ↔ pair of `utf8` and a binary
+ - `ByteString` ↔ a binary
+ - `Symbol` ↔ pair of `atom` and a binary
+ - `Record` ↔ triple of `obj`, label, and field list
+ - `Sequence` ↔ a list
+ - `Set` ↔ a `sets` set
+ - `Dictionary` ↔ a [map][erlang-map] (new in Erlang/OTP 17)
+
+This is a somewhat unsatisfactory mapping because: (a) Erlang doesn't
+garbage-collect its atoms, meaning that (a.1) representing `Symbol`s
+as atoms could lead to denial-of-service and (a.2) representing
+`Symbol`-labelled `Record`s as Erlang records must be rejected for the
+same reason; (b) even if it did, Erlang's boolean values are atoms,
+which would then clash with the `Symbol`s `true` and `false`; and (c)
+Erlang has no distinct string type, making for a trilemma where
+`String`s are in danger of clashing with `ByteString`s, `Sequence`s,
+or `Record`s.
+
+## Python.
+
+ - `Boolean` ↔ `True` and `False`
+ - `Float` ↔ a `Float` wrapper-class for a double-precision value
+ - `Double` ↔ float
+ - `SignedInteger` ↔ int and long
+ - `String` ↔ `unicode`
+ - `ByteString` ↔ `bytes`
+ - `Symbol` ↔ a simple data class wrapping a `unicode`
+ - `Record` ↔ something like `namedtuple`, but that doesn't care about class identity?
+ - `Sequence` ↔ `tuple` (but accept `list` during encoding)
+ - `Set` ↔ `frozenset` (but accept `set` during encoding)
+ - `Dictionary` ↔ a hashable (immutable) dictionary-like thing (but accept `dict` during encoding)
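
One possible shape for the Python mapping above, as a minimal sketch rather than an actual implementation. It assumes Python 3, where `str` plays the role of `unicode`; the names `Symbol`, `Record`, `FrozenDict` and `to_immutable` are illustrative assumptions and do not come from any existing Preserves library.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Symbol:
    """Simple data class wrapping a string (hypothetical name)."""
    name: str


@dataclass(frozen=True)
class Record:
    """Generic record: equality depends only on the label and the fields."""
    label: Any
    fields: Tuple[Any, ...]


class FrozenDict(dict):
    """Hashable, read-only dictionary (illustrative, not a library class)."""

    def __hash__(self):
        return hash(frozenset(self.items()))

    def _readonly(self, *args, **kwargs):
        raise TypeError("FrozenDict is immutable")

    __setitem__ = __delitem__ = __ior__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly


def to_immutable(value):
    """Accept list/set/dict during encoding and return hashable equivalents."""
    if isinstance(value, (list, tuple)):
        return tuple(to_immutable(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(to_immutable(v) for v in value)
    if isinstance(value, dict):
        return FrozenDict(
            (to_immutable(k), to_immutable(v)) for k, v in value.items()
        )
    return value  # atoms, Symbol and Record pass through unchanged


point = Record(Symbol("point"), to_immutable([1, 2]))
nested = to_immutable([1, {2, 3}, {"k": [4]}])
```

Because every container ends up hashable, such values could themselves be used as `Set` elements or `Dictionary` keys, which the mutable `list`, `set` and `dict` types do not allow.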
+
+## Squeak Smalltalk.
+
+ - `Boolean` ↔ `true` and `false`
+ - `Float` ↔ perhaps a subclass of `Float`?
+ - `Double` ↔ `Float`
+ - `SignedInteger` ↔ `Integer`
+ - `String` ↔ `WideString`
+ - `ByteString` ↔ `ByteArray`
+ - `Symbol` ↔ `WideSymbol`
+ - `Record` ↔ a simple data class
+ - `Sequence` ↔ `SequenceableCollection` (usually `OrderedCollection`)
+ - `Set` ↔ `Set`
+ - `Dictionary` ↔ `Dictionary`
--- a/why-not-json.md
+++ b/why-not-json.md
@ -0,0 +1,157 @@
+---
+---
+<title>Preserves: Why not Just Use JSON?</title>
+<link rel="stylesheet" href="preserves.css">
+
+# Why not Just Use JSON?
+
+<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
+
+JSON offers *syntax* for numbers, strings, booleans, null, arrays and
+string-keyed maps. However, it suffers from two major problems. First,
+it offers no *semantics* for the syntax: it is left to each
+implementation to determine how to treat each JSON term. This causes
+[interoperability](http://seriot.ch/parsing_json.php) and even
+[security](http://web.archive.org/web/20180906202559/http://docs.couchdb.org/en/stable/cve/2017-12635.html)
+issues. Second, JSON's lack of support for type tags leads to awkward
+and incompatible *encodings* of type information in terms of the fixed
+suite of constructors on offer.
+
+There are other minor problems with JSON having to do with its syntax.
+Examples include its relative verbosity and its lack of support for
+binary data.
+
+## JSON syntax doesn't *mean* anything
+
+When are two JSON values the same? When are they different?
+<!-- When is one JSON value “less than” another? -->
+
+The specifications are largely silent on these questions. Different
+JSON implementations give different answers.
+
+Specifically, JSON does not:
+
+ - assign any meaning to numbers,[^meaning-ieee-double]
+ - determine how strings are to be compared,[^string-key-comparison]
+ - determine whether object key ordering is significant,[^json-member-ordering] or
+ - determine whether duplicate object keys are permitted, what it
+   would mean if they were, or how to determine a duplicate in the
+   first place.[^json-key-uniqueness]
+
+In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
+
+  [^meaning-ieee-double]:
+    [Section 6 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-6)
+    does go so far as to indicate “good interoperability can be
+    achieved” by imagining that parsers are able reliably to
+    understand the syntax of numbers as denoting an IEEE 754
+    double-precision floating-point value.
+
+  [^string-key-comparison]:
+    [Section 8.3 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-8.3)
+    suggests that *if* an implementation compares strings used as
+    object keys “code unit by code unit”, then it will interoperate
+    with *other such implementations*, but neither requires this
+    behaviour nor discusses comparisons of strings used in other
+    contexts.
+
+  [^json-member-ordering]:
+    [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4)
+    remarks that “[implementations] differ as to whether or not they
+    make the ordering of object members visible to calling software.”
+
+  [^json-key-uniqueness]:
+    [Section 4 of RFC 8259](https://tools.ietf.org/html/rfc8259#section-4)
+    is the only place in the specification that mentions the issue. It
+    explicitly sanctions implementations supporting duplicate keys,
+    noting only that “when the names within an object are not unique,
+    the behavior of software that receives such an object is
+    unpredictable.” Implementations are free to choose any behaviour
+    at all in this situation, including signalling an error, or
+    discarding all but one of a set of duplicates.
+
+  [^xml-infoset]: The XML world has the concept of
+    [XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
+    speaking, XML infoset is the *denotation* of an XML document; the
+    *meaning* of the document.
+
+  [^other-formats]: Most other recent data languages are like JSON in
+    specifying only a syntax with no associated semantics. While some
+    do make a sketch of a semantics, the result is often
+    underspecified (e.g. in terms of how strings are to be compared),
+    overly machine-oriented (e.g. treating 32-bit integers as
+    fundamentally distinct from 64-bit integers and from
+    floating-point numbers), overly fine (e.g. giving visibility to
+    the order in which map entries are written), or all three.
+
+Some examples:
+
+ - are the JSON values `1`, `1.0`, and `1e0` the same or different?
+ - are the JSON values `1.0` and `1.0000000000000001` the same or different?
+ - are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
+   (UTF-8 `7061cc88726f6e`) the same or different?
+ - are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
+   or different?
+ - which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
+   same? Are all three legal?
+ - are `{"päron":1}` and `{"päron":1}` the same or different?
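
To make the questions above concrete, here is how one particular implementation, Python's standard `json` module, happens to answer them. This is only one data point: other parsers are free to answer differently, which is exactly the problem. The variable names are illustrative.

```python
import json
import unicodedata

# Numbers: `1` parses as an int, while `1.0` and `1e0` parse as floats.
one, one_point_zero, one_e_zero = (json.loads(s) for s in ("1", "1.0", "1e0"))
numerically_equal = one == one_point_zero == one_e_zero
same_type = type(one) is type(one_point_zero)

# Precision: both texts round to the same IEEE 754 double.
collapsed = json.loads("1.0") == json.loads("1.0000000000000001")

# Strings: precomposed U+00E4 versus "a" followed by combining U+0308.
precomposed = json.loads('"p\\u00e4ron"')
decomposed = json.loads('"pa\\u0308ron"')
raw_equal = precomposed == decomposed
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed

# Member ordering: Python dicts compare equal regardless of key order.
order_equal = json.loads('{"a":1, "b":2}') == json.loads('{"b":2, "a":1}')

# Duplicate keys: accepted silently, and the last occurrence wins.
duplicate = json.loads('{"a":1, "a":2}')
```

Here the three spellings of one compare equal but do not share a type, the two spellings of "päron" differ unless the program normalizes them itself, and the duplicate key is resolved without any warning.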
+
+## JSON can multiply nicely, but it can't add very well
+
+JSON includes a fixed set of types: numbers, strings, booleans, null,
+arrays and string-keyed maps. Domain-specific data must be *encoded*
+into these types. For example, dates and email addresses are often
+represented as strings with an implicit internal structure.
+
+There is no convention for *labelling* a value as belonging to a
+particular category. Instead, JSON-encoded data are often labelled in
+an ad-hoc way. Multiple incompatible approaches exist. For example, a
+“money” structure containing a `currency` field and an `amount` may be
+represented in any number of ways:
+
+    { "_type": "money", "currency": "EUR", "amount": 10 }
+    { "type": "money", "value": { "currency": "EUR", "amount": 10 } }
+    [ "money", { "currency": "EUR", "amount": 10 } ]
+    { "@money": { "currency": "EUR", "amount": 10 } }
+
+This causes particular problems when JSON is used to represent *sum*
+or *union* types, such as “either a value or an error, but not both”.
+Again, multiple incompatible approaches exist.
+
+For example, imagine an API for depositing money in an account. The
+response might be either a “success” response indicating the new
+balance, or one of a set of possible errors.
+
+Sometimes, a *pair* of values is used, with `null` marking the option
+not taken.[^interesting-failure-mode]
+
+    { "ok": { "balance": 210 }, "error": null }
+    { "ok": null, "error": "Unauthorized" }
+
+  [^interesting-failure-mode]: What is the meaning of a document where
+    both `ok` and `error` are non-null? What might happen when a
+    program is presented with such a document?
+
+Sometimes, the branch not chosen is omitted instead, as if it were an
+optional field:
+
+    { "ok": { "balance": 210 } }
+    { "error": "Unauthorized" }
+
+Sometimes, an array of a label and a value is used:
+
+    [ "ok", { "balance": 210 } ]
+    [ "error", "Unauthorized" ]
+
+Sometimes, the shape of the data is sufficient to distinguish among
+the alternatives, and the label is left implicit:
+
+    { "balance": 210 }
+    "Unauthorized"
+
+JSON itself does not offer any guidance on which of these options to
+choose. In many real cases on the web, poor choices have led to
+encodings that are irrecoverably ambiguous.
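
A consumer that has to cope with several of these conventions ends up writing something like the following Python sketch. The function `decode_result` is hypothetical, and the implicit-shape rule (a `balance` key means success, a bare string means error) is an assumption taken from the deposit example above, not a general technique.

```python
import json


def decode_result(doc):
    """Normalize the encodings shown above to a (tag, payload) pair."""
    # Array of a label and a value: ["ok", {...}] or ["error", "..."].
    if isinstance(doc, list) and len(doc) == 2 and doc[0] in ("ok", "error"):
        return doc[0], doc[1]
    # Pair of fields, with null or omission marking the option not taken.
    if isinstance(doc, dict) and ("ok" in doc or "error" in doc):
        ok, error = doc.get("ok"), doc.get("error")
        if ok is not None and error is not None:
            raise ValueError("both 'ok' and 'error' are non-null")
        if ok is None and error is None:
            raise ValueError("neither 'ok' nor 'error' carries a value")
        return ("ok", ok) if ok is not None else ("error", error)
    # Implicit label: guess from the shape of the data.
    if isinstance(doc, dict) and "balance" in doc:
        return "ok", doc
    if isinstance(doc, str):
        return "error", doc
    raise ValueError("unrecognized response shape")


samples = [
    '{ "ok": { "balance": 210 }, "error": null }',
    '{ "error": "Unauthorized" }',
    '[ "ok", { "balance": 210 } ]',
    '"Unauthorized"',
]
decoded = [decode_result(json.loads(s)) for s in samples]
```

Even this small decoder has to guess: a success whose payload is legitimately `null` cannot be told apart from an omitted branch, and a document with both fields set has no defined meaning at all.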
+
+<!-- Heading to visually offset the footnotes from the main document: -->
+## Notes