diff --git a/syndicate/codec.md b/syndicate/codec.md new file mode 100644 index 0000000..51e8076 --- /dev/null +++ b/syndicate/codec.md @@ -0,0 +1,662 @@ +--- +--- + + +2018-06-04 20:31:48 TODO: cwebber's email comments +2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN +2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2 + +# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags + +Tony Garnock-Jones +Christopher Lemmer Webber +May 2018 +Version 0.0.1 + + ________________ + / \ + /\__/\ / boo! \ + / \ \ i'm very spki! \ + \=\_^__^_/= ___/\__________________/ + |/ \ + \\ | | / + <_|--|_> + +## Introduction + +This document proposes a language-neutral JSON-like *data type*, along +with a robust equivalence relation ("semantics") and a total ordering +over inhabitants of the type.[^tjson] + + [^tjson]: [TJSON](https://www.tjson.org/) has a similar aim: + “different on-the-wire representations of an object correspond to + the same typed data object” + ([source](https://news.ycombinator.com/item?id=12860143)); “TJSON + is defined as a serialization format on top of a JSON-like data + model” ([source](https://news.ycombinator.com/item?id=12860401)). + +It then suggests conventions for encoding common data formats in terms +of the proposed data type. + +Finally, it proposes concrete *syntax* for the data type, offering a +language-neutral transfer syntax (based on +[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and +suggesting possible language-specific representations for the data +type's inhabitants. + + [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt + [spki]: http://world.std.com/~cme/html/spki.html + +### Why not Just Use JSON? + + + +JSON offers *syntax* for numbers, strings, booleans, null, arrays and +string-keyed maps. However, it offers no *semantics* for the syntax: +it is left to each implementation to determine how to treat each JSON +term. This causes +[interoperability](http://seriot.ch/parsing_json.php) and even +[security](http://seriot.ch/parsing_json.php) issues. + +Specifically, JSON does not: + + - assign any meaning to numbers,[^meaning-ieee-double] + - determine how strings are to be compared,[^string-key-comparison] + - determine whether object key ordering is significant, or + - determine whether duplicate object keys are permitted, what it + would mean if they were, or how to determine a duplicate in the + first place. + +In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats] + + [^meaning-ieee-double]: + [Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6) + does go so far as to indicate “good interoperability can be + achieved” by imagining that parsers are able reliably to + understand the syntax of numbers as denoting an IEEE 754 + double-precision floating-point value. + + [^string-key-comparison]: + [Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3) + suggests that *if* an implementation compares strings used as + object keys “code unit by code unit”, then it will interoperate + with *other such implementations*, but neither requires this + behaviour nor discusses comparisons of strings used in other + contexts. + + [^xml-infoset]: The XML world has the concept of + [XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely + speaking, XML infoset is the *denotation* of an XML document; the + *meaning* of the document. + + [^other-formats]: Most other recent data languages are like JSON in + specifying only a syntax with no associated semantics. While some + do make a sketch of a semantics, the result is often + underspecified (e.g. in terms of how strings are to be compared), + overly machine-oriented (e.g. treating 32-bit integers as + fundamentally distinct from 64-bit integers and from + floating-point numbers), overly fine (e.g. giving visibility to + the order in which map entries are written), or all three. + +Some examples: + + - are the JSON values `1`, `1.0`, and `1e0` the same or different? + - are the JSON values `1.0` and `1.0000000000000001` the same or different? + - are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"` + (UTF-8 `7061cc88726f6e`) the same or different? + - are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same + or different? + - which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the + same? Are all three legal? + - are `{"päron":1}` and `{"päron":1}` the same or different? + +Different JSON implementations give different answers to these +questions. The JSON specifications are silent on these questions. + +There are other minor problems with JSON having to do with its syntax. +Examples include its relative verbosity and its lack of support for +binary data. + +## Starting with Semantics + +Taking inspiration from functional programming, we start with a +definition of the *values* that we want to work with and give them +meaning independent of their syntax. We will treat syntax separately, +later in this document. + +We will want our data type to accommodate *atoms* (numbers and text), +*products* (both tuples and sequences), and *labelled +sums*.[^zephyr-asdl] It should also include *keyed maps*. We should +avoid unnecessary restrictions such as machine-oriented fixed-width +integer or floating-point values where possible. + + [^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t + [Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)), + which doesn't offer much in the way of atoms, but offers + general-purpose labelled sums and products. See D. C. Wang, A. W. + Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax + Description Language,” in USENIX Conference on Domain-Specific + Languages, 1997, pp. 213–228. + [PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf) + +### Values + +A `Value` is one of: + + - a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale] + - a `Number` for integers and rational numbers, + - a `List` for general-purpose variable-length sequences, + - a `Map` for general-purpose variable-size key/value maps, or + - a `Record` for tagging a tuple of values with an intended interpretation. + +We define a total order over `Value`s: Every `ByteString` is less than +the other kinds of `Value`; every `Number` is less than any `List`, +`Map` or `Record`, but greater than any `ByteString`; and so on. + +That is, `ByteString < Number < List < Map < Record`. + +Two values of the same kind are compared using kind-specific rules, +given below. + +Two `Value`s are equal if neither is less than the other according to +the total order. + + [^byte-string-rationale]: Why include `ByteString`, when we could + instead use a reserved `Record` along with a `List` of `Number`s? + ((TODO: Actually decide about this! Similarly, why include `Map` + rather than a restricted form of `List` with a `Record`? I think + the answer has to do with the arbitrariness of the label we'd + pick: unless *extremely* carefully chosen (i.e. number 0 (ideally + even `-Inf`!) for byte strings, number 1 for map, and have the + order go `Number < Record < List`), they would mess up the + prettiness of the ordering. Though... we could ultimately reduce + this to `Number` and `Record`, and have a family of `#"list"` and + `#"map"` `Record`s...)) + +### Byte strings + +A `ByteString` is an ordered sequence of zero or more integers in the +inclusive range [0..255]. + +`ByteString`s are compared lexicographically. + +We will write examples of `ByteString`s that contain only ASCII +characters using “`#"`” as an opening quote mark and “`"`” as a +closing quote mark. + +**Examples.** The `ByteString` containing the three ASCII characters +`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is +written as `#""`. **N.B.** Despite appearances, these are *binary* +data. + +### Numbers + +A `Number` is a signed rational number of finite precision whose +magnitude can be exactly represented in base two with a finite number +of digits. This includes integers of arbitrary width as well as (for +example) the non-infinite non-NaN IEEE 754 floating-point values. + +`Number`s are compared as mathematical numbers. + +We will write examples of `Number`s using standard mathematical +notation. + +**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567. + +**Non-examples.** NaN (the clue is in the name!), ∞ (not finite), +0.2 (cannot be exactly represented with a finite number of binary +digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise). + +### Lists + +A `List` is an ordered sequence of zero or more `Value`s. + +`List`s are compared lexicographically, appealing to the ordering on +`Value`s for comparisons at each position in the `List`s. + +### Maps + +A `Map` is an *unordered* collection of zero or more pairs of +`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map` +must be pairwise distinct. + +Instances of `Map` are compared by lexicographic comparison of the +sequences resulting from ordering each `Map`'s pairs in ascending +order by key. ((TODO: Is this a good idea? Is it clearly-enough +written? An alternative approach is to compare first by the *count* of +pairs, and only if the count is the same, start comparing the pairs +themselves.)) + +### Records + +A `Record` is a tuple of one or more `Value`s. The first in the tuple +is called the *label* of the `Record`, and the other elements of the +tuple are called its *fields*. + +`Record` labels are *usually* `ByteString`s, but can be any kind of +`Value`.[^iri-labels] + + [^iri-labels]: It is occasionally (but seldom) necessary to + interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a + label can be read as a relative IRI, it is notionally interpreted + with respect to the IRI `http://spki-cat.org/` ((TODO: + placeholder)); where a label can be read as an absolute IRI, it + stands for that IRI; and otherwise, it cannot be read as an IRI at + all, and so the label simply stands for itself - for its own + `Value`. + +`Record`s are compared lexicographically as if they were just tuples; +that is, first by their labels, and then by the remainder of their +fields. + +We will write examples of `Record`s with `ByteString` labels entirely +composed of ASCII characters as their label followed by their +parenthesised, comma-separated fields. + +**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3 +is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no +fields is written `#"void"()`. + +## Conventions for Common Data Types + +The `Value` data type is essentially an abstract S-Expression, able to +represent semi-structured data over `ByteString` and `Number` atoms. + +However, users need a wide variety of data types for representing +domain-specific values such as text, calendrical values, machine +words, IEEE 754 floating-point values, booleans, and so on. + +We use appropriately-labelled `Record`s to denote these +domain-specific data types. + +All of these conventions are optional. They form a layer atop the core +`Value` structure. Non-domain-specific tools do not in general need to +treat them specially. + +**Validity.** Many of the labels we will describe in this section come + with side-conditions on the contents of labelled `Record`s. It is + possible to construct an instance of `Value` that violates these + side-conditions without ceasing to be a `Value` or becoming + unrepresentable. However, we say that such a `Value` is *invalid* + because it fails to honour the necessary side-conditions. + Implementations *SHOULD* allow two modes of working: one which + treats all `Value`s identically, without regard for side-conditions, + and one which enforces validity (i.e. side-conditions) when reading, + writing, or constructing `Value`s. + +### Text + +A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and +having a single field that is also a `ByteString`. The field *MUST* be +valid UTF-8. + +We will write examples of `Text`s that contain Unicode text using +“`"`” as both an opening and closing quote mark. + +**Examples.** The `Text` containing the three Unicode code points `z` +(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`. + +**Normalization forms.** Unicode defines multiple +[normalization forms](http://unicode.org/reports/tr15/) for text. The +ordering and equivalence relations defined for `Value`s mean that, for +Unicode text, the UTF-8 encoded byte-level form of a text is used in +comparisons.[^utf8-is-awesome] In order for users to unambiguously +signal or require a particular normalization form, we define a +`NormalizedText`, which is a `Record` labelled with +`#"unicode-normalization"` and having two fields, the first of which +is a `Text` specifying the normalization form used (e.g. `"nfc"`, +`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text` +whose underlying representation *MUST* be normalized according to the +named normalization form. + + [^utf8-is-awesome]: Happily, the design of UTF-8 is such that this + gives the same result as a lexicographic code-point-by-code-point + comparison! + +**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled +with `#"iri"` and having one field, a `Text` which is the IRI itself +and which *MUST* be a valid absolute or relative IRI. + +**Symbols.** Programming languages like Lisp and Prolog frequently use +string-like values called *symbols*. A `Symbol` is a `Record` labelled +with `#"symbol"` and having one field, a `Text`. + +### Numbers + +The definition of `Number` captures all integers and all +finitely-representable floating-point values. However, in certain +circumstances it can be valuable to assert that a number inhabits a +particular range, such as a fixed-width machine word or an IEEE 754 +floating-point value. + +**Fixed-width machine words.** (16-, 32- and 64-bit) A family of +labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote *n*-bit-wide +signed and unsigned range restrictions, respectively. Records with +these labels *MUST* have one field, a `Number`, which *MUST* fall +within the appropriate range. That is, to be valid, + - in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*. + - in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*. + - in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*. + - etc. + +**IEEE 754 floating-point.** (single- and double-precision) The labels +`f32` and `f64` denote single- and double-precision IEEE 754 +floating-point values, respectively. Records with these labels *MUST* +have one field. This field *MUST* either be a `Number`, which *MUST* +fall within the appropriate representable range, or one of the records +`#"nan"()`, `#"+inf"()` or `#"-inf"()`. + +### Anonymous Tuples and Unit + +A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields, +denoting an anonymous tuple of values. + +The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes +called "unit" or "void" (but *not* e.g. JavaScript's "undefined" +value). + +### Booleans, Null and Undefined + +The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true" +and "false" Boolean values, respectively. + +Tony Hoare's +"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)" +can be represented with the 0-ary `Record` `#"null"()`. An "undefined" +value can be represented as `#"undefined"()`. + +### Dates and Times + +Dates, times, moments, and timestamps can be represented with a +`Record` with label `#"rfc3339"` having a single field, a `Text`, +which *MUST* conform to one of the `full-date`, `partial-time`, +`full-time`, or `date-time` productions of +[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). + +## Syntax + +Now we have discussed `Value`s and their meanings, we may turn to +techniques for *representing* `Value`s for communication or storage. + +The syntax we have used for the examples so far is inadequate in many +ways, not least of which is that it cannot represent every `Value`. + +Separation of the meaning of a piece of syntax from the syntax itself +opens the door to domain-specific syntaxes, all equivalent and +interconvertible.[^asn1] With a robust semantic foundation, +connections to other data languages can also be made. + + [^asn1]: Those who remember + [ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx) + will recall BER, DER, PER, CER, XER and so on, each appropriate to + a different setting. Similarly, + [Rivest's S-Expression design][sexp.txt] offers a human-friendly + syntax, a syntax robust to network-induced message corruption, and + an unambiguous, simple and easily-parsed machine-friendly syntax + for the same underlying values. + +### Transfer syntax: S-Expressions + +For now, we limit our attention to an easily-parsed, easily-produced +machine-readable syntax by mapping our `Value`s to the canonical form +of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps] + + [^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as + they are? While they include binary data and sequences, and an + obvious equivalence for them exists, they lack numbers *per se* as + well as any kind of unordered structure such as sets or maps. In + addition, while "display hints" allow labelling of binary data + with an intended interpretation, they cannot be attached to any + other kind of structure, and the "hint" itself can only be a + binary blob. + +#### Byte strings + +`ByteString`s map to byte-string S-Expressions. + +**Examples.** + - What we have been writing above as `#"ABC"` would be represented as + the S-Expression `3:ABC`. + - The empty `ByteString` is represented by the S-Expression `0:`. + +#### Numbers + +Numbers are the most complicated values to represent as an +S-Expression. + +((TODO: Consider cutting complexity by e.g. representing a `Number` as +a sign bit, a little-endian blob of the integer part of the number, +and a little-endian blob of the fractional part of the number. Lots of +trailing/leading zeros for very large/small numbers!)) + +We represent `Number`s using a sign-magnitude format, where the +magnitude is written using a little-endian, twos-complement binary +[*significand*](https://en.wikipedia.org/wiki/Significand) and a +(signed) *shift amount*. + +In essence, we use a generalized, variable-width form of binary IEEE +floating-point representation. + +Let `N` be the `Number` to represent as an S-Expression. + +The sign bit is 0 when `N` is zero or positive, and 1 when `N` is +negative. + +The magnitude of `N` can be viewed as an infinite sequence of bits +with a fraction-separator mark placed somewhere in the sequence, + +``` +···00.000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000000··· +···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000··· +···000000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000.00··· +``` + +where `b_0` is the leftmost (most significant) and `b_{n-1}` the +rightmost (least significant) non-zero bit. + +Let `k`, the position of the fraction-separator mark, be `i` when it +is immediately to the left of `b_i` for some `i`, generalizing to +negative values when it is to the left of `b_0` and values greater +than `n-1` when it is to the right of `b_{n-1}`. + +For example, `k` will be: + - 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`; + - -3 (as in the first example above) when it is three bits left of `b_0`; + - `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`; + - `n`+3 when it is three bits to the right of `b_{n-1}`. + +The unpadded significand is `b_0 b_1 ··· b_{n-1}`. + +When `k` < `n`, the shift `z`=`k-n` and the significand is: + - the unpadded significand, + - with the sign bit appended to it on the right, and then + - padded on the left with zeroes until it is a whole number of octets wide. + +When `k` ≥ `n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is: + - the unpadded significand, + - padded on the right with `(k-n) mod 8` zeroes, + - with the sign bit then appended on the right, and then + - padded on the left with zeroes until it is a whole number of octets wide. + +Now, let `s`=2`z` if `z` is zero or positive, or `s`=2|`z`|+1 if `z` +is negative. + +Finally, the S-Expression form of `N` is: + - `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or + - `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or + - `(4:*num)`, if `s`=0 and the significand contains no non-zero bits; + +where + - `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and + - `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`. + +**Examples.** (Shown using the hexadecimal representation of +byte-strings from +[section 4.4 of Rivest's S-Expression specification][sexp.txt] in +places.) + - `N`=0 → `(4:*num)` + - `N`=1 → `(4:*num#02#)` + - `N`=-1 → `(4:*num#03#)` + - `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)` + - `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)` + - `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)` + - `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)` + - `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)` + - `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)` + - `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=7 → `(4:*num#16##07#)` + - `N`=-1.202E4567=1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)` + +((TODO: figure out what this algorithm would actually look like in, +say, C, Python and Racket.)) + +#### Lists + +A `List` maps to an S-Expression list of representations of its +elements, with the byte-string S-Expression `5:*list` prepended. + +**Examples.** + - The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"` + would be represented as the S-Expression `(5:*list1:a1:b1:c)`. + - The empty `List` is represented by the S-Expression `(5:*list)`. + +#### Maps + +A `Map` is represented by an S-Expression list of representations of +the `Map`'s key-value pairs, with the byte-string `4:*map` prepended. + +Each key-value pair is represented by a two-element S-Expression list +containing representations of the key and the value, in that order. + +The key-value pairs *MUST* be ordered by `Value`-order of their keys. + +**Examples.** + - The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to + `#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`. + - The `Map` containing an entry mapping the empty list to a "true" + Boolean value is represented by `(4:*map((5:*list)(4:true)))`. + - The empty `Map` is represented by `(4:*map)`. + +**Non-examples.** + - The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its + key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`. + - The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its + key-value pairs appear "flattened" in the outer list, rather than + each appearing in a two-element list of its own. + +#### Records + +A `Record` is represented by an S-Expression list of its fields, +prepended by: + + - the representation of its label, if its label is a `ByteString` and + does not begin with byte 42 (ASCII "`*`"); or + - the S-Expression `1:*` followed by the representation of the + `Record`'s label, otherwise. + +**Examples.** + - The `Text` `"hello-world"` is represented by the S-Expression + `(5:utf-811:hello-world)`. + - The `IRI` denoting `http://www.w3.org/` is represented by the + S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`. + - The `Record` `#"*"()` is represented by the S-Expression + `(1:*1:*)`. + - The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression + `(1:*4:*foo4:*bar)`. + - The `Record` with the empty list as its label and no fields is + represented by the S-Expression `(1:*(5:*list))`. + - `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that + is a `Record` with `#"rfc3339"` as its label, and a single `Text` + field. While it is a perfectly reasonable `Value`, it does *not* + represent a valid date or time, since the `Text` `"foo"` does not + conform to any of the RFC 3339 productions enumerated above. + +**Non-examples.** + - `((5:*list))` is not a representation of the `Record` with the + empty list as its label and no fields, because that `Record` has a + non-`ByteString` as its label, mandating a `1:*` prefix on its + S-Expression representation. + - `(4:*foo4:*bar)` does not represent the `Record` + `#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`", + mandating a `1:*` prefix on the `Record`'s S-Expression + representation. + +## Examples + +((TODO: Give some examples of large and small SPKI-CAT documents, +perhaps translated from various JSON blobs floating around the +internet.)) + +## Representing Values in Programming Languages + +We have given a definition of `Value` and its semantics, and proposed +a concrete syntax for communicating and storing `Value`s. We now turn +to **suggested** representations of `Value`s as *programming-language +values* for various programming languages. + +### JavaScript + + - `ByteString` ↔ `Uint8Array` + - `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO + - `List` ↔ `Array` + - `Map` ↔ `Object` + - `Record` ↔ an instance of something like `Record` below, unless the label is... + - `#"utf-8"` ↔ `String` + - `#"true"` ↔ `true` + - `#"false"` ↔ `false` + - `#"null"` ↔ `null` + - `#"undefined"` ↔ the undefined value + - `#"rfc3339"` ↔ `Date`, if the `Record`'s field matches the `date-time` RFC 3339 production + +```javascript +function Record(label, ...fields) { + this.label = label; + this.fields = fields; +} +``` + +### Scheme/Racket + + - `ByteString` ↔ byte vector (Racket: "Bytes") + - `Number` ↔ numbers + - `List` ↔ (where possible, immutable) list + - `Map` ↔ hash-table + - `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is... + - `#"utf-8"` ↔ a string + - `#"true"` ↔ `#t` + - `#"false"` ↔ `#f` + - `#"symbol"` ↔ a symbol + +### Java + + - `ByteString` ↔ `byte[]` + - `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO + - `List` ↔ `java.util.List` + - `Map` ↔ `java.util.Map` + - `Record` ↔ an instance of something like `Record` below, unless the label is... + - `#"utf-8"` ↔ `java.lang.String` + - `#"true"` ↔ `java.lang.Boolean.TRUE` + - `#"false"` ↔ `java.lang.Boolean.FALSE` + - `#"null"` ↔ a special singleton object, but *not* Java's `null` + - `#"rfc3339"` ↔ `java.util.{Date,Time,Timestamp}`, according to which RFC 3339 production the `Record`'s field matches + +### Erlang + + - `ByteString` ↔ a binary + - `Number` ↔ numbers, probably; TODO + - `List` ↔ a list + - `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP R17) + - `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is... + - `#"true"` ↔ `true` + - `#"false"` ↔ `false` + - `#"null"` ↔ `null` + - `#"undefined"` ↔ `undefined` + - `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if + some kind of an "unsafe" mode is set on the decoder (because + Erlang atoms are not GC'd); otherwise like any other kind of + `Record` + +---