codec.md draft

This commit is contained in:
Tony Garnock-Jones 2018-06-05 10:32:00 +01:00
parent c23285781c
commit 738fad360a
1 changed files with 662 additions and 0 deletions

662
syndicate/codec.md Normal file
View File

@ -0,0 +1,662 @@
---
---
<style>
body { font-size: 120%; margin-left: 2rem; }
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; }
h2 { border-bottom: solid black 1px; }
</style>
2018-06-04 20:31:48 TODO: cwebber's email comments
2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN
2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2
# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
Christopher Lemmer Webber <cwebber@dustycloud.org>
May 2018
Version 0.0.1
________________
/ \
/\__/\ / boo! \
/ \ \ i'm very spki! \
\=\_^__^_/= ___/\__________________/
|/ \
\\ | | /
<_|--|_>
## Introduction
This document proposes a language-neutral JSON-like *data type*, along
with a robust equivalence relation ("semantics") and a total ordering
over inhabitants of the type.[^tjson]
[^tjson]: [TJSON](https://www.tjson.org/) has a similar aim:
“different on-the-wire representations of an object correspond to
the same typed data object”
([source](https://news.ycombinator.com/item?id=12860143)); “TJSON
is defined as a serialization format on top of a JSON-like data
model” ([source](https://news.ycombinator.com/item?id=12860401)).
It then suggests conventions for encoding common data formats in terms
of the proposed data type.
Finally, it proposes concrete *syntax* for the data type, offering a
language-neutral transfer syntax (based on
[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and
suggesting possible language-specific representations for the data
type's inhabitants.
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
[spki]: http://world.std.com/~cme/html/spki.html
### Why not Just Use JSON?
<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
JSON offers *syntax* for numbers, strings, booleans, null, arrays and
string-keyed maps. However, it offers no *semantics* for the syntax:
it is left to each implementation to determine how to treat each JSON
term. This causes
[interoperability](http://seriot.ch/parsing_json.php) and even
[security](http://seriot.ch/parsing_json.php) issues.
Specifically, JSON does not:
- assign any meaning to numbers,[^meaning-ieee-double]
- determine how strings are to be compared,[^string-key-comparison]
- determine whether object key ordering is significant, or
- determine whether duplicate object keys are permitted, what it
would mean if they were, or how to determine a duplicate in the
first place.
In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
[^meaning-ieee-double]:
[Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6)
does go so far as to indicate “good interoperability can be
achieved” by imagining that parsers are able reliably to
understand the syntax of numbers as denoting an IEEE 754
double-precision floating-point value.
[^string-key-comparison]:
[Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3)
suggests that *if* an implementation compares strings used as
object keys “code unit by code unit”, then it will interoperate
with *other such implementations*, but neither requires this
behaviour nor discusses comparisons of strings used in other
contexts.
[^xml-infoset]: The XML world has the concept of
[XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
speaking, XML infoset is the *denotation* of an XML document; the
*meaning* of the document.
[^other-formats]: Most other recent data languages are like JSON in
specifying only a syntax with no associated semantics. While some
do make a sketch of a semantics, the result is often
underspecified (e.g. in terms of how strings are to be compared),
overly machine-oriented (e.g. treating 32-bit integers as
fundamentally distinct from 64-bit integers and from
floating-point numbers), overly fine (e.g. giving visibility to
the order in which map entries are written), or all three.
Some examples:
- are the JSON values `1`, `1.0`, and `1e0` the same or different?
- are the JSON values `1.0` and `1.0000000000000001` the same or different?
- are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
(UTF-8 `7061cc88726f6e`) the same or different?
- are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
or different?
- which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
same? Are all three legal?
- are `{"päron":1}` and `{"päron":1}` the same or different?
Different JSON implementations give different answers to these
questions. The JSON specifications are silent on these questions.
There are other minor problems with JSON having to do with its syntax.
Examples include its relative verbosity and its lack of support for
binary data.
## Starting with Semantics
Taking inspiration from functional programming, we start with a
definition of the *values* that we want to work with and give them
meaning independent of their syntax. We will treat syntax separately,
later in this document.
We will want our data type to accommodate *atoms* (numbers and text),
*products* (both tuples and sequences), and *labelled
sums*.[^zephyr-asdl] It should also include *keyed maps*. We should
avoid unnecessary restrictions such as machine-oriented fixed-width
integer or floating-point values where possible.
[^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t
[Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
which doesn't offer much in the way of atoms, but offers
general-purpose labelled sums and products. See D. C. Wang, A. W.
Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
Description Language,” in USENIX Conference on Domain-Specific
Languages, 1997, pp. 213228.
[PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)
### Values
A `Value` is one of:
- a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale]
- a `Number` for integers and rational numbers,
- a `List` for general-purpose variable-length sequences,
- a `Map` for general-purpose variable-size key/value maps, or
- a `Record` for tagging a tuple of values with an intended interpretation.
We define a total order over `Value`s: Every `ByteString` is less than
the other kinds of `Value`; every `Number` is less than any `List`,
`Map` or `Record`, but greater than any `ByteString`; and so on.
That is, `ByteString < Number < List < Map < Record`.
Two values of the same kind are compared using kind-specific rules,
given below.
Two `Value`s are equal if neither is less than the other according to
the total order.
[^byte-string-rationale]: Why include `ByteString`, when we could
instead use a reserved `Record` along with a `List` of `Number`s?
((TODO: Actually decide about this! Similarly, why include `Map`
rather than a restricted form of `List` with a `Record`? I think
the answer has to do with the arbitrariness of the label we'd
pick: unless *extremely* carefully chosen (i.e. number 0 (ideally
even `-Inf`!) for byte strings, number 1 for map, and have the
order go `Number < Record < List`), they would mess up the
prettiness of the ordering. Though... we could ultimately reduce
this to `Number` and `Record`, and have a family of `#"list"` and
`#"map"` `Record`s...))
### Byte strings
A `ByteString` is an ordered sequence of zero or more integers in the
inclusive range [0..255].
`ByteString`s are compared lexicographically.
We will write examples of `ByteString`s that contain only ASCII
characters using “`#"`” as an opening quote mark and “`"`” as a
closing quote mark.
**Examples.** The `ByteString` containing the three ASCII characters
`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is
written as `#""`. **N.B.** Despite appearances, these are *binary*
data.
### Numbers
A `Number` is a signed rational number of finite precision whose
magnitude can be exactly represented in base two with a finite number
of digits. This includes integers of arbitrary width as well as (for
example) the non-infinite non-NaN IEEE 754 floating-point values.
`Number`s are compared as mathematical numbers.
We will write examples of `Number`s using standard mathematical
notation.
**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567.
**Non-examples.** NaN (the clue is in the name!), ∞ (not finite),
0.2 (cannot be exactly represented with a finite number of binary
digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise).
### Lists
A `List` is an ordered sequence of zero or more `Value`s.
`List`s are compared lexicographically, appealing to the ordering on
`Value`s for comparisons at each position in the `List`s.
### Maps
A `Map` is an *unordered* collection of zero or more pairs of
`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map`
must be pairwise distinct.
Instances of `Map` are compared by lexicographic comparison of the
sequences resulting from ordering each `Map`'s pairs in ascending
order by key. ((TODO: Is this a good idea? Is it clearly-enough
written? An alternative approach is to compare first by the *count* of
pairs, and only if the count is the same, start comparing the pairs
themselves.))
### Records
A `Record` is a tuple of one or more `Value`s. The first in the tuple
is called the *label* of the `Record`, and the other elements of the
tuple are called its *fields*.
`Record` labels are *usually* `ByteString`s, but can be any kind of
`Value`.[^iri-labels]
[^iri-labels]: It is occasionally (but seldom) necessary to
interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a
label can be read as a relative IRI, it is notionally interpreted
with respect to the IRI `http://spki-cat.org/` ((TODO:
placeholder)); where a label can be read as an absolute IRI, it
stands for that IRI; and otherwise, it cannot be read as an IRI at
all, and so the label simply stands for itself - for its own
`Value`.
`Record`s are compared lexicographically as if they were just tuples;
that is, first by their labels, and then by the remainder of their
fields.
We will write examples of `Record`s with `ByteString` labels entirely
composed of ASCII characters as their label followed by their
parenthesised, comma-separated fields.
**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3
is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no
fields is written `#"void"()`.
## Conventions for Common Data Types
The `Value` data type is essentially an abstract S-Expression, able to
represent semi-structured data over `ByteString` and `Number` atoms.
However, users need a wide variety of data types for representing
domain-specific values such as text, calendrical values, machine
words, IEEE 754 floating-point values, booleans, and so on.
We use appropriately-labelled `Record`s to denote these
domain-specific data types.
All of these conventions are optional. They form a layer atop the core
`Value` structure. Non-domain-specific tools do not in general need to
treat them specially.
**Validity.** Many of the labels we will describe in this section come
with side-conditions on the contents of labelled `Record`s. It is
possible to construct an instance of `Value` that violates these
side-conditions without ceasing to be a `Value` or becoming
unrepresentable. However, we say that such a `Value` is *invalid*
because it fails to honour the necessary side-conditions.
Implementations *SHOULD* allow two modes of working: one which
treats all `Value`s identically, without regard for side-conditions,
and one which enforces validity (i.e. side-conditions) when reading,
writing, or constructing `Value`s.
### Text
A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and
having a single field that is also a `ByteString`. The field *MUST* be
valid UTF-8.
We will write examples of `Text`s that contain Unicode text using
“`"`” as both an opening and closing quote mark.
**Examples.** The `Text` containing the three Unicode code points `z`
(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`.
**Normalization forms.** Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text. The
ordering and equivalence relations defined for `Value`s mean that, for
Unicode text, the UTF-8 encoded byte-level form of a text is used in
comparisons.[^utf8-is-awesome] In order for users to unambiguously
signal or require a particular normalization form, we define a
`NormalizedText`, which is a `Record` labelled with
`#"unicode-normalization"` and having two fields, the first of which
is a `Text` specifying the normalization form used (e.g. `"nfc"`,
`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text`
whose underlying representation *MUST* be normalized according to the
named normalization form.
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
gives the same result as a lexicographic code-point-by-code-point
comparison!
**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled
with `#"iri"` and having one field, a `Text` which is the IRI itself
and which *MUST* be a valid absolute or relative IRI.
**Symbols.** Programming languages like Lisp and Prolog frequently use
string-like values called *symbols*. A `Symbol` is a `Record` labelled
with `#"symbol"` and having one field, a `Text`.
### Numbers
The definition of `Number` captures all integers and all
finitely-representable floating-point values. However, in certain
circumstances it can be valuable to assert that a number inhabits a
particular range, such as a fixed-width machine word or an IEEE 754
floating-point value.
**Fixed-width machine words.** (16-, 32- and 64-bit) A family of
labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote *n*-bit-wide
signed and unsigned range restrictions, respectively. Records with
these labels *MUST* have one field, a `Number`, which *MUST* fall
within the appropriate range. That is, to be valid,
- in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*.
- in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*.
- in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*.
- etc.
**IEEE 754 floating-point.** (single- and double-precision) The labels
`f32` and `f64` denote single- and double-precision IEEE 754
floating-point values, respectively. Records with these labels *MUST*
have one field. This field *MUST* either be a `Number`, which *MUST*
fall within the appropriate representable range, or one of the records
`#"nan"()`, `#"+inf"()` or `#"-inf"()`.
### Anonymous Tuples and Unit
A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields,
denoting an anonymous tuple of values.
The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes
called "unit" or "void" (but *not* e.g. JavaScript's "undefined"
value).
### Booleans, Null and Undefined
The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true"
and "false" Boolean values, respectively.
Tony Hoare's
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
can be represented with the 0-ary `Record` `#"null"()`. An "undefined"
value can be represented as `#"undefined"()`.
### Dates and Times
Dates, times, moments, and timestamps can be represented with a
`Record` with label `#"rfc3339"` having a single field, a `Text`,
which *MUST* conform to one of the `full-date`, `partial-time`,
`full-time`, or `date-time` productions of
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
## Syntax
Now we have discussed `Value`s and their meanings, we may turn to
techniques for *representing* `Value`s for communication or storage.
The syntax we have used for the examples so far is inadequate in many
ways, not least of which is that it cannot represent every `Value`.
Separation of the meaning of a piece of syntax from the syntax itself
opens the door to domain-specific syntaxes, all equivalent and
interconvertible.[^asn1] With a robust semantic foundation,
connections to other data languages can also be made.
[^asn1]: Those who remember
[ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
will recall BER, DER, PER, CER, XER and so on, each appropriate to
a different setting. Similarly,
[Rivest's S-Expression design][sexp.txt] offers a human-friendly
syntax, a syntax robust to network-induced message corruption, and
an unambiguous, simple and easily-parsed machine-friendly syntax
for the same underlying values.
### Transfer syntax: S-Expressions
For now, we limit our attention to an easily-parsed, easily-produced
machine-readable syntax by mapping our `Value`s to the canonical form
of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps]
[^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
they are? While they include binary data and sequences, and an
obvious equivalence for them exists, they lack numbers *per se* as
well as any kind of unordered structure such as sets or maps. In
addition, while "display hints" allow labelling of binary data
with an intended interpretation, they cannot be attached to any
other kind of structure, and the "hint" itself can only be a
binary blob.
#### Byte strings
`ByteString`s map to byte-string S-Expressions.
**Examples.**
- What we have been writing above as `#"ABC"` would be represented as
the S-Expression `3:ABC`.
- The empty `ByteString` is represented by the S-Expression `0:`.
#### Numbers
Numbers are the most complicated values to represent as an
S-Expression.
((TODO: Consider cutting complexity by e.g. representing a `Number` as
a sign bit, a little-endian blob of the integer part of the number,
and a little-endian blob of the fractional part of the number. Lots of
trailing/leading zeros for very large/small numbers!))
We represent `Number`s using a sign-magnitude format, where the
magnitude is written using a little-endian, twos-complement binary
[*significand*](https://en.wikipedia.org/wiki/Significand) and a
(signed) *shift amount*.
In essence, we use a generalized, variable-width form of binary IEEE
floating-point representation.
Let `N` be the `Number` to represent as an S-Expression.
The sign bit is 0 when `N` is zero or positive, and 1 when `N` is
negative.
The magnitude of `N` can be viewed as an infinite sequence of bits
with a fraction-separator mark placed somewhere in the sequence,
```
···00.000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000.00···
```
where `b_0` is the leftmost (most significant) and `b_{n-1}` the
rightmost (least significant) non-zero bit.
Let `k`, the position of the fraction-separator mark, be `i` when it
is immediately to the left of `b_i` for some `i`, generalizing to
negative values when it is to the left of `b_0` and values greater
than `n-1` when it is to the right of `b_{n-1}`.
For example, `k` will be:
- 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`;
- -3 (as in the first example above) when it is three bits left of `b_0`;
- `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`;
- `n`+3 when it is three bits to the right of `b_{n-1}`.
The unpadded significand is `b_0 b_1 ··· b_{n-1}`.
When `k` < `n`, the shift `z`=`k-n` and the significand is:
- the unpadded significand,
- with the sign bit appended to it on the right, and then
- padded on the left with zeroes until it is a whole number of octets wide.
When `k``n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is:
- the unpadded significand,
- padded on the right with `(k-n) mod 8` zeroes,
- with the sign bit then appended on the right, and then
- padded on the left with zeroes until it is a whole number of octets wide.
Now, let `s`=2`z` if `z` is zero or positive, or `s`=2|`z`|+1 if `z`
is negative.
Finally, the S-Expression form of `N` is:
- `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or
- `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or
- `(4:*num)`, if `s`=0 and the significand contains no non-zero bits;
where
- `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and
- `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`.
**Examples.** (Shown using the hexadecimal representation of
byte-strings from
[section 4.4 of Rivest's S-Expression specification][sexp.txt] in
places.)
- `N`=0 → `(4:*num)`
- `N`=1 → `(4:*num#02#)`
- `N`=-1 → `(4:*num#03#)`
- `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)`
- `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)`
- `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)`
- `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)`
- `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)`
- `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)`
- `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=7 → `(4:*num#16##07#)`
- `N`=-1.202E4567=1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)`
((TODO: figure out what this algorithm would actually look like in,
say, C, Python and Racket.))
#### Lists
A `List` maps to an S-Expression list of representations of its
elements, with the byte-string S-Expression `5:*list` prepended.
**Examples.**
- The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"`
would be represented as the S-Expression `(5:*list1:a1:b1:c)`.
- The empty `List` is represented by the S-Expression `(5:*list)`.
#### Maps
A `Map` is represented by an S-Expression list of representations of
the `Map`'s key-value pairs, with the byte-string `4:*map` prepended.
Each key-value pair is represented by a two-element S-Expression list
containing representations of the key and the value, in that order.
The key-value pairs *MUST* be ordered by `Value`-order of their keys.
**Examples.**
- The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to
`#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`.
- The `Map` containing an entry mapping the empty list to a "true"
Boolean value is represented by `(4:*map((5:*list)(4:true)))`.
- The empty `Map` is represented by `(4:*map)`.
**Non-examples.**
- The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its
key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`.
- The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its
key-value pairs appear "flattened" in the outer list, rather than
each appearing in a two-element list of its own.
#### Records
A `Record` is represented by an S-Expression list of its fields,
prepended by:
- the representation of its label, if its label is a `ByteString` and
does not begin with byte 42 (ASCII "`*`"); or
- the S-Expression `1:*` followed by the representation of the
`Record`'s label, otherwise.
**Examples.**
- The `Text` `"hello-world"` is represented by the S-Expression
`(5:utf-811:hello-world)`.
- The `IRI` denoting `http://www.w3.org/` is represented by the
S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`.
- The `Record` `#"*"()` is represented by the S-Expression
`(1:*1:*)`.
- The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression
`(1:*4:*foo4:*bar)`.
- The `Record` with the empty list as its label and no fields is
represented by the S-Expression `(1:*(5:*list))`.
- `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that
is a `Record` with `#"rfc3339"` as its label, and a single `Text`
field. While it is a perfectly reasonable `Value`, it does *not*
represent a valid date or time, since the `Text` `"foo"` does not
conform to any of the RFC 3339 productions enumerated above.
**Non-examples.**
- `((5:*list))` is not a representation of the `Record` with the
empty list as its label and no fields, because that `Record` has a
non-`ByteString` as its label, mandating a `1:*` prefix on its
S-Expression representation.
- `(4:*foo4:*bar)` does not represent the `Record`
`#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`",
mandating a `1:*` prefix on the `Record`'s S-Expression
representation.
## Examples
((TODO: Give some examples of large and small SPKI-CAT documents,
perhaps translated from various JSON blobs floating around the
internet.))
## Representing Values in Programming Languages
We have given a definition of `Value` and its semantics, and proposed
a concrete syntax for communicating and storing `Value`s. We now turn
to **suggested** representations of `Value`s as *programming-language
values* for various programming languages.
### JavaScript
- `ByteString``Uint8Array`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List``Array`
- `Map``Object`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
- `#"utf-8"``String`
- `#"true"``true`
- `#"false"``false`
- `#"null"``null`
- `#"undefined"` ↔ the undefined value
- `#"rfc3339"``Date`, if the `Record`'s field matches the `date-time` RFC 3339 production
```javascript
function Record(label, ...fields) {
this.label = label;
this.fields = fields;
}
```
### Scheme/Racket
- `ByteString` ↔ byte vector (Racket: "Bytes")
- `Number` ↔ numbers
- `List` ↔ (where possible, immutable) list
- `Map` ↔ hash-table
- `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is...
- `#"utf-8"` ↔ a string
- `#"true"``#t`
- `#"false"``#f`
- `#"symbol"` ↔ a symbol
### Java
- `ByteString``byte[]`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List``java.util.List`
- `Map``java.util.Map`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
- `#"utf-8"``java.lang.String`
- `#"true"``java.lang.Boolean.TRUE`
- `#"false"``java.lang.Boolean.FALSE`
- `#"null"` ↔ a special singleton object, but *not* Java's `null`
- `#"rfc3339"``java.util.{Date,Time,Timestamp}`, according to which RFC 3339 production the `Record`'s field matches
### Erlang
- `ByteString` ↔ a binary
- `Number` ↔ numbers, probably; TODO
- `List` ↔ a list
- `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP R17)
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is...
- `#"true"``true`
- `#"false"``false`
- `#"null"``null`
- `#"undefined"``undefined`
- `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if
some kind of an "unsafe" mode is set on the decoder (because
Erlang atoms are not GC'd); otherwise like any other kind of
`Record`
---