|
---
|
|||
|
---
|
|||
|
<style>
|
|||
|
body { font-size: 120%; margin-left: 2rem; }
|
|||
|
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; }
|
|||
|
h2 { border-bottom: solid black 1px; }
|
|||
|
</style>
|
|||
|
|
|||
|
2018-06-04 20:31:48 TODO: cwebber's email comments
|
|||
|
2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN
|
|||
|
2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2
|
|||
|
|
|||
|
# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags
|
|||
|
|
|||
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
|||
|
Christopher Lemmer Webber <cwebber@dustycloud.org>
|
|||
|
May 2018
|
|||
|
Version 0.0.1
|
|||
|
|
|||
|
________________
|
|||
|
/ \
|
|||
|
/\__/\ / boo! \
|
|||
|
/ \ \ i'm very spki! \
|
|||
|
\=\_^__^_/= ___/\__________________/
|
|||
|
|/ \
|
|||
|
\\ | | /
|
|||
|
<_|--|_>
|
|||
|
|
|||
|
## Introduction
|
|||
|
|
|||
|
This document proposes a language-neutral JSON-like *data type*, along
|
|||
|
with a robust equivalence relation ("semantics") and a total ordering
|
|||
|
over inhabitants of the type.[^tjson]
|
|||
|
|
|||
|
[^tjson]: [TJSON](https://www.tjson.org/) has a similar aim:
|
|||
|
“different on-the-wire representations of an object correspond to
|
|||
|
the same typed data object”
|
|||
|
([source](https://news.ycombinator.com/item?id=12860143)); “TJSON
|
|||
|
is defined as a serialization format on top of a JSON-like data
|
|||
|
model” ([source](https://news.ycombinator.com/item?id=12860401)).
|
|||
|
|
|||
|
It then suggests conventions for encoding common data formats in terms
|
|||
|
of the proposed data type.
|
|||
|
|
|||
|
Finally, it proposes concrete *syntax* for the data type, offering a
|
|||
|
language-neutral transfer syntax (based on
|
|||
|
[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and
|
|||
|
suggesting possible language-specific representations for the data
|
|||
|
type's inhabitants.
|
|||
|
|
|||
|
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
|||
|
[spki]: http://world.std.com/~cme/html/spki.html
|
|||
|
|
|||
|
### Why not Just Use JSON?
|
|||
|
|
|||
|
<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
|
|||
|
|
|||
|
JSON offers *syntax* for numbers, strings, booleans, null, arrays and
|
|||
|
string-keyed maps. However, it offers no *semantics* for the syntax:
|
|||
|
it is left to each implementation to determine how to treat each JSON
|
|||
|
term. This causes
|
|||
|
[interoperability](http://seriot.ch/parsing_json.php) and even
|
|||
|
[security](http://seriot.ch/parsing_json.php) issues.
|
|||
|
|
|||
|
Specifically, JSON does not:
|
|||
|
|
|||
|
- assign any meaning to numbers,[^meaning-ieee-double]
|
|||
|
- determine how strings are to be compared,[^string-key-comparison]
|
|||
|
- determine whether object key ordering is significant, or
|
|||
|
- determine whether duplicate object keys are permitted, what it
|
|||
|
would mean if they were, or how to determine a duplicate in the
|
|||
|
first place.
|
|||
|
|
|||
|
In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
|
|||
|
|
|||
|
[^meaning-ieee-double]:
|
|||
|
[Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6)
|
|||
|
does go so far as to indicate “good interoperability can be
|
|||
|
achieved” by imagining that parsers are able reliably to
|
|||
|
understand the syntax of numbers as denoting an IEEE 754
|
|||
|
double-precision floating-point value.
|
|||
|
|
|||
|
[^string-key-comparison]:
|
|||
|
[Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3)
|
|||
|
suggests that *if* an implementation compares strings used as
|
|||
|
object keys “code unit by code unit”, then it will interoperate
|
|||
|
with *other such implementations*, but neither requires this
|
|||
|
behaviour nor discusses comparisons of strings used in other
|
|||
|
contexts.
|
|||
|
|
|||
|
[^xml-infoset]: The XML world has the concept of
|
|||
|
[XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
|
|||
|
speaking, XML infoset is the *denotation* of an XML document; the
|
|||
|
*meaning* of the document.
|
|||
|
|
|||
|
[^other-formats]: Most other recent data languages are like JSON in
|
|||
|
specifying only a syntax with no associated semantics. While some
|
|||
|
do make a sketch of a semantics, the result is often
|
|||
|
underspecified (e.g. in terms of how strings are to be compared),
|
|||
|
overly machine-oriented (e.g. treating 32-bit integers as
|
|||
|
fundamentally distinct from 64-bit integers and from
|
|||
|
floating-point numbers), overly fine (e.g. giving visibility to
|
|||
|
the order in which map entries are written), or all three.
|
|||
|
|
|||
|
Some examples:
|
|||
|
|
|||
|
- are the JSON values `1`, `1.0`, and `1e0` the same or different?
|
|||
|
- are the JSON values `1.0` and `1.0000000000000001` the same or different?
|
|||
|
- are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
|
|||
|
(UTF-8 `7061cc88726f6e`) the same or different?
|
|||
|
- are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
|
|||
|
or different?
|
|||
|
- which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
|
|||
|
same? Are all three legal?
|
|||
|
- are `{"päron":1}` and `{"päron":1}` the same or different?
|
|||
|
|
|||
|
Different JSON implementations give different answers to these
|
|||
|
questions. The JSON specifications are silent on these questions.
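
As a concrete illustration, the following sketch shows how one particular implementation, Python's standard `json` module, happens to answer the questions above. Other parsers are free to answer differently, which is exactly the problem; nothing here should be read as what JSON itself requires.

```python
import json
import unicodedata

# Integers and floats parse to different Python types that nonetheless compare equal.
one, one_point_zero, one_e_zero = json.loads("1"), json.loads("1.0"), json.loads("1e0")
print(type(one).__name__, type(one_point_zero).__name__)      # int float
print(one == one_point_zero == one_e_zero)                    # True

# The extra digit is lost when the text is rounded to a double.
print(json.loads("1.0") == json.loads("1.0000000000000001"))  # True

# Precomposed and decomposed spellings of "päron" are different strings...
nfc = json.loads('"p\\u00e4ron"')
nfd = json.loads('"pa\\u0308ron"')
print(nfc == nfd)                                             # False
# ...unless the application normalizes them itself.
print(unicodedata.normalize("NFC", nfd) == nfc)               # True

# Key order is ignored; duplicate keys are accepted and the last one wins.
print(json.loads('{"a":1, "b":2}') == json.loads('{"b":2, "a":1}'))  # True
print(json.loads('{"a":1, "a":2}'))                           # {'a': 2}
```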
|
|||
|
|
|||
|
There are other minor problems with JSON having to do with its syntax.
|
|||
|
Examples include its relative verbosity and its lack of support for
|
|||
|
binary data.
|
|||
|
|
|||
|
## Starting with Semantics
|
|||
|
|
|||
|
Taking inspiration from functional programming, we start with a
|
|||
|
definition of the *values* that we want to work with and give them
|
|||
|
meaning independent of their syntax. We will treat syntax separately,
|
|||
|
later in this document.
|
|||
|
|
|||
|
We will want our data type to accommodate *atoms* (numbers and text),
|
|||
|
*products* (both tuples and sequences), and *labelled
|
|||
|
sums*.[^zephyr-asdl] It should also include *keyed maps*. We should
|
|||
|
avoid unnecessary restrictions such as machine-oriented fixed-width
|
|||
|
integer or floating-point values where possible.
|
|||
|
|
|||
|
[^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t
|
|||
|
[Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
|
|||
|
which doesn't offer much in the way of atoms, but offers
|
|||
|
general-purpose labelled sums and products. See D. C. Wang, A. W.
|
|||
|
Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
|
|||
|
Description Language,” in USENIX Conference on Domain-Specific
|
|||
|
Languages, 1997, pp. 213–228.
|
|||
|
[PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)
|
|||
|
|
|||
|
### Values
|
|||
|
|
|||
|
A `Value` is one of:
|
|||
|
|
|||
|
- a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale]
|
|||
|
- a `Number` for integers and rational numbers,
|
|||
|
- a `List` for general-purpose variable-length sequences,
|
|||
|
- a `Map` for general-purpose variable-size key/value maps, or
|
|||
|
- a `Record` for tagging a tuple of values with an intended interpretation.
|
|||
|
|
|||
|
We define a total order over `Value`s: Every `ByteString` is less than
|
|||
|
the other kinds of `Value`; every `Number` is less than any `List`,
|
|||
|
`Map` or `Record`, but greater than any `ByteString`; and so on.
|
|||
|
|
|||
|
That is, `ByteString < Number < List < Map < Record`.
|
|||
|
|
|||
|
Two values of the same kind are compared using kind-specific rules,
|
|||
|
given below.
|
|||
|
|
|||
|
Two `Value`s are equal if neither is less than the other according to
|
|||
|
the total order.
|
|||
|
|
|||
|
[^byte-string-rationale]: Why include `ByteString`, when we could
|
|||
|
instead use a reserved `Record` along with a `List` of `Number`s?
|
|||
|
((TODO: Actually decide about this! Similarly, why include `Map`
|
|||
|
rather than a restricted form of `List` with a `Record`? I think
|
|||
|
the answer has to do with the arbitrariness of the label we'd
|
|||
|
pick: unless *extremely* carefully chosen (i.e. number 0 (ideally
|
|||
|
even `-Inf`!) for byte strings, number 1 for map, and have the
|
|||
|
order go `Number < Record < List`), they would mess up the
|
|||
|
prettiness of the ordering. Though... we could ultimately reduce
|
|||
|
this to `Number` and `Record`, and have a family of `#"list"` and
|
|||
|
`#"map"` `Record`s...))
|
|||
|
|
|||
|
### Byte strings
|
|||
|
|
|||
|
A `ByteString` is an ordered sequence of zero or more integers in the
|
|||
|
inclusive range [0..255].
|
|||
|
|
|||
|
`ByteString`s are compared lexicographically.
|
|||
|
|
|||
|
We will write examples of `ByteString`s that contain only ASCII
|
|||
|
characters using “`#"`” as an opening quote mark and “`"`” as a
|
|||
|
closing quote mark.
|
|||
|
|
|||
|
**Examples.** The `ByteString` containing the three ASCII characters
|
|||
|
`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is
|
|||
|
written as `#""`. **N.B.** Despite appearances, these are *binary*
|
|||
|
data.
|
|||
|
|
|||
|
### Numbers
|
|||
|
|
|||
|
A `Number` is a signed rational number of finite precision whose
|
|||
|
magnitude can be exactly represented in base two with a finite number
|
|||
|
of digits. This includes integers of arbitrary width as well as (for
|
|||
|
example) the non-infinite non-NaN IEEE 754 floating-point values.
|
|||
|
|
|||
|
`Number`s are compared as mathematical numbers.
|
|||
|
|
|||
|
We will write examples of `Number`s using standard mathematical
|
|||
|
notation.
|
|||
|
|
|||
|
**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567.
|
|||
|
|
|||
|
**Non-examples.** NaN (the clue is in the name!), ∞ (not finite),
|
|||
|
0.2 (cannot be exactly represented with a finite number of binary
|
|||
|
digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise).
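
The membership test implied by this definition can be sketched in a few lines. The sketch assumes rationals are modelled with Python's `fractions.Fraction`; the helper name `is_number` is hypothetical. A rational has a finite binary expansion exactly when its denominator, in lowest terms, is a power of two.

```python
from fractions import Fraction


def is_number(q: Fraction) -> bool:
    """True if q has a finite binary expansion (its denominator is a power of two)."""
    d = q.denominator          # Fraction keeps itself in lowest terms, so d >= 1
    return d & (d - 1) == 0    # standard power-of-two test


print(is_number(Fraction(33, 192)))   # True: 33/192 = 11/64
print(is_number(Fraction(1, 7)))      # False
print(is_number(Fraction("0.2")))     # False: the decimal 0.2 is 1/5
```

Note that `Fraction(0.2)`, built from a floating-point literal, *is* accepted: it denotes the double nearest to 0.2, not 0.2 itself.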
|
|||
|
|
|||
|
### Lists
|
|||
|
|
|||
|
A `List` is an ordered sequence of zero or more `Value`s.
|
|||
|
|
|||
|
`List`s are compared lexicographically, appealing to the ordering on
|
|||
|
`Value`s for comparisons at each position in the `List`s.
|
|||
|
|
|||
|
### Maps
|
|||
|
|
|||
|
A `Map` is an *unordered* collection of zero or more pairs of
|
|||
|
`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map`
|
|||
|
must be pairwise distinct.
|
|||
|
|
|||
|
Instances of `Map` are compared by lexicographic comparison of the
|
|||
|
sequences resulting from ordering each `Map`'s pairs in ascending
|
|||
|
order by key. ((TODO: Is this a good idea? Is it clearly-enough
|
|||
|
written? An alternative approach is to compare first by the *count* of
|
|||
|
pairs, and only if the count is the same, start comparing the pairs
|
|||
|
themselves.))
|
|||
|
|
|||
|
### Records
|
|||
|
|
|||
|
A `Record` is a tuple of one or more `Value`s. The first in the tuple
|
|||
|
is called the *label* of the `Record`, and the other elements of the
|
|||
|
tuple are called its *fields*.
|
|||
|
|
|||
|
`Record` labels are *usually* `ByteString`s, but can be any kind of
|
|||
|
`Value`.[^iri-labels]
|
|||
|
|
|||
|
[^iri-labels]: It is occasionally (but seldom) necessary to
|
|||
|
interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a
|
|||
|
label can be read as a relative IRI, it is notionally interpreted
|
|||
|
with respect to the IRI `http://spki-cat.org/` ((TODO:
|
|||
|
placeholder)); where a label can be read as an absolute IRI, it
|
|||
|
stands for that IRI; and otherwise, it cannot be read as an IRI at
|
|||
|
all, and so the label simply stands for itself - for its own
|
|||
|
`Value`.
|
|||
|
|
|||
|
`Record`s are compared lexicographically as if they were just tuples;
|
|||
|
that is, first by their labels, and then by the remainder of their
|
|||
|
fields.
|
|||
|
|
|||
|
We will write examples of `Record`s with `ByteString` labels entirely
|
|||
|
composed of ASCII characters as their label followed by their
|
|||
|
parenthesised, comma-separated fields.
|
|||
|
|
|||
|
**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3
|
|||
|
is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no
|
|||
|
fields is written `#"void"()`.
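
The ordering rules of this section can be sketched as a single comparison function. The modelling choices are assumptions made for illustration only: `bytes` for `ByteString`, `int` or `Fraction` for `Number`, `tuple` for `List`, `dict` for `Map`, and a hypothetical `Record` dataclass. The sketch also assumes that a `Map`'s pairs are compared key first and then value, which the text above does not spell out.

```python
from dataclasses import dataclass
from fractions import Fraction
from functools import cmp_to_key


@dataclass(frozen=True)
class Record:
    label: object          # any Value; usually bytes
    fields: tuple = ()     # zero or more Values


def kind_rank(v):
    """Position of a Value's kind in ByteString < Number < List < Map < Record."""
    if isinstance(v, bytes):
        return 0
    if isinstance(v, (int, Fraction)) and not isinstance(v, bool):
        return 1
    if isinstance(v, tuple):
        return 2
    if isinstance(v, dict):
        return 3
    if isinstance(v, Record):
        return 4
    raise TypeError(f"not a Value: {v!r}")


def compare_seq(xs, ys):
    """Lexicographic comparison of two sequences of Values."""
    for x, y in zip(xs, ys):
        c = compare(x, y)
        if c != 0:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


def sorted_pairs(m):
    """A Map's (key, value) pairs in ascending order by key."""
    by_key = cmp_to_key(lambda p, q: compare(p[0], q[0]))
    return tuple(sorted(m.items(), key=by_key))


def compare(a, b):
    """Return -1, 0 or 1 according to the total order on Values."""
    ra, rb = kind_rank(a), kind_rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra in (0, 1):                       # bytes and numbers: native ordering
        return (a > b) - (a < b)
    if ra == 2:
        return compare_seq(a, b)
    if ra == 3:                            # maps: compare pairs sorted by key
        return compare_seq(sorted_pairs(a), sorted_pairs(b))
    return compare_seq((a.label, *a.fields), (b.label, *b.fields))


print(compare(b"zzz", 0))                                   # -1
print(compare({b"a": 1, b"b": 2}, {b"b": 2, b"a": 1}))      # 0
```

Two `Value`s are then equal exactly when `compare` returns 0. Because Python dictionaries are not hashable, this model cannot use a `Map` as a key of another `Map`; a fuller implementation would need an immutable map type.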
|
|||
|
|
|||
|
## Conventions for Common Data Types
|
|||
|
|
|||
|
The `Value` data type is essentially an abstract S-Expression, able to
|
|||
|
represent semi-structured data over `ByteString` and `Number` atoms.
|
|||
|
|
|||
|
However, users need a wide variety of data types for representing
|
|||
|
domain-specific values such as text, calendrical values, machine
|
|||
|
words, IEEE 754 floating-point values, booleans, and so on.
|
|||
|
|
|||
|
We use appropriately-labelled `Record`s to denote these
|
|||
|
domain-specific data types.
|
|||
|
|
|||
|
All of these conventions are optional. They form a layer atop the core
|
|||
|
`Value` structure. Non-domain-specific tools do not in general need to
|
|||
|
treat them specially.
|
|||
|
|
|||
|
**Validity.** Many of the labels we will describe in this section come
|
|||
|
with side-conditions on the contents of labelled `Record`s. It is
|
|||
|
possible to construct an instance of `Value` that violates these
|
|||
|
side-conditions without ceasing to be a `Value` or becoming
|
|||
|
unrepresentable. However, we say that such a `Value` is *invalid*
|
|||
|
because it fails to honour the necessary side-conditions.
|
|||
|
Implementations *SHOULD* allow two modes of working: one which
|
|||
|
treats all `Value`s identically, without regard for side-conditions,
|
|||
|
and one which enforces validity (i.e. side-conditions) when reading,
|
|||
|
writing, or constructing `Value`s.
|
|||
|
|
|||
|
### Text
|
|||
|
|
|||
|
A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and
|
|||
|
having a single field that is also a `ByteString`. The field *MUST* be
|
|||
|
valid UTF-8.
|
|||
|
|
|||
|
We will write examples of `Text`s that contain Unicode text using
|
|||
|
“`"`” as both an opening and closing quote mark.
|
|||
|
|
|||
|
**Examples.** The `Text` containing the three Unicode code points `z`
|
|||
|
(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`.
|
|||
|
|
|||
|
**Normalization forms.** Unicode defines multiple
|
|||
|
[normalization forms](http://unicode.org/reports/tr15/) for text. The
|
|||
|
ordering and equivalence relations defined for `Value`s mean that, for
|
|||
|
Unicode text, the UTF-8 encoded byte-level form of a text is used in
|
|||
|
comparisons.[^utf8-is-awesome] In order for users to unambiguously
|
|||
|
signal or require a particular normalization form, we define a
|
|||
|
`NormalizedText`, which is a `Record` labelled with
|
|||
|
`#"unicode-normalization"` and having two fields, the first of which
|
|||
|
is a `Text` specifying the normalization form used (e.g. `"nfc"`,
|
|||
|
`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text`
|
|||
|
whose underlying representation *MUST* be normalized according to the
|
|||
|
named normalization form.
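
A validity check for `Text` and `NormalizedText` might look like the sketch below. The `Record` dataclass and all function names are hypothetical. Treating any form name other than the four listed as invalid is an assumption, since the text above gives those names only as examples.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple = ()


def text(s: str) -> Record:
    """Build a Text from a Python string."""
    return Record(b"utf-8", (s.encode("utf-8"),))


def is_valid_text(v) -> bool:
    """A Text is a #"utf-8" Record whose single ByteString field is valid UTF-8."""
    if not (isinstance(v, Record) and v.label == b"utf-8"
            and len(v.fields) == 1 and isinstance(v.fields[0], bytes)):
        return False
    try:
        v.fields[0].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


def is_valid_normalized_text(v) -> bool:
    """Check that the second field really is in the normalization form named by the first."""
    if not (isinstance(v, Record) and v.label == b"unicode-normalization"
            and len(v.fields) == 2 and all(is_valid_text(f) for f in v.fields)):
        return False
    form, body = (f.fields[0].decode("utf-8") for f in v.fields)
    return form in FORMS and unicodedata.normalize(FORMS[form], body) == body


precomposed = text("p\u00e4ron")    # UTF-8 70c3a4726f6e
decomposed = text("pa\u0308ron")    # UTF-8 7061cc88726f6e
print(precomposed == decomposed)    # False: the bytes differ
```

Here `Record(b"unicode-normalization", (text("nfc"), precomposed))` passes the check, while the same wrapper around `decomposed` is a well-formed but invalid `Value`.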
|
|||
|
|
|||
|
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
|||
|
gives the same result as a lexicographic code-point-by-code-point
|
|||
|
comparison!
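
The claim in this footnote is easy to check experimentally. The short sketch below sorts a handful of arbitrarily chosen sample strings three ways; the contrast with UTF-16 is added only for illustration and is not part of the proposal.

```python
# Sample strings chosen to include code points above U+FFFF.
words = ["z", "\u6c34", "\U0001d11e", "\uff5e", "a\u0308"]

by_code_point = sorted(words)   # Python compares str values code point by code point
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(words, key=lambda s: s.encode("utf-16-be"))

print(by_utf8 == by_code_point)    # True
print(by_utf16 == by_code_point)   # False: surrogate pairs sort below U+FF5E
```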
|
|||
|
|
|||
|
**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled
|
|||
|
with `#"iri"` and having one field, a `Text` which is the IRI itself
|
|||
|
and which *MUST* be a valid absolute or relative IRI.
|
|||
|
|
|||
|
**Symbols.** Programming languages like Lisp and Prolog frequently use
|
|||
|
string-like values called *symbols*. A `Symbol` is a `Record` labelled
|
|||
|
with `#"symbol"` and having one field, a `Text`.
|
|||
|
|
|||
|
### Numbers
|
|||
|
|
|||
|
The definition of `Number` captures all integers and all
|
|||
|
finitely-representable floating-point values. However, in certain
|
|||
|
circumstances it can be valuable to assert that a number inhabits a
|
|||
|
particular range, such as a fixed-width machine word or an IEEE 754
|
|||
|
floating-point value.
|
|||
|
|
|||
|
**Fixed-width machine words.** (16-, 32- and 64-bit) A family of
|
|||
|
labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote *n*-bit-wide
|
|||
|
signed and unsigned range restrictions, respectively. Records with
|
|||
|
these labels *MUST* have one field, a `Number`, which *MUST* fall
|
|||
|
within the appropriate range. That is, to be valid,
|
|||
|
- in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*.
|
|||
|
- in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*.
|
|||
|
- in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*.
|
|||
|
- etc.
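
The side-conditions for the whole family of labels can be derived from the label itself. The sketch below is one possible check; the function names are hypothetical, and it takes the label and the field separately rather than a full `Record` to stay short.

```python
import math
from fractions import Fraction


def machine_word_range(label: bytes):
    """Return (lo, hi) for labels such as b"i16" or b"u64", or None for any other label."""
    kind, width = label[:1], label[1:]
    if kind not in (b"i", b"u") or width not in (b"16", b"32", b"64"):
        return None
    n = int(width)
    if kind == b"i":
        return -(1 << (n - 1)), (1 << (n - 1)) - 1
    return 0, (1 << n) - 1


def is_valid_machine_word(label: bytes, x) -> bool:
    """True if x is a Number satisfying the range and integrality conditions for label."""
    bounds = machine_word_range(label)
    if bounds is None or isinstance(x, bool) or not isinstance(x, (int, Fraction)):
        return False
    lo, hi = bounds
    return math.floor(x) == x and lo <= x <= hi


print(machine_word_range(b"i16"))                    # (-32768, 32767)
print(is_valid_machine_word(b"u16", 65536))          # False
print(is_valid_machine_word(b"i32", Fraction(1, 2))) # False
```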
|
|||
|
|
|||
|
**IEEE 754 floating-point.** (single- and double-precision) The labels
|
|||
|
`f32` and `f64` denote single- and double-precision IEEE 754
|
|||
|
floating-point values, respectively. Records with these labels *MUST*
|
|||
|
have one field. This field *MUST* either be a `Number`, which *MUST*
|
|||
|
fall within the appropriate representable range, or one of the records
|
|||
|
`#"nan"()`, `#"+inf"()` or `#"-inf"()`.
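
One way to check these side-conditions is sketched below. It reads "fall within the appropriate representable range" as "be exactly representable as a finite value of the format", which is an interpretation, not something the text above states. The `Record` dataclass and function names are hypothetical; the round trip through `struct` is used to round to the target format.

```python
import struct
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple = ()


SPECIALS = (Record(b"nan"), Record(b"+inf"), Record(b"-inf"))
FORMATS = {b"f32": "<f", b"f64": "<d"}


def is_exact(x, fmt: str) -> bool:
    """True if the rational x survives a round trip through struct format fmt unchanged."""
    try:
        (rounded,) = struct.unpack(fmt, struct.pack(fmt, float(x)))
    except OverflowError:          # magnitude too large for the format
        return False
    return Fraction(rounded) == x


def is_valid_float(rec) -> bool:
    """Check an f32/f64 Record: one field, either a special Record or an exact Number."""
    if not (isinstance(rec, Record) and isinstance(rec.label, bytes)
            and rec.label in FORMATS and len(rec.fields) == 1):
        return False
    field = rec.fields[0]
    if field in SPECIALS:
        return True
    if isinstance(field, bool) or not isinstance(field, (int, Fraction)):
        return False
    return is_exact(field, FORMATS[rec.label])


print(is_valid_float(Record(b"f32", (Fraction(1, 2),))))   # True
print(is_valid_float(Record(b"f32", (2**24 + 1,))))        # False: needs 25 significant bits
print(is_valid_float(Record(b"f64", (Record(b"nan"),))))   # True
```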
|
|||
|
|
|||
|
### Anonymous Tuples and Unit
|
|||
|
|
|||
|
A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields,
|
|||
|
denoting an anonymous tuple of values.
|
|||
|
|
|||
|
The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes
|
|||
|
called "unit" or "void" (but *not* e.g. JavaScript's "undefined"
|
|||
|
value).
|
|||
|
|
|||
|
### Booleans, Null and Undefined
|
|||
|
|
|||
|
The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true"
|
|||
|
and "false" Boolean values, respectively.
|
|||
|
|
|||
|
Tony Hoare's
|
|||
|
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
|
|||
|
can be represented with the 0-ary `Record` `#"null"()`. An "undefined"
|
|||
|
value can be represented as `#"undefined"()`.
|
|||
|
|
|||
|
### Dates and Times
|
|||
|
|
|||
|
Dates, times, moments, and timestamps can be represented with a
|
|||
|
`Record` with label `#"rfc3339"` having a single field, a `Text`,
|
|||
|
which *MUST* conform to one of the `full-date`, `partial-time`,
|
|||
|
`full-time`, or `date-time` productions of
|
|||
|
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
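
A rough syntactic check for the four productions is sketched below. It is an approximation: it tests only the shape of the text, not field ranges such as months above 12, and it accepts only upper-case `T` and `Z`. The names `PRODUCTIONS` and `rfc3339_production` are hypothetical.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"
PARTIAL_TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d+)?"
OFFSET = r"(?:Z|[+-]\d{2}:\d{2})"

PRODUCTIONS = {
    "full-date": DATE,
    "partial-time": PARTIAL_TIME,
    "full-time": PARTIAL_TIME + OFFSET,
    "date-time": DATE + "T" + PARTIAL_TIME + OFFSET,
}


def rfc3339_production(s: str):
    """Return the name of the production whose shape s matches, or None."""
    for name, pattern in PRODUCTIONS.items():
        if re.fullmatch(pattern, s, flags=re.ASCII):
            return name
    return None


print(rfc3339_production("2018-06-04"))                  # full-date
print(rfc3339_production("2018-06-04T20:31:48+01:00"))   # date-time
print(rfc3339_production("foo"))                         # None
```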
|
|||
|
|
|||
|
## Syntax
|
|||
|
|
|||
|
Now that we have discussed `Value`s and their meanings, we may turn to
|
|||
|
techniques for *representing* `Value`s for communication or storage.
|
|||
|
|
|||
|
The syntax we have used for the examples so far is inadequate in many
|
|||
|
ways, not least of which is that it cannot represent every `Value`.
|
|||
|
|
|||
|
Separation of the meaning of a piece of syntax from the syntax itself
|
|||
|
opens the door to domain-specific syntaxes, all equivalent and
|
|||
|
interconvertible.[^asn1] With a robust semantic foundation,
|
|||
|
connections to other data languages can also be made.
|
|||
|
|
|||
|
[^asn1]: Those who remember
|
|||
|
[ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
|
|||
|
will recall BER, DER, PER, CER, XER and so on, each appropriate to
|
|||
|
a different setting. Similarly,
|
|||
|
[Rivest's S-Expression design][sexp.txt] offers a human-friendly
|
|||
|
syntax, a syntax robust to network-induced message corruption, and
|
|||
|
an unambiguous, simple and easily-parsed machine-friendly syntax
|
|||
|
for the same underlying values.
|
|||
|
|
|||
|
### Transfer syntax: S-Expressions
|
|||
|
|
|||
|
For now, we limit our attention to an easily-parsed, easily-produced
|
|||
|
machine-readable syntax by mapping our `Value`s to the canonical form
|
|||
|
of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps]
|
|||
|
|
|||
|
[^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
|
|||
|
they are? While they include binary data and sequences, and an
|
|||
|
obvious equivalence for them exists, they lack numbers *per se* as
|
|||
|
well as any kind of unordered structure such as sets or maps. In
|
|||
|
addition, while "display hints" allow labelling of binary data
|
|||
|
with an intended interpretation, they cannot be attached to any
|
|||
|
other kind of structure, and the "hint" itself can only be a
|
|||
|
binary blob.
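
To give a feel for how little machinery the canonical form needs, here is a minimal reader sketch. It handles only length-prefixed byte strings and parenthesised lists, which is all this document uses; display hints are deliberately not supported. Rejecting length prefixes with leading zeros is an assumption about what "canonical" requires. Lists come back as Python lists and byte strings as `bytes`.

```python
def parse(data: bytes, pos: int = 0):
    """Parse one canonical S-Expression starting at pos; return (value, next_pos)."""
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = parse(data, pos)
            items.append(item)
        return items, pos + 1              # skip the closing parenthesis
    colon = data.index(b":", pos)          # raises ValueError if there is no colon
    digits = data[pos:colon]
    if not digits.isdigit() or (len(digits) > 1 and digits[:1] == b"0"):
        raise ValueError("bad length prefix")
    end = colon + 1 + int(digits)
    if end > len(data):
        raise ValueError("truncated byte string")
    return data[colon + 1:end], end


print(parse(b"3:ABC"))                 # (b'ABC', 5)
print(parse(b"(5:*list1:a1:b1:c)"))    # ([b'*list', b'a', b'b', b'c'], 18)
```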
|
|||
|
|
|||
|
#### Byte strings
|
|||
|
|
|||
|
`ByteString`s map to byte-string S-Expressions.
|
|||
|
|
|||
|
**Examples.**
|
|||
|
- What we have been writing above as `#"ABC"` would be represented as
|
|||
|
the S-Expression `3:ABC`.
|
|||
|
- The empty `ByteString` is represented by the S-Expression `0:`.
|
|||
|
|
|||
|
#### Numbers
|
|||
|
|
|||
|
Numbers are the most complicated values to represent as an
|
|||
|
S-Expression.
|
|||
|
|
|||
|
((TODO: Consider cutting complexity by e.g. representing a `Number` as
|
|||
|
a sign bit, a little-endian blob of the integer part of the number,
|
|||
|
and a little-endian blob of the fractional part of the number. Lots of
|
|||
|
trailing/leading zeros for very large/small numbers!))
|
|||
|
|
|||
|
We represent `Number`s using a sign-magnitude format, where the
magnitude is written using a little-endian binary
[*significand*](https://en.wikipedia.org/wiki/Significand), with the
sign carried in its least-significant bit, and a (signed) *shift amount*.
|
|||
|
|
|||
|
In essence, we use a generalized, variable-width form of binary IEEE
|
|||
|
floating-point representation.
|
|||
|
|
|||
|
Let `N` be the `Number` to represent as an S-Expression.
|
|||
|
|
|||
|
The sign bit is 0 when `N` is zero or positive, and 1 when `N` is
|
|||
|
negative.
|
|||
|
|
|||
|
The magnitude of `N` can be viewed as an infinite sequence of bits
|
|||
|
with a fraction-separator mark placed somewhere in the sequence,
|
|||
|
|
|||
|
```
···00.000 b_0 b_1 ··· b_{k-1}   b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1}   b_k ··· b_{n-1} 000.00···
```
|
|||
|
|
|||
|
where `b_0` is the leftmost (most significant) and `b_{n-1}` the
|
|||
|
rightmost (least significant) non-zero bit.
|
|||
|
|
|||
|
Let `k`, the position of the fraction-separator mark, be `i` when it
|
|||
|
is immediately to the left of `b_i` for some `i`, generalizing to
|
|||
|
negative values when it is to the left of `b_0` and values greater
|
|||
|
than `n-1` when it is to the right of `b_{n-1}`.
|
|||
|
|
|||
|
For example, `k` will be:
|
|||
|
- 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`;
|
|||
|
- -3 (as in the first example above) when it is three bits left of `b_0`;
|
|||
|
- `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`;
|
|||
|
- `n`+3 when it is three bits to the right of `b_{n-1}`.
|
|||
|
|
|||
|
The unpadded significand is `b_0 b_1 ··· b_{n-1}`.
|
|||
|
|
|||
|
When `k` < `n`, the shift `z`=`k-n` and the significand is:
|
|||
|
- the unpadded significand,
|
|||
|
- with the sign bit appended to it on the right, and then
|
|||
|
- padded on the left with zeroes until it is a whole number of octets wide.
|
|||
|
|
|||
|
When `k` ≥ `n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is:
|
|||
|
- the unpadded significand,
|
|||
|
- padded on the right with `(k-n) mod 8` zeroes,
|
|||
|
- with the sign bit then appended on the right, and then
|
|||
|
- padded on the left with zeroes until it is a whole number of octets wide.
|
|||
|
|
|||
|
Now, let `s`=2×`z` if `z` is zero or positive, or `s`=2×|`z`|+1 if `z`
is negative.
|
|||
|
|
|||
|
Finally, the S-Expression form of `N` is:
|
|||
|
- `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or
|
|||
|
- `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or
|
|||
|
- `(4:*num)`, if `s`=0 and the significand contains no non-zero bits;
|
|||
|
|
|||
|
where
|
|||
|
- `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and
|
|||
|
- `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`.
|
|||
|
|
|||
|
**Examples.** (Shown using the hexadecimal representation of
|
|||
|
byte-strings from
|
|||
|
[section 4.4 of Rivest's S-Expression specification][sexp.txt] in
|
|||
|
places.)
|
|||
|
- `N`=0 → `(4:*num)`
|
|||
|
- `N`=1 → `(4:*num#02#)`
|
|||
|
- `N`=-1 → `(4:*num#03#)`
|
|||
|
- `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)`
|
|||
|
- `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)`
|
|||
|
- `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)`
|
|||
|
- `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)`
|
|||
|
- `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)`
|
|||
|
- `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)`
|
|||
|
- `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=13 → `(4:*num#16##0D#)`
|
|||
|
- `N`=-1.202E4567=-1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)`
|
|||
|
|
|||
|
((TODO: figure out what this algorithm would actually look like in,
|
|||
|
say, C, Python and Racket.))
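
As one possible answer for Python, the sketch below follows the steps described in this section, taking the input as an `int` or `fractions.Fraction`. All function names are hypothetical. It emits raw canonical byte strings, so the byte written `#14#` in the examples appears as `1:` followed by the single byte 0x14.

```python
from fractions import Fraction


def le_bytes(n: int) -> bytes:
    """Little-endian bytes of a non-negative integer, using as few octets as possible."""
    return n.to_bytes((n.bit_length() + 7) // 8, "little")


def number_parts(N):
    """Return (significand_bytes, shift_bytes) for a Number N."""
    N = Fraction(N)
    sign = 1 if N < 0 else 0
    p, q = abs(N.numerator), N.denominator
    if q & (q - 1):
        raise ValueError("not a Number: no finite binary expansion")
    if p == 0:
        return b"", b""
    if q > 1:
        # k < n: p holds b_0 ... b_{n-1}, and z = k - n = -log2(q).
        z = -(q.bit_length() - 1)
        bits = p
    else:
        # k >= n: p is an integer with t = k - n trailing zero bits.
        t = (p & -p).bit_length() - 1
        z = 8 * (t // 8)
        bits = p >> z              # keeps (k - n) mod 8 of the trailing zeroes
    significand = (bits << 1) | sign   # sign bit appended on the right
    s = 2 * z if z >= 0 else 2 * -z + 1
    return le_bytes(significand), le_bytes(s)


def atom(data: bytes) -> bytes:
    """Canonical byte-string S-Expression."""
    return b"%d:%s" % (len(data), data)


def encode_number(N) -> bytes:
    """Canonical S-Expression form of a Number."""
    sig, shift = number_parts(N)
    if shift:
        body = atom(sig) + atom(shift)
    elif sig:
        body = atom(sig)
    else:
        body = b""
    return b"(" + atom(b"*num") + body + b")"


print(encode_number(0))                 # b'(4:*num)'
print(encode_number(10))                # b'(4:*num1:\x14)'
print(encode_number(Fraction(1, 2)))    # b'(4:*num1:\x021:\x03)'
```

Python's arbitrary-precision integers do most of the work here; a C version would need a bignum library or a restriction to machine-sized inputs.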
|
|||
|
|
|||
|
#### Lists
|
|||
|
|
|||
|
A `List` maps to an S-Expression list of representations of its
|
|||
|
elements, with the byte-string S-Expression `5:*list` prepended.
|
|||
|
|
|||
|
**Examples.**
|
|||
|
- The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"`
|
|||
|
would be represented as the S-Expression `(5:*list1:a1:b1:c)`.
|
|||
|
- The empty `List` is represented by the S-Expression `(5:*list)`.
|
|||
|
|
|||
|
#### Maps
|
|||
|
|
|||
|
A `Map` is represented by an S-Expression list of representations of
|
|||
|
the `Map`'s key-value pairs, with the byte-string `4:*map` prepended.
|
|||
|
|
|||
|
Each key-value pair is represented by a two-element S-Expression list
|
|||
|
containing representations of the key and the value, in that order.
|
|||
|
|
|||
|
The key-value pairs *MUST* be ordered by `Value`-order of their keys.
|
|||
|
|
|||
|
**Examples.**
|
|||
|
- The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to
|
|||
|
`#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`.
|
|||
|
- The `Map` containing an entry mapping the empty list to a "true"
|
|||
|
Boolean value is represented by `(4:*map((5:*list)(4:true)))`.
|
|||
|
- The empty `Map` is represented by `(4:*map)`.
|
|||
|
|
|||
|
**Non-examples.**
|
|||
|
- The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its
|
|||
|
key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`.
|
|||
|
- The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its
|
|||
|
key-value pairs appear "flattened" in the outer list, rather than
|
|||
|
each appearing in a two-element list of its own.
|
|||
|
|
|||
|
#### Records
|
|||
|
|
|||
|
A `Record` is represented by an S-Expression list of its fields,
|
|||
|
prepended by:
|
|||
|
|
|||
|
- the representation of its label, if its label is a `ByteString` and
|
|||
|
does not begin with byte 42 (ASCII "`*`"); or
|
|||
|
- the S-Expression `1:*` followed by the representation of the
|
|||
|
`Record`'s label, otherwise.
|
|||
|
|
|||
|
**Examples.**
|
|||
|
- The `Text` `"hello-world"` is represented by the S-Expression
|
|||
|
`(5:utf-811:hello-world)`.
|
|||
|
- The `IRI` denoting `http://www.w3.org/` is represented by the
|
|||
|
S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`.
|
|||
|
- The `Record` `#"*"()` is represented by the S-Expression
|
|||
|
`(1:*1:*)`.
|
|||
|
- The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression
|
|||
|
`(1:*4:*foo4:*bar)`.
|
|||
|
- The `Record` with the empty list as its label and no fields is
|
|||
|
represented by the S-Expression `(1:*(5:*list))`.
|
|||
|
- `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that
|
|||
|
is a `Record` with `#"rfc3339"` as its label, and a single `Text`
|
|||
|
field. While it is a perfectly reasonable `Value`, it does *not*
|
|||
|
represent a valid date or time, since the `Text` `"foo"` does not
|
|||
|
conform to any of the RFC 3339 productions enumerated above.
|
|||
|
|
|||
|
**Non-examples.**
|
|||
|
- `((5:*list))` is not a representation of the `Record` with the
|
|||
|
empty list as its label and no fields, because that `Record` has a
|
|||
|
non-`ByteString` as its label, mandating a `1:*` prefix on its
|
|||
|
S-Expression representation.
|
|||
|
- `(4:*foo4:*bar)` does not represent the `Record`
|
|||
|
`#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`",
|
|||
|
mandating a `1:*` prefix on the `Record`'s S-Expression
|
|||
|
representation.
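
The rules for `ByteString`s, `List`s, `Map`s and `Record`s can be combined into a small encoder, sketched below. The modelling is an assumption for illustration: `bytes`, `tuple`, `dict` and a hypothetical `Record` dataclass. `Number` encoding is left out to keep the sketch short, and the key comparison assumes pairs are ordered key first, then value.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple = ()


def rank(v):
    """Kind order restricted to the kinds this sketch supports."""
    for i, kind in enumerate((bytes, tuple, dict, Record)):
        if isinstance(v, kind):
            return i
    raise TypeError(f"unsupported Value: {v!r}")


def sorted_pairs(m):
    """A Map's (key, value) pairs in Value-order of their keys."""
    return sorted(m.items(), key=cmp_to_key(lambda p, q: compare(p[0], q[0])))


def compare(a, b):
    """Total order on the supported Values: -1, 0 or 1."""
    ra, rb = rank(a), rank(b)
    if ra != rb:
        return (ra > rb) - (ra < rb)
    if ra == 0:
        return (a > b) - (a < b)
    if ra == 2:
        a, b = sorted_pairs(a), sorted_pairs(b)
    elif ra == 3:
        a, b = (a.label, *a.fields), (b.label, *b.fields)
    for x, y in zip(a, b):
        c = compare(x, y)
        if c:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))


def atom(data: bytes) -> bytes:
    return b"%d:%s" % (len(data), data)


def encode(v) -> bytes:
    """Canonical S-Expression form of a Value (Numbers not supported here)."""
    if isinstance(v, bytes):
        return atom(v)
    if isinstance(v, tuple):
        return b"(" + atom(b"*list") + b"".join(map(encode, v)) + b")"
    if isinstance(v, dict):
        body = b"".join(b"(" + encode(k) + encode(x) + b")" for k, x in sorted_pairs(v))
        return b"(" + atom(b"*map") + body + b")"
    if isinstance(v, Record):
        plain = isinstance(v.label, bytes) and not v.label.startswith(b"*")
        prefix = b"" if plain else atom(b"*")
        return b"(" + prefix + encode(v.label) + b"".join(map(encode, v.fields)) + b")"
    raise TypeError(f"unsupported Value: {v!r}")


print(encode({b"c": b"b", b"a": b"d"}))           # b'(4:*map(1:a1:d)(1:c1:b))'
print(encode(Record(b"*foo", (b"*bar",))))        # b'(1:*4:*foo4:*bar)'
```

Sorting the pairs inside the encoder means a caller cannot accidentally produce the out-of-order form listed among the non-examples for `Map`s.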
|
|||
|
|
|||
|
## Examples
|
|||
|
|
|||
|
((TODO: Give some examples of large and small SPKI-CAT documents,
|
|||
|
perhaps translated from various JSON blobs floating around the
|
|||
|
internet.))
|
|||
|
|
|||
|
## Representing Values in Programming Languages
|
|||
|
|
|||
|
We have given a definition of `Value` and its semantics, and proposed
|
|||
|
a concrete syntax for communicating and storing `Value`s. We now turn
|
|||
|
to **suggested** representations of `Value`s as *programming-language
|
|||
|
values* for various programming languages.
|
|||
|
|
|||
|
### JavaScript
|
|||
|
|
|||
|
- `ByteString` ↔ `Uint8Array`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List` ↔ `Array`
- `Map` ↔ `Object`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
    - `#"utf-8"` ↔ `String`
    - `#"true"` ↔ `true`
    - `#"false"` ↔ `false`
    - `#"null"` ↔ `null`
    - `#"undefined"` ↔ the undefined value
    - `#"rfc3339"` ↔ `Date`, if the `Record`'s field matches the `date-time` RFC 3339 production
|
|||
|
|
|||
|
```javascript
// Generic Record: a label plus zero or more fields. Call with `new`.
function Record(label, ...fields) {
  this.label = label;
  this.fields = fields;
}
```
|
|||
|
|
|||
|
### Scheme/Racket
|
|||
|
|
|||
|
- `ByteString` ↔ byte vector (Racket: "Bytes")
- `Number` ↔ numbers
- `List` ↔ (where possible, immutable) list
- `Map` ↔ hash-table
- `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is...
    - `#"utf-8"` ↔ a string
    - `#"true"` ↔ `#t`
    - `#"false"` ↔ `#f`
    - `#"symbol"` ↔ a symbol
|
|||
|
|
|||
|
### Java
|
|||
|
|
|||
|
- `ByteString` ↔ `byte[]`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List` ↔ `java.util.List`
- `Map` ↔ `java.util.Map`
- `Record` ↔ an instance of something like the `Record` shown for JavaScript above, unless the label is...
    - `#"utf-8"` ↔ `java.lang.String`
    - `#"true"` ↔ `java.lang.Boolean.TRUE`
    - `#"false"` ↔ `java.lang.Boolean.FALSE`
    - `#"null"` ↔ a special singleton object, but *not* Java's `null`
    - `#"rfc3339"` ↔ `java.util.Date`, `java.sql.Time` or `java.sql.Timestamp`, according to which RFC 3339 production the `Record`'s field matches
|
|||
|
|
|||
|
### Erlang
|
|||
|
|
|||
|
- `ByteString` ↔ a binary
- `Number` ↔ numbers, probably; TODO
- `List` ↔ a list
- `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP 17)
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is...
    - `#"true"` ↔ `true`
    - `#"false"` ↔ `false`
    - `#"null"` ↔ `null`
    - `#"undefined"` ↔ `undefined`
    - `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if
      some kind of an "unsafe" mode is set on the decoder (because
      Erlang atoms are not GC'd); otherwise like any other kind of
      `Record`
|
|||
|
|
|||
|
---
|