---
---
<style>
  body { font-size: 120%; margin-left: 2rem; }
  h1, h2, h3, h4, h5, h6 { margin-left: -1rem; }
  h2 { border-bottom: solid black 1px; }
</style>

2018-06-04 20:31:48 TODO: cwebber's email comments
2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN
2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2

# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags

Tony Garnock-Jones <tonyg@leastfixedpoint.com>
Christine Lemmer Webber <cwebber@dustycloud.org>
May 2018
Version 0.0.1

    ________________
    / \
    /\__/\ / boo! \
    / \ \ i'm very spki! \
    \=\_^__^_/= ___/\__________________/
    |/ \
    \\ | | /
    <_|--|_>

## Introduction

This document proposes a language-neutral JSON-like *data type*, along
with a robust equivalence relation ("semantics") and a total ordering
over inhabitants of the type.[^tjson]

[^tjson]: [TJSON](https://www.tjson.org/) has a similar aim:
  “different on-the-wire representations of an object correspond to
  the same typed data object”
  ([source](https://news.ycombinator.com/item?id=12860143)); “TJSON
  is defined as a serialization format on top of a JSON-like data
  model” ([source](https://news.ycombinator.com/item?id=12860401)).

It then suggests conventions for encoding common data formats in terms
of the proposed data type.

Finally, it proposes concrete *syntax* for the data type, offering a
language-neutral transfer syntax (based on
[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and
suggesting possible language-specific representations for the data
type's inhabitants.

[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
[spki]: http://world.std.com/~cme/html/spki.html

### Why not Just Use JSON?

<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->

JSON offers *syntax* for numbers, strings, booleans, null, arrays and
string-keyed maps. However, it offers no *semantics* for the syntax:
it is left to each implementation to determine how to treat each JSON
term. This causes
[interoperability](http://seriot.ch/parsing_json.php) and even
[security](http://seriot.ch/parsing_json.php) issues.

Specifically, JSON does not:

- assign any meaning to numbers,[^meaning-ieee-double]
- determine how strings are to be compared,[^string-key-comparison]
- determine whether object key ordering is significant, or
- determine whether duplicate object keys are permitted, what it
  would mean if they were, or how to determine a duplicate in the
  first place.

In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]

[^meaning-ieee-double]:
  [Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6)
  does go so far as to indicate “good interoperability can be
  achieved” by imagining that parsers are able reliably to
  understand the syntax of numbers as denoting an IEEE 754
  double-precision floating-point value.

[^string-key-comparison]:
  [Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3)
  suggests that *if* an implementation compares strings used as
  object keys “code unit by code unit”, then it will interoperate
  with *other such implementations*, but neither requires this
  behaviour nor discusses comparisons of strings used in other
  contexts.

[^xml-infoset]: The XML world has the concept of
  [XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
  speaking, XML infoset is the *denotation* of an XML document; the
  *meaning* of the document.

[^other-formats]: Most other recent data languages are like JSON in
  specifying only a syntax with no associated semantics. While some
  do make a sketch of a semantics, the result is often
  underspecified (e.g. in terms of how strings are to be compared),
  overly machine-oriented (e.g. treating 32-bit integers as
  fundamentally distinct from 64-bit integers and from
  floating-point numbers), overly fine (e.g. giving visibility to
  the order in which map entries are written), or all three.

Some examples:

- are the JSON values `1`, `1.0`, and `1e0` the same or different?
- are the JSON values `1.0` and `1.0000000000000001` the same or different?
- are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
  (UTF-8 `7061cc88726f6e`) the same or different?
- are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
  or different?
- which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
  same? Are all three legal?
- are `{"päron":1}` and `{"päron":1}` the same or different?

Different JSON implementations give different answers to these
questions. The JSON specifications are silent on these questions.

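As one data point, the following sketch shows how a single implementation, JavaScript's built-in `JSON.parse`, happens to answer several of the questions above. Nothing here is mandated by the JSON specifications, and other parsers may well behave differently, which is exactly the problem.

```javascript
// How one particular implementation (JavaScript's JSON.parse) answers
// some of the questions above. Other implementations may differ.

// 1, 1.0 and 1e0 all become the same IEEE 754 double.
const sameNumber = JSON.parse("1") === JSON.parse("1.0") &&
                   JSON.parse("1.0") === JSON.parse("1e0");

// The extra digit is silently lost when rounding to a double.
const precisionLost = JSON.parse("1.0000000000000001") === 1.0;

// Precomposed (U+00E4) and decomposed (U+0061 U+0308) spellings are
// different strings unless the application normalizes them itself.
const nfc = JSON.parse('"p\\u00e4ron"');
const nfd = JSON.parse('"pa\\u0308ron"');
const stringsDiffer = nfc !== nfd;
const equalAfterNormalizing = nfc.normalize("NFC") === nfd.normalize("NFC");

// Duplicate keys are accepted; the last occurrence wins.
const duplicateKey = JSON.parse('{"a":1, "a":2}').a;

console.log({ sameNumber, precisionLost, stringsDiffer,
              equalAfterNormalizing, duplicateKey });
```

A parser that keeps the first duplicate key, rejects duplicates outright, or preserves arbitrary-precision numbers would be equally conformant.
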
There are other minor problems with JSON having to do with its syntax.
Examples include its relative verbosity and its lack of support for
binary data.

## Starting with Semantics

Taking inspiration from functional programming, we start with a
definition of the *values* that we want to work with and give them
meaning independent of their syntax. We will treat syntax separately,
later in this document.

We will want our data type to accommodate *atoms* (numbers and text),
*products* (both tuples and sequences), and *labelled
sums*.[^zephyr-asdl] It should also include *keyed maps*. We should
avoid unnecessary restrictions such as machine-oriented fixed-width
integer or floating-point values where possible.

[^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t
  [Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
  which doesn't offer much in the way of atoms, but offers
  general-purpose labelled sums and products. See D. C. Wang, A. W.
  Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
  Description Language,” in USENIX Conference on Domain-Specific
  Languages, 1997, pp. 213–228.
  [PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)

### Values

A `Value` is one of:

- a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale]
- a `Number` for integers and rational numbers,
- a `List` for general-purpose variable-length sequences,
- a `Map` for general-purpose variable-size key/value maps, or
- a `Record` for tagging a tuple of values with an intended interpretation.

We define a total order over `Value`s: Every `ByteString` is less than
the other kinds of `Value`; every `Number` is less than any `List`,
`Map` or `Record`, but greater than any `ByteString`; and so on.

That is, `ByteString < Number < List < Map < Record`.

Two values of the same kind are compared using kind-specific rules,
given below.

Two `Value`s are equal if neither is less than the other according to
the total order.

[^byte-string-rationale]: Why include `ByteString`, when we could
  instead use a reserved `Record` along with a `List` of `Number`s?
  ((TODO: Actually decide about this! Similarly, why include `Map`
  rather than a restricted form of `List` with a `Record`? I think
  the answer has to do with the arbitrariness of the label we'd
  pick: unless *extremely* carefully chosen (i.e. number 0 (ideally
  even `-Inf`!) for byte strings, number 1 for map, and have the
  order go `Number < Record < List`), they would mess up the
  prettiness of the ordering. Though... we could ultimately reduce
  this to `Number` and `Record`, and have a family of `#"list"` and
  `#"map"` `Record`s...))

### Byte strings

A `ByteString` is an ordered sequence of zero or more integers in the
inclusive range [0..255].

`ByteString`s are compared lexicographically.

We will write examples of `ByteString`s that contain only ASCII
characters using “`#"`” as an opening quote mark and “`"`” as a
closing quote mark.

**Examples.** The `ByteString` containing the three ASCII characters
`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is
written as `#""`. **N.B.** Despite appearances, these are *binary*
data.

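A minimal sketch of the lexicographic comparison, assuming `ByteString`s are held as `Uint8Array`s and reading "lexicographically" in the usual way, so that a proper prefix sorts before any longer string. The names `compareBytes` and `ascii` are illustrative and not part of the proposal.

```javascript
// Lexicographic comparison of two ByteStrings, assumed here to be
// represented as Uint8Arrays. Returns -1, 0 or 1.
function compareBytes(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  // All shared positions are equal: the shorter string sorts first
  // (an assumption about how prefixes are ordered).
  if (a.length === b.length) return 0;
  return a.length < b.length ? -1 : 1;
}

// Helper for ASCII-only examples such as #"ABC" (hypothetical name).
const ascii = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));

console.log(compareBytes(ascii("ABC"), ascii("ABD")));
console.log(compareBytes(ascii(""), ascii("A")));
```

Under these assumptions `#""` is the least `ByteString`, and hence the least `Value` of all.
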
### Numbers

A `Number` is a signed rational number of finite precision whose
magnitude can be exactly represented in base two with a finite number
of digits. This includes integers of arbitrary width as well as (for
example) the non-infinite non-NaN IEEE 754 floating-point values.

`Number`s are compared as mathematical numbers.

We will write examples of `Number`s using standard mathematical
notation.

**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567.

**Non-examples.** NaN (the clue is in the name!), ∞ (not finite),
0.2 (cannot be exactly represented with a finite number of binary
digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise).

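One way to test the "finite number of binary digits" condition for a rational is to reduce it to lowest terms and check that the denominator is a power of two. The sketch below assumes the rational is supplied as a pair of `BigInt`s; the function names are hypothetical.

```javascript
// Greatest common divisor of two non-negative BigInts.
function gcd(a, b) {
  while (b !== 0n) {
    const t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// True if num/den (BigInts, den > 0) has a finite binary expansion,
// i.e. if the denominator in lowest terms is a power of two.
function hasFiniteBinaryExpansion(num, den) {
  if (den <= 0n) throw new RangeError("denominator must be positive");
  const absNum = num < 0n ? -num : num;
  const reduced = den / gcd(absNum, den);
  // A power of two has exactly one bit set.
  return (reduced & (reduced - 1n)) === 0n;
}

console.log(hasFiniteBinaryExpansion(33n, 192n)); // 33/192 = 11/64
console.log(hasFiniteBinaryExpansion(1n, 5n));    // 0.2
```

Here 33/192 qualifies because it reduces to 11/64, while 0.2 = 1/5 and 1/7 do not.
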
### Lists

A `List` is an ordered sequence of zero or more `Value`s.

`List`s are compared lexicographically, appealing to the ordering on
`Value`s for comparisons at each position in the `List`s.

### Maps

A `Map` is an *unordered* collection of zero or more pairs of
`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map`
must be pairwise distinct.

Instances of `Map` are compared by lexicographic comparison of the
sequences resulting from ordering each `Map`'s pairs in ascending
order by key. ((TODO: Is this a good idea? Is it clearly-enough
written? An alternative approach is to compare first by the *count* of
pairs, and only if the count is the same, start comparing the pairs
themselves.))

### Records

A `Record` is a tuple of one or more `Value`s. The first in the tuple
is called the *label* of the `Record`, and the other elements of the
tuple are called its *fields*.

`Record` labels are *usually* `ByteString`s, but can be any kind of
`Value`.[^iri-labels]

[^iri-labels]: It is occasionally (but seldom) necessary to
  interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a
  label can be read as a relative IRI, it is notionally interpreted
  with respect to the IRI `http://spki-cat.org/` ((TODO:
  placeholder)); where a label can be read as an absolute IRI, it
  stands for that IRI; and otherwise, it cannot be read as an IRI at
  all, and so the label simply stands for itself - for its own
  `Value`.

`Record`s are compared lexicographically as if they were just tuples;
that is, first by their labels, and then by the remainder of their
fields.

We will write examples of `Record`s with `ByteString` labels entirely
composed of ASCII characters as their label followed by their
parenthesised, comma-separated fields.

**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3
is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no
fields is written `#"void"()`.

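Putting the kind-specific rules together, the total order over `Value`s might be sketched as follows. The in-memory representation is an assumption made for illustration: `Uint8Array` for `ByteString`, `bigint` or a finite JavaScript number for `Number`, `Array` for `List`, the built-in `Map` for `Map`, and a small `Record` class. `Map` pairs are compared key first and then value, which is one reading of the text rather than something it states outright.

```javascript
class Record {
  constructor(label, ...fields) { this.label = label; this.fields = fields; }
}

// Rank of each kind: ByteString < Number < List < Map < Record.
function kindRank(v) {
  if (v instanceof Uint8Array) return 0;
  if (typeof v === "bigint") return 1;
  if (typeof v === "number" && Number.isFinite(v)) return 1;
  if (Array.isArray(v)) return 2;
  if (v instanceof Map) return 3;
  if (v instanceof Record) return 4;
  throw new TypeError("not a Value");
}

// Lexicographic comparison of two sequences with an element comparator;
// a proper prefix sorts before any longer sequence.
function compareSeq(a, b, cmp) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    const c = cmp(a[i], b[i]);
    if (c !== 0) return c;
  }
  return Math.sign(a.length - b.length);
}

// A Map's pairs in ascending order by key.
function sortedEntries(m) {
  return [...m.entries()].sort((x, y) => compareValues(x[0], y[0]));
}

// Returns -1, 0 or 1 according to the total order on Values.
function compareValues(a, b) {
  const ra = kindRank(a), rb = kindRank(b);
  if (ra !== rb) return ra < rb ? -1 : 1;
  switch (ra) {
    case 0: // ByteString: byte by byte
      return compareSeq(a, b, (x, y) => Math.sign(x - y));
    case 1: // Number: mixed bigint/number comparison is exact in JavaScript
      return a < b ? -1 : a > b ? 1 : 0;
    case 2: // List
      return compareSeq(a, b, compareValues);
    case 3: // Map: sorted pairs, each compared as (key, value)
      return compareSeq(sortedEntries(a), sortedEntries(b),
                        (x, y) => compareSeq(x, y, compareValues));
    default: // Record: as a tuple, label first
      return compareSeq([a.label, ...a.fields], [b.label, ...b.fields],
                        compareValues);
  }
}

console.log(compareValues(new Uint8Array([255]), 0n)); // ByteString vs Number
console.log(compareValues([1n, 2n], [1n, 2n, 3n]));    // List vs longer List
```

JavaScript's `Map` compares object keys by identity, so this sketch does not itself enforce that keys are pairwise distinct as `Value`s.
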
## Conventions for Common Data Types

The `Value` data type is essentially an abstract S-Expression, able to
represent semi-structured data over `ByteString` and `Number` atoms.

However, users need a wide variety of data types for representing
domain-specific values such as text, calendrical values, machine
words, IEEE 754 floating-point values, booleans, and so on.

We use appropriately-labelled `Record`s to denote these
domain-specific data types.

All of these conventions are optional. They form a layer atop the core
`Value` structure. Non-domain-specific tools do not in general need to
treat them specially.

**Validity.** Many of the labels we will describe in this section come
with side-conditions on the contents of labelled `Record`s. It is
possible to construct an instance of `Value` that violates these
side-conditions without ceasing to be a `Value` or becoming
unrepresentable. However, we say that such a `Value` is *invalid*
because it fails to honour the necessary side-conditions.
Implementations *SHOULD* allow two modes of working: one which
treats all `Value`s identically, without regard for side-conditions,
and one which enforces validity (i.e. side-conditions) when reading,
writing, or constructing `Value`s.

### Text

A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and
having a single field that is also a `ByteString`. The field *MUST* be
valid UTF-8.

We will write examples of `Text`s that contain Unicode text using
“`"`” as both an opening and closing quote mark.

**Examples.** The `Text` containing the three Unicode code points `z`
(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`.

**Normalization forms.** Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text. The
ordering and equivalence relations defined for `Value`s mean that, for
Unicode text, the UTF-8 encoded byte-level form of a text is used in
comparisons.[^utf8-is-awesome] In order for users to unambiguously
signal or require a particular normalization form, we define a
`NormalizedText`, which is a `Record` labelled with
`#"unicode-normalization"` and having two fields, the first of which
is a `Text` specifying the normalization form used (e.g. `"nfc"`,
`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text`
whose underlying representation *MUST* be normalized according to the
named normalization form.

[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
  gives the same result as a lexicographic code-point-by-code-point
  comparison!

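The `NormalizedText` side-condition can be checked with a round trip through a normalizer. The sketch below leans on JavaScript's built-in `String.prototype.normalize` and `TextEncoder`; the helper names and the choice to take the form as a lowercase string are assumptions made for illustration.

```javascript
// The normalization forms named in the text, mapped to the names that
// String.prototype.normalize expects.
const FORMS = new Map([
  ["nfc", "NFC"], ["nfd", "NFD"], ["nfkc", "NFKC"], ["nfkd", "NFKD"],
]);

// True if `text` is already in the normalization form named by `form`.
function isNormalizedText(form, text) {
  const jsForm = FORMS.get(form);
  if (jsForm === undefined) throw new RangeError("unknown form: " + form);
  return text.normalize(jsForm) === text;
}

// Hex dump of the UTF-8 bytes that would be compared as a ByteString.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text),
                    (b) => b.toString(16).padStart(2, "0")).join("");
}

const precomposed = "p\u00e4ron";  // ä as U+00E4
const decomposed = "pa\u0308ron";  // a followed by U+0308

console.log(utf8Hex(precomposed), isNormalizedText("nfc", precomposed));
console.log(utf8Hex(decomposed), isNormalizedText("nfc", decomposed));
```

The two spellings produce different byte sequences, so as `Text`s they are distinct `Value`s; only the first satisfies the side-condition for `"nfc"`.
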
**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled
with `#"iri"` and having one field, a `Text` which is the IRI itself
and which *MUST* be a valid absolute or relative IRI.

**Symbols.** Programming languages like Lisp and Prolog frequently use
string-like values called *symbols*. A `Symbol` is a `Record` labelled
with `#"symbol"` and having one field, a `Text`.

### Numbers

The definition of `Number` captures all integers and all
finitely-representable floating-point values. However, in certain
circumstances it can be valuable to assert that a number inhabits a
particular range, such as a fixed-width machine word or an IEEE 754
floating-point value.

**Fixed-width machine words.** (16-, 32- and 64-bit) A family of
labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote *n*-bit-wide
signed and unsigned range restrictions, respectively. Records with
these labels *MUST* have one field, a `Number`, which *MUST* fall
within the appropriate range. That is, to be valid,
- in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*.
- in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*.
- in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*.
- etc.

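The range restrictions generalize to -2^(n-1) <= *x* <= 2^(n-1)-1 for `i`*n* and 0 <= *x* <= 2^n-1 for `u`*n*. A validity check along those lines might look like the following sketch, which takes the label as a plain string for brevity and the field as a `bigint` or JavaScript number; the function name is hypothetical.

```javascript
// Validity check for the side-conditions on #"i16", #"u32", etc.
// `label` is given as a plain string for brevity; `x` is a bigint or a
// JavaScript number.
function isValidMachineWord(label, x) {
  const m = /^([iu])(16|32|64)$/.exec(label);
  if (m === null) throw new RangeError("not a machine-word label: " + label);
  if (typeof x === "number") {
    // The field must be an integer: floor(x) = x.
    if (!Number.isInteger(x)) return false;
    x = BigInt(x);
  } else if (typeof x !== "bigint") {
    throw new TypeError("field must be a Number");
  }
  const bits = BigInt(m[2]);
  const signed = m[1] === "i";
  const lo = signed ? -(2n ** (bits - 1n)) : 0n;
  const hi = signed ? 2n ** (bits - 1n) - 1n : 2n ** bits - 1n;
  return lo <= x && x <= hi;
}

console.log(isValidMachineWord("i16", 32767n));
console.log(isValidMachineWord("i16", 32768n));
console.log(isValidMachineWord("u64", 2n ** 64n - 1n));
```

Using `BigInt` for the bounds matters for the 64-bit labels, whose limits are not exactly representable as JavaScript numbers.
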
**IEEE 754 floating-point.** (single- and double-precision) The labels
`f32` and `f64` denote single- and double-precision IEEE 754
floating-point values, respectively. Records with these labels *MUST*
have one field. This field *MUST* either be a `Number`, which *MUST*
fall within the appropriate representable range, or one of the records
`#"nan"()`, `#"+inf"()` or `#"-inf"()`.

### Anonymous Tuples and Unit

A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields,
denoting an anonymous tuple of values.

The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes
called "unit" or "void" (but *not* e.g. JavaScript's "undefined"
value).

### Booleans, Null and Undefined

The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true"
and "false" Boolean values, respectively.

Tony Hoare's
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
can be represented with the 0-ary `Record` `#"null"()`. An "undefined"
value can be represented as `#"undefined"()`.

### Dates and Times

Dates, times, moments, and timestamps can be represented with a
`Record` with label `#"rfc3339"` having a single field, a `Text`,
which *MUST* conform to one of the `full-date`, `partial-time`,
`full-time`, or `date-time` productions of
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).

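A rough sketch of classifying a `Text` against the four productions is given below. It checks only the *shape* of the string using regular expressions written from the RFC 3339 grammar; it does not check field ranges (month 13 would pass), and it accepts only upper-case `T` and `Z`. The names are hypothetical.

```javascript
// Shape-only approximations of the four RFC 3339 productions. These do
// not validate ranges such as month <= 12 or hour <= 23.
const PRODUCTIONS = {
  "full-date": /^\d{4}-\d{2}-\d{2}$/,
  "partial-time": /^\d{2}:\d{2}:\d{2}(\.\d+)?$/,
  "full-time": /^\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/,
  "date-time":
    /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/,
};

// Returns the name of the first production whose shape matches, or null.
function matchRfc3339(text) {
  for (const [name, re] of Object.entries(PRODUCTIONS)) {
    if (re.test(text)) return name;
  }
  return null;
}

console.log(matchRfc3339("2018-06-04T20:31:48Z"));
console.log(matchRfc3339("foo"));
```

A decoder running in validity-enforcing mode would reject an `#"rfc3339"` `Record` whose field matches none of the productions, while still accepting it as a plain `Value` otherwise.
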
## Syntax

Now we have discussed `Value`s and their meanings, we may turn to
techniques for *representing* `Value`s for communication or storage.

The syntax we have used for the examples so far is inadequate in many
ways, not least of which is that it cannot represent every `Value`.

Separation of the meaning of a piece of syntax from the syntax itself
opens the door to domain-specific syntaxes, all equivalent and
interconvertible.[^asn1] With a robust semantic foundation,
connections to other data languages can also be made.

[^asn1]: Those who remember
  [ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
  will recall BER, DER, PER, CER, XER and so on, each appropriate to
  a different setting. Similarly,
  [Rivest's S-Expression design][sexp.txt] offers a human-friendly
  syntax, a syntax robust to network-induced message corruption, and
  an unambiguous, simple and easily-parsed machine-friendly syntax
  for the same underlying values.

### Transfer syntax: S-Expressions

For now, we limit our attention to an easily-parsed, easily-produced
machine-readable syntax by mapping our `Value`s to the canonical form
of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps]

[^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
  they are? While they include binary data and sequences, and an
  obvious equivalence for them exists, they lack numbers *per se* as
  well as any kind of unordered structure such as sets or maps. In
  addition, while "display hints" allow labelling of binary data
  with an intended interpretation, they cannot be attached to any
  other kind of structure, and the "hint" itself can only be a
  binary blob.

#### Byte strings

`ByteString`s map to byte-string S-Expressions.

**Examples.**
- What we have been writing above as `#"ABC"` would be represented as
  the S-Expression `3:ABC`.
- The empty `ByteString` is represented by the S-Expression `0:`.

#### Numbers

Numbers are the most complicated values to represent as an
S-Expression.

((TODO: Consider cutting complexity by e.g. representing a `Number` as
a sign bit, a little-endian blob of the integer part of the number,
and a little-endian blob of the fractional part of the number. Lots of
trailing/leading zeros for very large/small numbers!))

We represent `Number`s using a sign-magnitude format, where the
magnitude is written using a little-endian, twos-complement binary
[*significand*](https://en.wikipedia.org/wiki/Significand) and a
(signed) *shift amount*.

In essence, we use a generalized, variable-width form of binary IEEE
floating-point representation.

Let `N` be the `Number` to represent as an S-Expression.

The sign bit is 0 when `N` is zero or positive, and 1 when `N` is
negative.

The magnitude of `N` can be viewed as an infinite sequence of bits
with a fraction-separator mark placed somewhere in the sequence,

```
···00.000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000.00···
```

where `b_0` is the leftmost (most significant) and `b_{n-1}` the
rightmost (least significant) non-zero bit.

Let `k`, the position of the fraction-separator mark, be `i` when it
is immediately to the left of `b_i` for some `i`, generalizing to
negative values when it is to the left of `b_0` and values greater
than `n-1` when it is to the right of `b_{n-1}`.

For example, `k` will be:
- 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`;
- -3 (as in the first example above) when it is three bits left of `b_0`;
- `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`;
- `n`+3 when it is three bits to the right of `b_{n-1}`.

The unpadded significand is `b_0 b_1 ··· b_{n-1}`.

When `k` < `n`, the shift `z`=`k-n` and the significand is:
- the unpadded significand,
- with the sign bit appended to it on the right, and then
- padded on the left with zeroes until it is a whole number of octets wide.

When `k` ≥ `n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is:
- the unpadded significand,
- padded on the right with `(k-n) mod 8` zeroes,
- with the sign bit then appended on the right, and then
- padded on the left with zeroes until it is a whole number of octets wide.

Now, let `s`=2`z` if `z` is zero or positive, or `s`=2|`z`|+1 if `z`
is negative.

Finally, the S-Expression form of `N` is:
- `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or
- `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or
- `(4:*num)`, if `s`=0 and the significand contains no non-zero bits;

where
- `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and
- `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`.

**Examples.** (Shown using the hexadecimal representation of
byte-strings from
[section 4.4 of Rivest's S-Expression specification][sexp.txt] in
places.)
- `N`=0 → `(4:*num)`
- `N`=1 → `(4:*num#02#)`
- `N`=-1 → `(4:*num#03#)`
- `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)`
- `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)`
- `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)`
- `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)`
- `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)`
- `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)`
- `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=13 → `(4:*num#16##0D#)`
- `N`=-1.202E4567=-1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)`

((TODO: figure out what this algorithm would actually look like in,
say, C, Python and Racket.))

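As a starting point for that TODO, here is one possible reading of the algorithm in JavaScript, using `BigInt`. It is a sketch under stated assumptions rather than a reference implementation: the `Number` is supplied as a numerator/denominator pair, the output uses Rivest's `#hex#` notation as the examples do, and the helper names are invented for illustration. The key observation is that writing the magnitude as `m × 2^e` with `m` odd gives `e` = `k-n` directly.

```javascript
// Greatest common divisor of two non-negative BigInts.
function gcd(a, b) {
  while (b !== 0n) {
    const t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Little-endian hex digits of a non-negative BigInt, low byte first.
function hexLE(x) {
  let out = "";
  while (x > 0n) {
    out += (x & 0xffn).toString(16).padStart(2, "0").toUpperCase();
    x >>= 8n;
  }
  return out;
}

// Encode the Number num/den (BigInts, den > 0), writing byte strings in
// Rivest's #hex# notation.
function encodeNumber(num, den = 1n) {
  if (den <= 0n) throw new RangeError("denominator must be positive");
  if (num === 0n) return "(4:*num)";
  const sign = num < 0n ? 1n : 0n;
  let m = num < 0n ? -num : num;
  const g = gcd(m, den);
  m /= g;
  let d = den / g;

  // Write the magnitude as m * 2^e with m odd, so that e = k - n.
  let e = 0n;
  while (d > 1n) {
    if ((d & 1n) !== 0n) throw new RangeError("not a finite binary fraction");
    d >>= 1n;
    e -= 1n;
  }
  while ((m & 1n) === 0n) {
    m >>= 1n;
    e += 1n;
  }

  let significand, s;
  if (e < 0n) {
    // k < n: z = k - n, and s = 2|z| + 1. Append the sign bit.
    significand = (m << 1n) | sign;
    s = 2n * -e + 1n;
  } else {
    // k >= n: pad with (k - n) mod 8 zeroes, then append the sign bit;
    // z = 8 * floor((k - n) / 8), and s = 2z.
    significand = (m << (e % 8n + 1n)) | sign;
    s = 2n * (e - e % 8n);
  }
  const shift = s === 0n ? "" : `#${hexLE(s)}#`;
  return `(4:*num#${hexLE(significand)}#${shift})`;
}

console.log(encodeNumber(10n));
console.log(encodeNumber(-2560n));
console.log(encodeNumber(-3n, 2n));
```

Left-padding to a whole number of octets falls out of emitting the significand one byte at a time. A canonical encoder would emit raw length-prefixed bytes rather than `#hex#` strings.
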
#### Lists

A `List` maps to an S-Expression list of representations of its
elements, with the byte-string S-Expression `5:*list` prepended.

**Examples.**
- The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"`
  would be represented as the S-Expression `(5:*list1:a1:b1:c)`.
- The empty `List` is represented by the S-Expression `(5:*list)`.

#### Maps

A `Map` is represented by an S-Expression list of representations of
the `Map`'s key-value pairs, with the byte-string `4:*map` prepended.

Each key-value pair is represented by a two-element S-Expression list
containing representations of the key and the value, in that order.

The key-value pairs *MUST* be ordered by `Value`-order of their keys.

**Examples.**
- The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to
  `#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`.
- The `Map` containing an entry mapping the empty list to a "true"
  Boolean value is represented by `(4:*map((5:*list)(4:true)))`.
- The empty `Map` is represented by `(4:*map)`.

**Non-examples.**
- The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its
  key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`.
- The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its
  key-value pairs appear "flattened" in the outer list, rather than
  each appearing in a two-element list of its own.

#### Records

A `Record` is represented by an S-Expression list of its fields,
prepended by:

- the representation of its label, if its label is a `ByteString` and
  does not begin with byte 42 (ASCII "`*`"); or
- the S-Expression `1:*` followed by the representation of the
  `Record`'s label, otherwise.

**Examples.**
- The `Text` `"hello-world"` is represented by the S-Expression
  `(5:utf-811:hello-world)`.
- The `IRI` denoting `http://www.w3.org/` is represented by the
  S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`.
- The `Record` `#"*"()` is represented by the S-Expression
  `(1:*1:*)`.
- The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression
  `(1:*4:*foo4:*bar)`.
- The `Record` with the empty list as its label and no fields is
  represented by the S-Expression `(1:*(5:*list))`.
- `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that
  is a `Record` with `#"rfc3339"` as its label, and a single `Text`
  field. While it is a perfectly reasonable `Value`, it does *not*
  represent a valid date or time, since the `Text` `"foo"` does not
  conform to any of the RFC 3339 productions enumerated above.

**Non-examples.**
- `((5:*list))` is not a representation of the `Record` with the
  empty list as its label and no fields, because that `Record` has a
  non-`ByteString` as its label, mandating a `1:*` prefix on its
  S-Expression representation.
- `(4:*foo4:*bar)` does not represent the `Record`
  `#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`",
  mandating a `1:*` prefix on the `Record`'s S-Expression
  representation.

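The rules for `ByteString`s, `List`s, `Map`s and `Record`s can be collected into a small encoder. The sketch below is illustrative only: it assumes `Uint8Array`, `Array`, the built-in `Map` and a `Record` class as in-memory representations, returns the encoding as a string with one character per byte, omits `Number`s, and sorts `Map` entries only when every key is a `ByteString` (a full encoder would use the complete `Value` ordering). All names are hypothetical.

```javascript
class Record {
  constructor(label, ...fields) { this.label = label; this.fields = fields; }
}

// ASCII-only helper for building example ByteStrings.
const ascii = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));

// Lexicographic byte comparison; a proper prefix sorts first.
function compareBytes(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return Math.sign(a.length - b.length);
}

// Encode a Value (Numbers excluded) as a canonical S-Expression.
function encode(v) {
  if (v instanceof Uint8Array) {
    return `${v.length}:${String.fromCharCode(...v)}`;
  }
  if (Array.isArray(v)) {
    return `(5:*list${v.map((x) => encode(x)).join("")})`;
  }
  if (v instanceof Map) {
    const entries = [...v.entries()].sort(([k1], [k2]) => {
      if (!(k1 instanceof Uint8Array && k2 instanceof Uint8Array)) {
        throw new TypeError("this sketch only orders ByteString keys");
      }
      return compareBytes(k1, k2);
    });
    const pairs = entries.map(([k, x]) => `(${encode(k)}${encode(x)})`);
    return `(4:*map${pairs.join("")})`;
  }
  if (v instanceof Record) {
    // Labels that are not ByteStrings, or that begin with "*" (byte 42),
    // need the 1:* prefix.
    const plain = v.label instanceof Uint8Array && v.label[0] !== 42;
    const prefix = plain ? "" : "1:*";
    const fields = v.fields.map((x) => encode(x)).join("");
    return `(${prefix}${encode(v.label)}${fields})`;
  }
  throw new TypeError("unsupported Value in this sketch");
}

console.log(encode(new Record(ascii("utf-8"), ascii("hello-world"))));
console.log(encode(new Map([[ascii("c"), ascii("b")], [ascii("a"), ascii("d")]])));
console.log(encode(new Record(ascii("*foo"), ascii("*bar"))));
```

Because the reserved tags `*num`, `*list` and `*map` all begin with "`*`", the `1:*` escape is what keeps an ordinary `Record` from being mistaken for one of the built-in kinds.
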
## Examples

((TODO: Give some examples of large and small SPKI-CAT documents,
perhaps translated from various JSON blobs floating around the
internet.))

## Representing Values in Programming Languages

We have given a definition of `Value` and its semantics, and proposed
a concrete syntax for communicating and storing `Value`s. We now turn
to **suggested** representations of `Value`s as *programming-language
values* for various programming languages.

### JavaScript

- `ByteString` ↔ `Uint8Array`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List` ↔ `Array`
- `Map` ↔ `Object`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
  - `#"utf-8"` ↔ `String`
  - `#"true"` ↔ `true`
  - `#"false"` ↔ `false`
  - `#"null"` ↔ `null`
  - `#"undefined"` ↔ the undefined value
  - `#"rfc3339"` ↔ `Date`, if the `Record`'s field matches the `date-time` RFC 3339 production

```javascript
// Constructor for a generic Record: a label plus zero or more fields.
// Intended to be called with `new`.
function Record(label, ...fields) {
  this.label = label;
  this.fields = fields;
}
```

### Scheme/Racket

- `ByteString` ↔ byte vector (Racket: "Bytes")
- `Number` ↔ numbers
- `List` ↔ (where possible, immutable) list
- `Map` ↔ hash-table
- `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is...
  - `#"utf-8"` ↔ a string
  - `#"true"` ↔ `#t`
  - `#"false"` ↔ `#f`
  - `#"symbol"` ↔ a symbol

### Java

- `ByteString` ↔ `byte[]`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List` ↔ `java.util.List`
- `Map` ↔ `java.util.Map`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
  - `#"utf-8"` ↔ `java.lang.String`
  - `#"true"` ↔ `java.lang.Boolean.TRUE`
  - `#"false"` ↔ `java.lang.Boolean.FALSE`
  - `#"null"` ↔ a special singleton object, but *not* Java's `null`
  - `#"rfc3339"` ↔ `java.sql.{Date,Time,Timestamp}`, according to which RFC 3339 production the `Record`'s field matches

### Erlang

- `ByteString` ↔ a binary
- `Number` ↔ numbers, probably; TODO
- `List` ↔ a list
- `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP R17)
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is...
  - `#"true"` ↔ `true`
  - `#"false"` ↔ `false`
  - `#"null"` ↔ `null`
  - `#"undefined"` ↔ `undefined`
  - `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if
    some kind of an "unsafe" mode is set on the decoder (because
    Erlang atoms are not GC'd); otherwise like any other kind of
    `Record`

---