Split out preserves into its own repository

This commit is contained in:
Tony Garnock-Jones 2018-09-29 17:22:34 +01:00
parent 2c03883b00
commit d497d9e6d1
8 changed files with 1 additions and 3549 deletions

View File

@ -1,2 +0,0 @@
preserve.pdf: preserve.md preserve.css
google-chrome --headless --disable-gpu --print-to-pdf=$@ http://localhost:4000/preserve.html

View File

@ -1,662 +0,0 @@
---
---
<style>
body { font-size: 120%; margin-left: 2rem; }
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; }
h2 { border-bottom: solid black 1px; }
</style>
2018-06-04 20:31:48 TODO: cwebber's email comments
2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN
2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2
# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
Christopher Lemmer Webber <cwebber@dustycloud.org>
May 2018
Version 0.0.1
________________
/ \
/\__/\ / boo! \
/ \ \ i'm very spki! \
\=\_^__^_/= ___/\__________________/
|/ \
\\ | | /
<_|--|_>
## Introduction
This document proposes a language-neutral JSON-like *data type*, along
with a robust equivalence relation ("semantics") and a total ordering
over inhabitants of the type.[^tjson]
[^tjson]: [TJSON](https://www.tjson.org/) has a similar aim:
“different on-the-wire representations of an object correspond to
the same typed data object”
([source](https://news.ycombinator.com/item?id=12860143)); “TJSON
is defined as a serialization format on top of a JSON-like data
model” ([source](https://news.ycombinator.com/item?id=12860401)).
It then suggests conventions for encoding common data formats in terms
of the proposed data type.
Finally, it proposes concrete *syntax* for the data type, offering a
language-neutral transfer syntax (based on
[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and
suggesting possible language-specific representations for the data
type's inhabitants.
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
[spki]: http://world.std.com/~cme/html/spki.html
### Why not Just Use JSON?
<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
JSON offers *syntax* for numbers, strings, booleans, null, arrays and
string-keyed maps. However, it offers no *semantics* for the syntax:
it is left to each implementation to determine how to treat each JSON
term. This causes
[interoperability](http://seriot.ch/parsing_json.php) and even
[security](http://seriot.ch/parsing_json.php) issues.
Specifically, JSON does not:
- assign any meaning to numbers,[^meaning-ieee-double]
- determine how strings are to be compared,[^string-key-comparison]
- determine whether object key ordering is significant, or
- determine whether duplicate object keys are permitted, what it
would mean if they were, or how to determine a duplicate in the
first place.
In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
[^meaning-ieee-double]:
[Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6)
does go so far as to indicate “good interoperability can be
achieved” by imagining that parsers are able reliably to
understand the syntax of numbers as denoting an IEEE 754
double-precision floating-point value.
[^string-key-comparison]:
[Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3)
suggests that *if* an implementation compares strings used as
object keys “code unit by code unit”, then it will interoperate
with *other such implementations*, but neither requires this
behaviour nor discusses comparisons of strings used in other
contexts.
[^xml-infoset]: The XML world has the concept of
[XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
speaking, XML infoset is the *denotation* of an XML document; the
*meaning* of the document.
[^other-formats]: Most other recent data languages are like JSON in
specifying only a syntax with no associated semantics. While some
do make a sketch of a semantics, the result is often
underspecified (e.g. in terms of how strings are to be compared),
overly machine-oriented (e.g. treating 32-bit integers as
fundamentally distinct from 64-bit integers and from
floating-point numbers), overly fine (e.g. giving visibility to
the order in which map entries are written), or all three.
Some examples:
- are the JSON values `1`, `1.0`, and `1e0` the same or different?
- are the JSON values `1.0` and `1.0000000000000001` the same or different?
- are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
(UTF-8 `7061cc88726f6e`) the same or different?
- are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
or different?
- which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
same? Are all three legal?
- are `{"päron":1}` and `{"päron":1}` the same or different?
Different JSON implementations give different answers to these
questions. The JSON specifications are silent on these questions.
There are other minor problems with JSON having to do with its syntax.
Examples include its relative verbosity and its lack of support for
binary data.
## Starting with Semantics
Taking inspiration from functional programming, we start with a
definition of the *values* that we want to work with and give them
meaning independent of their syntax. We will treat syntax separately,
later in this document.
We will want our data type to accommodate *atoms* (numbers and text),
*products* (both tuples and sequences), and *labelled
sums*.[^zephyr-asdl] It should also include *keyed maps*. We should
avoid unnecessary restrictions such as machine-oriented fixed-width
integer or floating-point values where possible.
[^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t
[Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
which doesn't offer much in the way of atoms, but offers
general-purpose labelled sums and products. See D. C. Wang, A. W.
Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
Description Language,” in USENIX Conference on Domain-Specific
Languages, 1997, pp. 213228.
[PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)
### Values
A `Value` is one of:
- a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale]
- a `Number` for integers and rational numbers,
- a `List` for general-purpose variable-length sequences,
- a `Map` for general-purpose variable-size key/value maps, or
- a `Record` for tagging a tuple of values with an intended interpretation.
We define a total order over `Value`s: Every `ByteString` is less than
the other kinds of `Value`; every `Number` is less than any `List`,
`Map` or `Record`, but greater than any `ByteString`; and so on.
That is, `ByteString < Number < List < Map < Record`.
Two values of the same kind are compared using kind-specific rules,
given below.
Two `Value`s are equal if neither is less than the other according to
the total order.
[^byte-string-rationale]: Why include `ByteString`, when we could
instead use a reserved `Record` along with a `List` of `Number`s?
((TODO: Actually decide about this! Similarly, why include `Map`
rather than a restricted form of `List` with a `Record`? I think
the answer has to do with the arbitrariness of the label we'd
pick: unless *extremely* carefully chosen (i.e. number 0 (ideally
even `-Inf`!) for byte strings, number 1 for map, and have the
order go `Number < Record < List`), they would mess up the
prettiness of the ordering. Though... we could ultimately reduce
this to `Number` and `Record`, and have a family of `#"list"` and
`#"map"` `Record`s...))
### Byte strings
A `ByteString` is an ordered sequence of zero or more integers in the
inclusive range [0..255].
`ByteString`s are compared lexicographically.
We will write examples of `ByteString`s that contain only ASCII
characters using “`#"`” as an opening quote mark and “`"`” as a
closing quote mark.
**Examples.** The `ByteString` containing the three ASCII characters
`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is
written as `#""`. **N.B.** Despite appearances, these are *binary*
data.
### Numbers
A `Number` is a signed rational number of finite precision whose
magnitude can be exactly represented in base two with a finite number
of digits. This includes integers of arbitrary width as well as (for
example) the non-infinite non-NaN IEEE 754 floating-point values.
`Number`s are compared as mathematical numbers.
We will write examples of `Number`s using standard mathematical
notation.
**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567.
**Non-examples.** NaN (the clue is in the name!), ∞ (not finite),
0.2 (cannot be exactly represented with a finite number of binary
digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise).
### Lists
A `List` is an ordered sequence of zero or more `Value`s.
`List`s are compared lexicographically, appealing to the ordering on
`Value`s for comparisons at each position in the `List`s.
### Maps
A `Map` is an *unordered* collection of zero or more pairs of
`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map`
must be pairwise distinct.
Instances of `Map` are compared by lexicographic comparison of the
sequences resulting from ordering each `Map`'s pairs in ascending
order by key. ((TODO: Is this a good idea? Is it clearly-enough
written? An alternative approach is to compare first by the *count* of
pairs, and only if the count is the same, start comparing the pairs
themselves.))
### Records
A `Record` is a tuple of one or more `Value`s. The first in the tuple
is called the *label* of the `Record`, and the other elements of the
tuple are called its *fields*.
`Record` labels are *usually* `ByteString`s, but can be any kind of
`Value`.[^iri-labels]
[^iri-labels]: It is occasionally (but seldom) necessary to
interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a
label can be read as a relative IRI, it is notionally interpreted
with respect to the IRI `http://spki-cat.org/` ((TODO:
placeholder)); where a label can be read as an absolute IRI, it
stands for that IRI; and otherwise, it cannot be read as an IRI at
all, and so the label simply stands for itself - for its own
`Value`.
`Record`s are compared lexicographically as if they were just tuples;
that is, first by their labels, and then by the remainder of their
fields.
We will write examples of `Record`s with `ByteString` labels entirely
composed of ASCII characters as their label followed by their
parenthesised, comma-separated fields.
**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3
is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no
fields is written `#"void"()`.
## Conventions for Common Data Types
The `Value` data type is essentially an abstract S-Expression, able to
represent semi-structured data over `ByteString` and `Number` atoms.
However, users need a wide variety of data types for representing
domain-specific values such as text, calendrical values, machine
words, IEEE 754 floating-point values, booleans, and so on.
We use appropriately-labelled `Record`s to denote these
domain-specific data types.
All of these conventions are optional. They form a layer atop the core
`Value` structure. Non-domain-specific tools do not in general need to
treat them specially.
**Validity.** Many of the labels we will describe in this section come
with side-conditions on the contents of labelled `Record`s. It is
possible to construct an instance of `Value` that violates these
side-conditions without ceasing to be a `Value` or becoming
unrepresentable. However, we say that such a `Value` is *invalid*
because it fails to honour the necessary side-conditions.
Implementations *SHOULD* allow two modes of working: one which
treats all `Value`s identically, without regard for side-conditions,
and one which enforces validity (i.e. side-conditions) when reading,
writing, or constructing `Value`s.
### Text
A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and
having a single field that is also a `ByteString`. The field *MUST* be
valid UTF-8.
We will write examples of `Text`s that contain Unicode text using
“`"`” as both an opening and closing quote mark.
**Examples.** The `Text` containing the three Unicode code points `z`
(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`.
**Normalization forms.** Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text. The
ordering and equivalence relations defined for `Value`s mean that, for
Unicode text, the UTF-8 encoded byte-level form of a text is used in
comparisons.[^utf8-is-awesome] In order for users to unambiguously
signal or require a particular normalization form, we define a
`NormalizedText`, which is a `Record` labelled with
`#"unicode-normalization"` and having two fields, the first of which
is a `Text` specifying the normalization form used (e.g. `"nfc"`,
`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text`
whose underlying representation *MUST* be normalized according to the
named normalization form.
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
gives the same result as a lexicographic code-point-by-code-point
comparison!
**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled
with `#"iri"` and having one field, a `Text` which is the IRI itself
and which *MUST* be a valid absolute or relative IRI.
**Symbols.** Programming languages like Lisp and Prolog frequently use
string-like values called *symbols*. A `Symbol` is a `Record` labelled
with `#"symbol"` and having one field, a `Text`.
### Numbers
The definition of `Number` captures all integers and all
finitely-representable floating-point values. However, in certain
circumstances it can be valuable to assert that a number inhabits a
particular range, such as a fixed-width machine word or an IEEE 754
floating-point value.
**Fixed-width machine words.** (16-, 32- and 64-bit) A family of
labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote *n*-bit-wide
signed and unsigned range restrictions, respectively. Records with
these labels *MUST* have one field, a `Number`, which *MUST* fall
within the appropriate range. That is, to be valid,
- in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*.
- in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*.
- in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*.
- etc.
**IEEE 754 floating-point.** (single- and double-precision) The labels
`f32` and `f64` denote single- and double-precision IEEE 754
floating-point values, respectively. Records with these labels *MUST*
have one field. This field *MUST* either be a `Number`, which *MUST*
fall within the appropriate representable range, or one of the records
`#"nan"()`, `#"+inf"()` or `#"-inf"()`.
### Anonymous Tuples and Unit
A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields,
denoting an anonymous tuple of values.
The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes
called "unit" or "void" (but *not* e.g. JavaScript's "undefined"
value).
### Booleans, Null and Undefined
The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true"
and "false" Boolean values, respectively.
Tony Hoare's
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
can be represented with the 0-ary `Record` `#"null"()`. An "undefined"
value can be represented as `#"undefined"()`.
### Dates and Times
Dates, times, moments, and timestamps can be represented with a
`Record` with label `#"rfc3339"` having a single field, a `Text`,
which *MUST* conform to one of the `full-date`, `partial-time`,
`full-time`, or `date-time` productions of
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
## Syntax
Now we have discussed `Value`s and their meanings, we may turn to
techniques for *representing* `Value`s for communication or storage.
The syntax we have used for the examples so far is inadequate in many
ways, not least of which is that it cannot represent every `Value`.
Separation of the meaning of a piece of syntax from the syntax itself
opens the door to domain-specific syntaxes, all equivalent and
interconvertible.[^asn1] With a robust semantic foundation,
connections to other data languages can also be made.
[^asn1]: Those who remember
[ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
will recall BER, DER, PER, CER, XER and so on, each appropriate to
a different setting. Similarly,
[Rivest's S-Expression design][sexp.txt] offers a human-friendly
syntax, a syntax robust to network-induced message corruption, and
an unambiguous, simple and easily-parsed machine-friendly syntax
for the same underlying values.
### Transfer syntax: S-Expressions
For now, we limit our attention to an easily-parsed, easily-produced
machine-readable syntax by mapping our `Value`s to the canonical form
of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps]
[^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
they are? While they include binary data and sequences, and an
obvious equivalence for them exists, they lack numbers *per se* as
well as any kind of unordered structure such as sets or maps. In
addition, while "display hints" allow labelling of binary data
with an intended interpretation, they cannot be attached to any
other kind of structure, and the "hint" itself can only be a
binary blob.
#### Byte strings
`ByteString`s map to byte-string S-Expressions.
**Examples.**
- What we have been writing above as `#"ABC"` would be represented as
the S-Expression `3:ABC`.
- The empty `ByteString` is represented by the S-Expression `0:`.
#### Numbers
Numbers are the most complicated values to represent as an
S-Expression.
((TODO: Consider cutting complexity by e.g. representing a `Number` as
a sign bit, a little-endian blob of the integer part of the number,
and a little-endian blob of the fractional part of the number. Lots of
trailing/leading zeros for very large/small numbers!))
We represent `Number`s using a sign-magnitude format, where the
magnitude is written using a little-endian, twos-complement binary
[*significand*](https://en.wikipedia.org/wiki/Significand) and a
(signed) *shift amount*.
In essence, we use a generalized, variable-width form of binary IEEE
floating-point representation.
Let `N` be the `Number` to represent as an S-Expression.
The sign bit is 0 when `N` is zero or positive, and 1 when `N` is
negative.
The magnitude of `N` can be viewed as an infinite sequence of bits
with a fraction-separator mark placed somewhere in the sequence,
```
···00.000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000···
···000000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000.00···
```
where `b_0` is the leftmost (most significant) and `b_{n-1}` the
rightmost (least significant) non-zero bit.
Let `k`, the position of the fraction-separator mark, be `i` when it
is immediately to the left of `b_i` for some `i`, generalizing to
negative values when it is to the left of `b_0` and values greater
than `n-1` when it is to the right of `b_{n-1}`.
For example, `k` will be:
- 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`;
- -3 (as in the first example above) when it is three bits left of `b_0`;
- `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`;
- `n`+3 when it is three bits to the right of `b_{n-1}`.
The unpadded significand is `b_0 b_1 ··· b_{n-1}`.
When `k` < `n`, the shift `z`=`k-n` and the significand is:
- the unpadded significand,
- with the sign bit appended to it on the right, and then
- padded on the left with zeroes until it is a whole number of octets wide.
When `k``n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is:
- the unpadded significand,
- padded on the right with `(k-n) mod 8` zeroes,
- with the sign bit then appended on the right, and then
- padded on the left with zeroes until it is a whole number of octets wide.
Now, let `s`=2`z` if `z` is zero or positive, or `s`=2|`z`|+1 if `z`
is negative.
Finally, the S-Expression form of `N` is:
- `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or
- `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or
- `(4:*num)`, if `s`=0 and the significand contains no non-zero bits;
where
- `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and
- `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`.
**Examples.** (Shown using the hexadecimal representation of
byte-strings from
[section 4.4 of Rivest's S-Expression specification][sexp.txt] in
places.)
- `N`=0 → `(4:*num)`
- `N`=1 → `(4:*num#02#)`
- `N`=-1 → `(4:*num#03#)`
- `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)`
- `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)`
- `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)`
- `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)`
- `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)`
- `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)`
- `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=7 → `(4:*num#16##07#)`
- `N`=-1.202E4567=1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)`
((TODO: figure out what this algorithm would actually look like in,
say, C, Python and Racket.))
#### Lists
A `List` maps to an S-Expression list of representations of its
elements, with the byte-string S-Expression `5:*list` prepended.
**Examples.**
- The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"`
would be represented as the S-Expression `(5:*list1:a1:b1:c)`.
- The empty `List` is represented by the S-Expression `(5:*list)`.
#### Maps
A `Map` is represented by an S-Expression list of representations of
the `Map`'s key-value pairs, with the byte-string `4:*map` prepended.
Each key-value pair is represented by a two-element S-Expression list
containing representations of the key and the value, in that order.
The key-value pairs *MUST* be ordered by `Value`-order of their keys.
**Examples.**
- The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to
`#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`.
- The `Map` containing an entry mapping the empty list to a "true"
Boolean value is represented by `(4:*map((5:*list)(4:true)))`.
- The empty `Map` is represented by `(4:*map)`.
**Non-examples.**
- The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its
key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`.
- The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its
key-value pairs appear "flattened" in the outer list, rather than
each appearing in a two-element list of its own.
#### Records
A `Record` is represented by an S-Expression list of its fields,
prepended by:
- the representation of its label, if its label is a `ByteString` and
does not begin with byte 42 (ASCII "`*`"); or
- the S-Expression `1:*` followed by the representation of the
`Record`'s label, otherwise.
**Examples.**
- The `Text` `"hello-world"` is represented by the S-Expression
`(5:utf-811:hello-world)`.
- The `IRI` denoting `http://www.w3.org/` is represented by the
S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`.
- The `Record` `#"*"()` is represented by the S-Expression
`(1:*1:*)`.
- The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression
`(1:*4:*foo4:*bar)`.
- The `Record` with the empty list as its label and no fields is
represented by the S-Expression `(1:*(5:*list))`.
- `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that
is a `Record` with `#"rfc3339"` as its label, and a single `Text`
field. While it is a perfectly reasonable `Value`, it does *not*
represent a valid date or time, since the `Text` `"foo"` does not
conform to any of the RFC 3339 productions enumerated above.
**Non-examples.**
- `((5:*list))` is not a representation of the `Record` with the
empty list as its label and no fields, because that `Record` has a
non-`ByteString` as its label, mandating a `1:*` prefix on its
S-Expression representation.
- `(4:*foo4:*bar)` does not represent the `Record`
`#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`",
mandating a `1:*` prefix on the `Record`'s S-Expression
representation.
## Examples
((TODO: Give some examples of large and small SPKI-CAT documents,
perhaps translated from various JSON blobs floating around the
internet.))
## Representing Values in Programming Languages
We have given a definition of `Value` and its semantics, and proposed
a concrete syntax for communicating and storing `Value`s. We now turn
to **suggested** representations of `Value`s as *programming-language
values* for various programming languages.
### JavaScript
- `ByteString``Uint8Array`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List``Array`
- `Map``Object`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
- `#"utf-8"``String`
- `#"true"``true`
- `#"false"``false`
- `#"null"``null`
- `#"undefined"` ↔ the undefined value
- `#"rfc3339"``Date`, if the `Record`'s field matches the `date-time` RFC 3339 production
```javascript
function Record(label, ...fields) {
this.label = label;
this.fields = fields;
}
```
### Scheme/Racket
- `ByteString` ↔ byte vector (Racket: "Bytes")
- `Number` ↔ numbers
- `List` ↔ (where possible, immutable) list
- `Map` ↔ hash-table
- `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is...
- `#"utf-8"` ↔ a string
- `#"true"``#t`
- `#"false"``#f`
- `#"symbol"` ↔ a symbol
### Java
- `ByteString``byte[]`
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
- `List``java.util.List`
- `Map``java.util.Map`
- `Record` ↔ an instance of something like `Record` below, unless the label is...
- `#"utf-8"``java.lang.String`
- `#"true"``java.lang.Boolean.TRUE`
- `#"false"``java.lang.Boolean.FALSE`
- `#"null"` ↔ a special singleton object, but *not* Java's `null`
- `#"rfc3339"``java.util.{Date,Time,Timestamp}`, according to which RFC 3339 production the `Record`'s field matches
### Erlang
- `ByteString` ↔ a binary
- `Number` ↔ numbers, probably; TODO
- `List` ↔ a list
- `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP R17)
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is...
- `#"true"``true`
- `#"false"``false`
- `#"null"``null`
- `#"undefined"``undefined`
- `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if
some kind of an "unsafe" mode is set on the decoder (because
Erlang atoms are not GC'd); otherwise like any other kind of
`Record`
---

View File

@ -1,61 +0,0 @@
body {
font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif;
}
@media screen {
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
hr { display: none; }
}
@media print {
@page { size: letter; margin: 4rem 0rem 4.333rem 0rem; }
body { margin-left: 4.5rem; margin-right: 4.5rem; }
h1, h2 { page-break-before: always; margin-top: 0; }
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
hr+* { page-break-before: always; margin-top: 0; }
hr { display: none; }
}
h1, h2, h3, h4, h5, h6 { color: #4f81bd; }
h2 { border-bottom: solid #4f81bd 1px; }
pre, code { background-color: #eee; font-family: "DejaVu Sans Mono", monospace; }
code { font-size: 75%; }
pre { padding: 0.33rem; }
body {
counter-reset: section 0 subsection 0 appendix 0;
}
h2:before, h3:before {
text-align: right;
display: inline-block;
position: relative;
right: 2.33em;
font-size: 75%;
text-align: right;
width: 2em;
margin-right: -2em;
height: 0;
}
h2:before {
counter-increment: section;
content: counter(section) ". ";
}
h2 {
counter-reset: subsection 0;
}
h3:before {
counter-increment: subsection;
content: counter(section) "." counter(subsection) ". ";
}
h2[id^="appendix-"]:before {
counter-increment: appendix;
content: counter(appendix,upper-latin) ". ";
}
h2[id^="appendix-"] ~ h3:before {
counter-increment: subsection;
content: counter(appendix,upper-latin) "." counter(subsection) ". ";
}
h2#notes:before {
display: none;
}
.footnotes > ol { padding: 0; font-size: 90%; }

File diff suppressed because it is too large Load Diff

View File

@ -1,385 +0,0 @@
import sys
import numbers
import struct
try:
basestring
except NameError:
basestring = str
if isinstance(chr(123), bytes):
_ord = ord
else:
_ord = lambda x: x
class Float(object):
def __init__(self, value):
self.value = value
def __eq__(self, other):
if other.__class__ is self.__class__:
return self.value == other.value
def __repr__(self):
return 'Float(' + repr(self.value) + ')'
def __preserve_on__(self, encoder):
encoder.leadbyte(0, 0, 2)
encoder.buffer.extend(struct.pack('>f', self.value))
class Symbol(object):
def __init__(self, name):
self.name = name
def __eq__(self, other):
return isinstance(other, Symbol) and self.name == other.name
def __hash__(self):
return hash(self.name)
def __repr__(self):
return '#' + self.name
def __preserve_on__(self, encoder):
bs = self.name.encode('utf-8')
encoder.header(1, 3, len(bs))
encoder.buffer.extend(bs)
class Record(object):
def __init__(self, key, fields):
self.key = key
self.fields = tuple(fields)
self.__hash = None
def __eq__(self, other):
return isinstance(other, Record) and (self.key, self.fields) == (other.key, other.fields)
def __hash__(self):
if self.__hash is None:
self.__hash = hash((self.key, self.fields))
return self.__hash
def __repr__(self):
return str(self.key) + '(' + ', '.join((repr(f) for f in self.fields)) + ')'
def __preserve_on__(self, encoder):
try:
index = encoder.shortForms.index(self.key)
except ValueError:
index = None
if index is None:
encoder.header(2, 3, len(self.fields) + 1)
encoder.append(self.key)
else:
encoder.header(2, index, len(self.fields))
for f in self.fields:
encoder.append(f)
# Blub blub blub
class ImmutableDict(dict):
def __init__(self, *args, **kwargs):
if hasattr(self, '__hash'): raise TypeError('Immutable')
super(ImmutableDict, self).__init__(*args, **kwargs)
self.__hash = None
def __delitem__(self, key): raise TypeError('Immutable')
def __setitem__(self, key, val): raise TypeError('Immutable')
def clear(self): raise TypeError('Immutable')
def pop(self, k, d=None): raise TypeError('Immutable')
def popitem(self): raise TypeError('Immutable')
def setdefault(self, k, d=None): raise TypeError('Immutable')
def update(self, e, **f): raise TypeError('Immutable')
def __hash__(self):
if self.__hash is None:
h = 0
for k in self:
h = ((h << 5) ^ (hash(k) << 2) ^ hash(self[k])) & sys.maxsize
self.__hash = h
return self.__hash
@staticmethod
def from_kvs(kvs):
i = iter(kvs)
result = ImmutableDict()
result_proxy = super(ImmutableDict, result)
try:
while True:
k = next(i)
v = next(i)
result_proxy.__setitem__(k, v)
except StopIteration:
pass
return result
def dict_kvs(d):
for k in d:
yield k
yield d[k]
class DecodeError(ValueError): pass
class EncodeError(ValueError): pass
class Codec(object):
def __init__(self):
self.shortForms = [Symbol(u'discard'), Symbol(u'capture'), Symbol(u'observe')]
def set_shortform(self, index, v):
if index >= 0 and index < 3:
self.shortForms[index] = v
else:
raise ValueError('Invalid short form index %r' % (index,))
class Stream(object):
def __init__(self, iterator):
self._iterator = iterator
def __preserve_on__(self, encoder):
arg = (self.major << 2) | self.minor
encoder.leadbyte(0, 2, arg)
self._emit(encoder)
encoder.leadbyte(0, 3, arg)
def _emit(self, encoder):
raise NotImplementedError('Should be implemented in subclasses')
class ValueStream(Stream):
major = 3
def _emit(self, encoder):
for v in self._iterator:
encoder.append(v)
class SequenceStream(ValueStream):
minor = 0
class SetStream(ValueStream):
minor = 1
class DictStream(ValueStream):
minor = 2
def _emit(self, encoder):
for (k, v) in self._iterator:
encoder.append(k)
encoder.append(v)
class BinaryStream(Stream):
major = 1
minor = 2
def _emit(self, encoder):
for chunk in self._iterator:
if not isinstance(chunk, bytes):
raise EncodeError('Illegal chunk in BinaryStream %r' % (chunk,))
encoder.append(chunk)
class StringStream(BinaryStream):
minor = 1
class SymbolStream(BinaryStream):
minor = 3
class Decoder(Codec):
def __init__(self, packet):
super(Decoder, self).__init__()
self.packet = packet
self.index = 0
def peekbyte(self):
if self.index < len(self.packet):
return _ord(self.packet[self.index])
else:
raise DecodeError('Short packet')
def advance(self, count=1):
start = self.index
self.index = self.index + count
return start
def nextbyte(self):
val = self.peekbyte()
self.advance()
return val
def wirelength(self, arg):
if arg < 15:
return arg
return self.varint()
def varint(self):
v = self.nextbyte()
if v < 128:
return v
else:
return self.varint() * 128 + (v - 128)
def nextbytes(self, n):
start = self.advance(n)
return self.packet[start : self.index]
def nextvalues(self, n):
result = []
for i in range(n):
result.append(self.next())
return result
def peekop(self):
b = self.peekbyte()
major = b >> 6
minor = (b >> 4) & 3
arg = b & 15
return (major, minor, arg)
def nextop(self):
op = self.peekop()
self.advance()
return op
def peekend(self, arg):
return self.peekop() == (0, 3, arg)
def binarystream(self, arg, minor):
result = []
while not self.peekend(arg):
chunk = self.next()
if isinstance(chunk, bytes):
result.append(chunk)
else:
raise DecodeError('Unexpected non-binary chunk')
return self.decodebinary(minor, b''.join(result))
def valuestream(self, arg, minor, decoder):
result = []
while not self.peekend(arg):
result.append(self.next())
return decoder(minor, result)
def decodeint(self, bs):
if len(bs) == 0: return 0
acc = _ord(bs[0])
if acc & 0x80: acc = acc - 256
for b in bs[1:]:
acc = (acc << 8) | _ord(b)
return acc
def decodebinary(self, minor, bs):
if minor == 0: return self.decodeint(bs)
if minor == 1: return bs.decode('utf-8')
if minor == 2: return bs
if minor == 3: return Symbol(bs.decode('utf-8'))
def decoderecord(self, minor, vs):
if minor == 3:
if not vs: raise DecodeError('Too few elements in encoded record')
return Record(vs[0], vs[1:])
else:
return Record(self.shortForms[minor], vs)
def decodecollection(self, minor, vs):
if minor == 0: return tuple(vs)
if minor == 1: return frozenset(vs)
if minor == 2: return ImmutableDict.from_kvs(vs)
if minor == 3: raise DecodeError('Invalid collection type')
def next(self):
(major, minor, arg) = self.nextop()
if major == 0:
if minor == 0:
if arg == 0: return False
if arg == 1: return True
if arg == 2: return Float(struct.unpack('>f', self.nextbytes(4))[0])
if arg == 3: return struct.unpack('>d', self.nextbytes(8))[0]
raise DecodeError('Invalid format A encoding')
elif minor == 1:
return arg - 16 if arg > 12 else arg
elif minor == 2:
t = arg >> 2
n = arg & 3
if t == 0: raise DecodeError('Invalid format C start byte')
if t == 1: return self.binarystream(arg, n)
if t == 2: return self.valuestream(arg, n, self.decoderecord)
if t == 3: return self.valuestream(arg, n, self.decodecollection)
else: # minor == 3
raise DecodeError('Unexpected format C end byte')
elif major == 1:
return self.decodebinary(minor, self.nextbytes(self.wirelength(arg)))
elif major == 2:
return self.decoderecord(minor, self.nextvalues(self.wirelength(arg)))
else: # major == 3
return self.decodecollection(minor, self.nextvalues(self.wirelength(arg)))
class Encoder(Codec):
def __init__(self):
super(Encoder, self).__init__()
self.buffer = bytearray()
def contents(self):
return bytes(self.buffer)
def varint(self, v):
if v < 128:
self.buffer.append(v)
else:
self.buffer.append((v % 128) + 128)
self.varint(v // 128)
def leadbyte(self, major, minor, arg):
self.buffer.append(((major & 3) << 6) | ((minor & 3) << 4) | (arg & 15))
def header(self, major, minor, wirelength):
if wirelength < 15:
self.leadbyte(major, minor, wirelength)
else:
self.leadbyte(major, minor, 15)
self.varint(wirelength)
def encodeint(self, v):
bitcount = (~v if v < 0 else v).bit_length() + 1
bytecount = (bitcount + 7) // 8
self.header(1, 0, bytecount)
def enc(n,x):
if n > 0:
enc(n-1, x >> 8)
self.buffer.append(x & 255)
enc(bytecount, v)
def encodecollection(self, minor, items):
self.header(3, minor, len(items))
for i in items: self.append(i)
def append(self, v):
if hasattr(v, '__preserve_on__'):
v.__preserve_on__(self)
elif v is False:
self.leadbyte(0, 0, 0)
elif v is True:
self.leadbyte(0, 0, 1)
elif isinstance(v, float):
self.leadbyte(0, 0, 3)
self.buffer.extend(struct.pack('>d', v))
elif isinstance(v, numbers.Number):
if v >= -3 and v <= 12:
self.leadbyte(0, 1, v if v >= 0 else v + 16)
else:
self.encodeint(v)
elif isinstance(v, bytes):
self.header(1, 2, len(v))
self.buffer.extend(v)
elif isinstance(v, basestring):
bs = v.encode('utf-8')
self.header(1, 1, len(bs))
self.buffer.extend(bs)
elif isinstance(v, list):
self.encodecollection(0, v)
elif isinstance(v, tuple):
self.encodecollection(0, v)
elif isinstance(v, set):
self.encodecollection(1, v)
elif isinstance(v, frozenset):
self.encodecollection(1, v)
elif isinstance(v, dict):
self.encodecollection(2, list(dict_kvs(v)))
else:
try:
i = iter(v)
except TypeError:
raise EncodeError('Cannot encode %r' % (v,))
self.encodestream(3, 0, i)

View File

@ -1,856 +0,0 @@
#lang racket/base
;; Preserve, as in Fruit Preserve, as in a remarkably weak pun on pickling/dehydration etc
(provide (struct-out stream-of)
(struct-out record)
short-form-labels
read-preserve
string->preserve
encode
decode
wire-value)
(require racket/bytes)
(require racket/dict)
(require racket/generator)
(require racket/match)
(require racket/set)
(require bitsyntax)
(require syndicate/support/struct)
(require (only-in syntax/readerr raise-read-error))
(require imperative-syndicate/assertions)
(require imperative-syndicate/pattern)
(struct stream-of (kind generator) #:transparent)
(struct record (label fields) #:transparent)
(define short-form-labels
(make-parameter (vector 'discard 'capture 'observe)))
(define (encode v)
(bit-string->bytes (bit-string (v :: (wire-value)))))
(define (decode bs [on-fail (lambda () (error 'decode "Invalid encoding: ~v" bs))])
(bit-string-case bs
([ (v :: (wire-value)) ] v)
(else (on-fail))))
(define-syntax wire-value
(syntax-rules ()
[(_ #t input ks kf) (decode-value input ks kf)]
[(_ #f v) (encode-value v)]))
(define-syntax wire-length
(syntax-rules ()
[(_ #t input ks kf) (decode-wire-length input ks kf)]
[(_ #f v) (encode-wire-length v)]))
(define (encode-wire-length v)
(when (negative? v) (error 'encode-wire-length "Cannot encode negative wire-length ~v" v))
(if (< v #b1111)
(bit-string (v :: bits 4))
(bit-string (#b1111 :: bits 4) ((encode-varint v) :: binary))))
(define (encode-varint v)
(if (< v 128)
(bytes v)
(bit-string ((+ (modulo v 128) 128) :: bits 8)
((encode-varint (quotient v 128)) :: binary))))
(define (encode-array-like major minor fields)
(bit-string (major :: bits 2)
(minor :: bits 2)
((length fields) :: (wire-length))
((apply bit-string-append (map encode-value fields)) :: binary)))
(define (encode-binary-like major minor bs)
(bit-string (major :: bits 2)
(minor :: bits 2)
((bytes-length bs) :: (wire-length))
(bs :: binary)))
(define (encode-start-byte major minor)
(bit-string (#b0010 :: bits 4) (major :: bits 2) (minor :: bits 2)))
(define (encode-end-byte major minor)
(bit-string (#b0011 :: bits 4) (major :: bits 2) (minor :: bits 2)))
(define (encode-stream major minor chunk-ok? generator)
(bit-string-append (encode-start-byte major minor)
(let loop ()
(match (generator)
[(? void?) #""]
[(? chunk-ok? v) (bit-string-append (encode-value v) (loop))]
[bad (error 'encode-stream "Cannot encode chunk: ~v" bad)]))
(encode-end-byte major minor)))
(define (dict-keys-and-values d)
(reverse (for/fold [(acc '())] [((k v) (in-dict d))] (cons v (cons k acc)))))
(define (short-form-for-label key)
(let ((labels (short-form-labels)))
(let loop ((i 0))
(cond [(= i 3) #f]
[(equal? (vector-ref labels i) key) i]
[else (loop (+ i 1))]))))
(define (encode-record key fields)
(define short (short-form-for-label key))
(if short
(encode-array-like 2 short fields)
(encode-array-like 2 3 (cons key fields))))
(define (encode-value v)
(match v
[#f (bytes #b00000000)]
[#t (bytes #b00000001)]
[(? single-flonum?) (bit-string #b00000010 (v :: float bits 32))]
[(? double-flonum?) (bit-string #b00000011 (v :: float bits 64))]
[(? integer? x) #:when (<= -3 x 12) (bit-string (#b0001 :: bits 4) (x :: bits 4))]
[(stream-of 'string p) (encode-stream 1 1 bytes? p)]
[(stream-of 'byte-string p) (encode-stream 1 2 bytes? p)]
[(stream-of 'symbol p) (encode-stream 1 3 bytes? p)]
[(stream-of 'sequence p) (encode-stream 3 0 (lambda (x) #t) p)]
[(stream-of 'set p) (encode-stream 3 1 (lambda (x) #t) p)]
[(stream-of 'dictionary p) (encode-stream 3 2 (lambda (x) #t) p)]
;; [0 (bytes #b10000000)]
[(? integer?)
(define raw-bit-count (+ (integer-length v) 1)) ;; at least one sign bit
(define byte-count (quotient (+ raw-bit-count 7) 8))
(bit-string (#b0100 :: bits 4) (byte-count :: (wire-length)) (v :: integer bytes byte-count))]
[(? string?) (encode-binary-like 1 1 (string->bytes/utf-8 v))]
[(? bytes?) (encode-binary-like 1 2 v)]
[(? symbol?) (encode-binary-like 1 3 (string->bytes/utf-8 (symbol->string v)))]
[(record label fields) (encode-record label fields)]
[(? non-object-struct?)
(define key (prefab-struct-key v))
(when (not key) (error 'encode-value "Cannot encode non-prefab struct ~v" v))
(encode-record key (cdr (vector->list (struct->vector v))))]
[(? list?) (encode-array-like 3 0 v)]
[(? set?) (encode-array-like 3 1 (set->list v))]
[(? dict?) (encode-array-like 3 2 (dict-keys-and-values v))]
[_ (error 'encode-value "Cannot encode value ~v" v)]))
;;---------------------------------------------------------------------------
(define (decode-wire-length bs ks kf)
(bit-string-case bs
([ (= #b1111 :: bits 4) (rest :: binary) ]
(decode-varint rest
(lambda (v tail)
(if (< v #b1111)
(kf)
(ks v tail)))
kf))
([ (v :: bits 4) (rest :: binary) ] (ks v rest))
(else (kf))))
(define (decode-varint bs ks kf)
(bit-string-case bs
([ (= 1 :: bits 1) (v :: bits 7) (rest :: binary) ]
(decode-varint rest (lambda (acc tail) (ks (+ (* acc 128) v) tail)) kf))
([ (= 0 :: bits 1) (v :: bits 7) (rest :: binary) ]
(ks v rest))
(else
(kf))))
(define (decode-values n acc-rev bs ks kf)
(if (zero? n)
(ks (reverse acc-rev) bs)
(bit-string-case bs
([ (v :: (wire-value)) (rest :: binary) ]
(decode-values (- n 1) (cons v acc-rev) rest ks kf))
(else (kf)))))
(define (decode-binary minor bs rest ks kf)
(match minor
[0 (if (positive? (bit-string-length bs))
(ks (bit-string->signed-integer bs #t) rest)
(ks 0 rest))]
[2 (ks bs rest)]
[(or 1 3)
((with-handlers [(exn:fail:contract? (lambda (e) kf))]
(define s (bytes->string/utf-8 bs))
(lambda () (ks (if (= minor 3) (string->symbol s) s) rest))))]))
(define (build-record label fields)
(with-handlers [(exn:fail:contract? (lambda (e) (record label fields)))]
(apply make-prefab-struct label fields)))
(define (decode-record minor fields rest ks kf)
(match* (minor fields)
[(3 (list* key fs)) (ks (build-record key fs) rest)]
[(3 '()) (kf)]
[(n fs) (ks (build-record (vector-ref (short-form-labels) n) fs) rest)]))
(define (decode-collection minor vs rest ks kf)
(match minor
[0 (ks vs rest)]
[1 (ks (list->set vs) rest)]
[2 (if (even? (length vs))
(ks (apply hash vs) rest)
(kf))]
[_ (kf)]))
(define (decode-stream major minor chunk-ok? join-chunks decode rest ks kf)
(let loop ((acc-rev '()) (rest rest))
(bit-string-case rest
([ (= #b0011 :: bits 4) (emajor :: bits 2) (eminor :: bits 2) (rest :: binary) ]
(if (and (= major emajor) (= minor eminor))
(decode minor (join-chunks (reverse acc-rev)) rest ks kf)
(kf)))
(else
(decode-value rest
(lambda (chunk rest)
(if (chunk-ok? chunk)
(loop (cons chunk acc-rev) rest)
(kf)))
kf)))))
(define (decode-value bs ks kf)
(bit-string-case bs
([ (= #b00000000 :: bits 8) (rest :: binary) ] (ks #f rest))
([ (= #b00000001 :: bits 8) (rest :: binary) ] (ks #t rest))
([ (= #b00000010 :: bits 8) (v :: float bits 32) (rest :: binary) ] (ks (real->single-flonum v) rest))
([ (= #b00000011 :: bits 8) (v :: float bits 64) (rest :: binary) ] (ks v rest))
([ (= #b0001 :: bits 4) (x :: bits 4) (rest :: binary) ] (ks (if (> x 12) (- x 16) x) rest))
([ (= #b001001 :: bits 6) (minor :: bits 2) (rest :: binary) ]
(decode-stream 1 minor bytes? bytes-append* decode-binary rest ks kf))
([ (= #b001010 :: bits 6) (minor :: bits 2) (rest :: binary) ]
(decode-stream 2 minor (lambda (x) #t) values decode-record rest ks kf))
([ (= #b001011 :: bits 6) (minor :: bits 2) (rest :: binary) ]
(decode-stream 3 minor (lambda (x) #t) values decode-collection rest ks kf))
([ (= #b01 :: bits 2) (minor :: bits 2) (byte-count :: (wire-length))
(bits :: binary bytes byte-count)
(rest :: binary) ]
(decode-binary minor (bit-string->bytes bits) rest ks kf))
([ (= #b10 :: bits 2) (minor :: bits 2) (field-count :: (wire-length)) (rest :: binary) ]
(decode-values field-count '() rest
(lambda (fields rest) (decode-record minor fields rest ks kf))
kf))
([ (= #b11 :: bits 2) (minor :: bits 2) (count :: (wire-length)) (rest :: binary) ]
(decode-values count '() rest
(lambda (vs rest) (decode-collection minor vs rest ks kf))
kf))
(else (kf))))
;;---------------------------------------------------------------------------
(define (skip-whitespace* i)
(regexp-match? #px#"^(\\s|,)*" i)
(match (peek-char i)
[#\; (regexp-match? #px#"[^\r\n]*[\r\n]" i) (skip-whitespace* i)]
[_ #t]))
(define (parse-error* i fmt . args)
(define-values [line column pos] (port-next-location i))
(raise-read-error (format "read-preserve: ~a" (apply format fmt args))
(object-name i)
line
column
pos
#f))
(define (read-preserve [i (current-input-port)])
(local-require net/base64)
(local-require file/sha1)
(define-match-expander px
(syntax-rules ()
[(_ re pat) (app (lambda (v) (regexp-try-match re v)) pat)]))
(define (parse-error fmt . args)
(apply parse-error* i fmt args))
(define (eof-guard ch)
(match ch
[(? eof-object?) (parse-error "Unexpected end of input")]
[ch ch]))
(define (peek/no-eof) (eof-guard (peek-char i)))
(define (read/no-eof) (eof-guard (read-char i)))
(define (skip-whitespace) (skip-whitespace* i))
(define (read-sequence terminator)
(sequence-fold '() (lambda (acc) (cons (read-value) acc)) reverse terminator))
(define (read-dictionary-or-set)
(sequence-fold #f
(lambda (acc)
(define k (read-value))
(skip-whitespace)
(match (peek-char i)
[#\: (when (set? acc) (parse-error "Unexpected key/value separator in set"))
(read-char i)
(define v (read-value))
(hash-set (or acc (hash)) k v)]
[_ (when (hash? acc) (parse-error "Missing expected key/value separator"))
(set-add (or acc (set)) k)]))
(lambda (acc) (or acc (hash)))
#\}))
(define PIPE #\|)
(define (read-raw-symbol acc)
(match (peek-char i)
[(or (? eof-object?)
(? char? (or #\( #\) #\{ #\} #\[ #\]
#\" #\; #\, #\# #\: (== PIPE)
(? char-whitespace?))))
(string->symbol (list->string (reverse acc)))]
[_ (read-raw-symbol (cons (read-char i) acc))]))
(define (read-base64-binary acc)
(skip-whitespace)
(define ch (read/no-eof))
(cond [(eqv? ch #\})
(base64-decode (string->bytes/latin-1 (list->string (reverse acc))))]
[(or (and (char>=? ch #\A) (char<=? ch #\Z))
(and (char>=? ch #\a) (char<=? ch #\z))
(and (char>=? ch #\0) (char<=? ch #\9))
(memv ch '(#\+ #\/ #\- #\_ #\=)))
(read-base64-binary (cons ch acc))]
[else
(parse-error "Invalid base64 character")]))
(define (hexdigit? ch)
(or (and (char>=? ch #\A) (char<=? ch #\F))
(and (char>=? ch #\a) (char<=? ch #\f))
(and (char>=? ch #\0) (char<=? ch #\9))))
(define (read-hex-binary acc)
(skip-whitespace)
(define ch (read/no-eof))
(cond [(eqv? ch #\})
(hex-string->bytes (list->string (reverse acc)))]
[(hexdigit? ch)
(define ch2 (read/no-eof))
(when (not (hexdigit? ch2))
(parse-error "Hex-encoded binary digits must come in pairs"))
(read-hex-binary (cons ch2 (cons ch acc)))]
[else
(parse-error "Invalid hex character")]))
(define (read-stringlike xform-item finish terminator-char hexescape-char hexescape-proc)
(let loop ((acc '()))
(match (read/no-eof)
[(== terminator-char) (finish (reverse acc))]
[#\\ (match (read/no-eof)
[(== hexescape-char) (loop (cons (hexescape-proc) acc))]
[(and c (or (== terminator-char) #\\ #\/)) (loop (cons (xform-item c) acc))]
[#\b (loop (cons (xform-item #\u08) acc))]
[#\f (loop (cons (xform-item #\u0C) acc))]
[#\n (loop (cons (xform-item #\u0A) acc))]
[#\r (loop (cons (xform-item #\u0D) acc))]
[#\t (loop (cons (xform-item #\u09) acc))]
[c (parse-error "Invalid escape code \\~a" c)])]
[c (loop (cons (xform-item c) acc))])))
(define (read-string terminator-char)
(read-stringlike values
list->string
terminator-char
#\u
(lambda ()
(integer->char
(match i
[(px #px#"^[a-fA-F0-9]{4}" (list hexdigits))
(define n1 (string->number (bytes->string/utf-8 hexdigits) 16))
(if (<= #xd800 n1 #xdfff) ;; surrogate pair first half
(match i
[(px #px#"^\\\\u([a-fA-F0-9]{4})" (list _ hexdigits2))
(define n2 (string->number (bytes->string/utf-8 hexdigits2) 16))
(if (<= #xdc00 n2 #xdfff)
(+ (arithmetic-shift (- n1 #xd800) 10)
(- n2 #xdc00)
#x10000)
(parse-error "Bad second half of surrogate pair"))]
[_ (parse-error "Missing second half of surrogate pair")])
n1)]
[_ (parse-error "Bad string \\u escape")])))))
(define (read-literal-binary)
(read-stringlike (lambda (c)
(define b (char->integer c))
(when (>= b 256)
(parse-error "Invalid code point ~a (~v) in literal binary" b c))
b)
list->bytes
#\"
#\x
(lambda ()
(match i
[(px #px#"^[a-fA-F0-9]{2}" (list hexdigits))
(string->number (bytes->string/utf-8 hexdigits) 16)]
[_ (parse-error "Bad binary \\x escape")]))))
(define (read-intpart acc-rev)
(match (peek-char i)
[#\0 (read-fracexp (cons (read-char i) acc-rev))]
[_ (read-digit+ acc-rev read-fracexp)]))
(define (read-digit* acc-rev k)
(match (peek-char i)
[(? char? (? char-numeric?)) (read-digit* (cons (read-char i) acc-rev) k)]
[_ (k acc-rev)]))
(define (read-digit+ acc-rev k)
(match (peek-char i)
[(? char? (? char-numeric?)) (read-digit* (cons (read-char i) acc-rev) k)]
[_ (parse-error "Incomplete number")]))
(define (read-fracexp acc-rev)
(match (peek-char i)
[#\. (read-digit+ (cons (read-char i) acc-rev) read-exp)]
[_ (read-exp acc-rev)]))
(define (read-exp acc-rev)
(match (peek-char i)
[(or #\e #\E) (read-sign-and-exp (cons (read-char i) acc-rev))]
[_ (finish-number acc-rev)]))
(define (read-sign-and-exp acc-rev)
(match (peek-char i)
[(or #\+ #\-) (read-digit+ (cons (read-char i) acc-rev) finish-number)]
[_ (read-digit+ acc-rev finish-number)]))
(define (finish-number acc-rev)
(define s (list->string (reverse acc-rev)))
(define n (string->number s))
(when (not n) (parse-error "Invalid number: ~v" s))
(if (flonum? n)
(match (peek-char i)
[(or #\f #\F) (read-char i) (real->single-flonum n)]
[_ n])
n))
(define (read-number)
(match (peek/no-eof)
[#\- (read-intpart (list (read-char i)))]
[_ (read-intpart (list))]))
(define (sequence-fold acc accumulate-one finish terminator-char)
(let loop ((acc acc))
(skip-whitespace)
(match (peek/no-eof)
[(== terminator-char) (read-char i) (finish acc)]
[_ (loop (accumulate-one acc))])))
(define (collect-fields head)
(match (peek-char i)
[#\(
(read-char i)
(collect-fields (build-record head (read-sequence #\))))]
[_
head]))
(define (read-value)
(skip-whitespace)
(collect-fields
(match (peek-char i)
[(? eof-object? o) o]
[#\{ (read-char i) (read-dictionary-or-set)]
[#\[ (read-char i) (read-sequence #\])]
[(or #\- #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) (read-number)]
[#\" (read-char i) (read-string #\")]
[(== PIPE) (read-char i) (string->symbol (read-string PIPE))]
[#\# (match i
[(px #px#"^#set\\{" (list _))
(sequence-fold (set) (lambda (acc) (set-add acc (read-value))) values #\})]
[(px #px#"^#hexvalue\\{" (list _))
(decode (read-hex-binary '()) (lambda () (parse-error "Invalid #hexvalue encoding")))]
[(px #px#"^#true" (list _))
#t]
[(px #px#"^#false" (list _))
#f]
[(px #px#"^#\"" (list _))
(read-literal-binary)]
[(px #px#"^#hex\\{" (list _))
(read-hex-binary '())]
[(px #px#"^#base64\\{" (list _))
(read-base64-binary '())]
[_
(parse-error "Invalid preserve value")])]
[#\: (parse-error "Unexpected key/value separator between items")]
[_ (read-raw-symbol '())])))
(read-value))
(define (string->preserve s)
(define p (open-input-string s))
(define v (read-preserve p))
(skip-whitespace* p)
(when (not (eof-object? (peek-char p)))
(parse-error* p "Unexpected text following preserve"))
v)
;;---------------------------------------------------------------------------
(module+ test
(require rackunit)
(require (for-syntax racket syntax/srcloc))
(check-equal? (bit-string->bytes (encode-varint 0)) (bytes 0))
(check-equal? (bit-string->bytes (encode-varint 1)) (bytes 1))
(check-equal? (bit-string->bytes (encode-varint 127)) (bytes 127))
(check-equal? (bit-string->bytes (encode-varint 128)) (bytes 128 1))
(check-equal? (bit-string->bytes (encode-varint 255)) (bytes 255 1))
(check-equal? (bit-string->bytes (encode-varint 256)) (bytes 128 2))
(check-equal? (bit-string->bytes (encode-varint 300)) (bytes #b10101100 #b00000010))
(check-equal? (bit-string->bytes (encode-varint 1000000000)) (bytes 128 148 235 220 3))
(define (ks* v rest) (list v (bit-string->bytes rest)))
(define (kf*) (void))
(check-equal? (decode-varint (bytes 0) ks* kf*) (list 0 (bytes)))
(check-equal? (decode-varint (bytes 0 99) ks* kf*) (list 0 (bytes 99)))
(check-equal? (decode-varint (bytes 1) ks* kf*) (list 1 (bytes)))
(check-equal? (decode-varint (bytes 127) ks* kf*) (list 127 (bytes)))
(check-equal? (decode-varint (bytes 128 1) ks* kf*) (list 128 (bytes)))
(check-equal? (decode-varint (bytes 128 1 99) ks* kf*) (list 128 (bytes 99)))
(check-equal? (decode-varint (bytes 255 1) ks* kf*) (list 255 (bytes)))
(check-equal? (decode-varint (bytes 128 2) ks* kf*) (list 256 (bytes)))
(check-equal? (decode-varint (bytes #b10101100 #b00000010) ks* kf*) (list 300 (bytes)))
(check-equal? (decode-varint (bytes 128 148 235 220 3) ks* kf*) (list 1000000000 (bytes)))
(check-equal? (decode-varint (bytes 128 148 235 220 3 99) ks* kf*) (list 1000000000 (bytes 99)))
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (0 :: (wire-length)))) (bytes 0))
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (3 :: (wire-length)))) (bytes 3))
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (14 :: (wire-length)))) (bytes 14))
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (15 :: (wire-length)))) (bytes 15 15))
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (100 :: (wire-length))))
(bytes 15 100))
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (300 :: (wire-length))))
(bytes 15 #b10101100 #b00000010))
(define (dwl bs)
(bit-string-case bs
([ (= 0 :: bits 4) (w :: (wire-length)) ] w)
(else (void))))
(check-equal? (dwl (bytes 0)) 0)
(check-equal? (dwl (bytes 3)) 3)
(check-equal? (dwl (bytes 14)) 14)
(check-equal? (dwl (bytes 15)) (void))
(check-equal? (dwl (bytes 15 9)) (void)) ;; not canonical
(check-equal? (dwl (bytes 15 15)) 15)
(check-equal? (dwl (bytes 15 100)) 100)
(check-equal? (dwl (bytes 15 #b10101100 #b00000010)) 300)
(struct speak (who what) #:prefab)
(define (expected . pieces)
(bit-string->bytes
(apply bit-string-append
(map (match-lambda
[(? byte? b) (bytes b)]
[(? bytes? bs) bs]
[(? string? s) (string->bytes/utf-8 s)])
pieces))))
(define (d bs) (decode bs void))
(define-syntax (cross-check stx)
(syntax-case stx ()
((_ text v (b ...))
#'(let ((val v)) (cross-check text v v (b ...))))
((_ text forward back (b ...))
#`(let ((loc #,(source-location->string #'forward)))
(check-equal? (string->preserve text) back loc)
(check-equal? (d (encode forward)) back loc)
(check-equal? (d (encode back)) back loc)
(check-equal? (d (expected b ...)) back loc)
(check-equal? (encode forward) (expected b ...) loc)
))))
(define-syntax (cross-check/nondeterministic stx)
(syntax-case stx ()
((_ text v (b ...))
#'(let ((val v)) (cross-check/nondeterministic text v v (b ...))))
((_ text forward back (b ...))
#`(let ((loc #,(source-location->string #'forward)))
(check-equal? (string->preserve text) back loc)
(check-equal? (d (encode forward)) back loc)
(check-equal? (d (encode back)) back loc)
(check-equal? (d (expected b ...)) back loc)
))))
(cross-check "capture(discard())" (capture (discard)) (#x91 #x80))
(cross-check "observe(speak(discard(), capture(discard())))"
(observe (speak (discard) (capture (discard))))
(#xA1 #xB3 #x75 "speak" #x80 #x91 #x80))
(cross-check "[1, 2, 3, 4]" '(1 2 3 4) (#xC4 #x11 #x12 #x13 #x14))
(cross-check "[1 2 3 4]"
(stream-of 'sequence (sequence->generator '(1 2 3 4)))
'(1 2 3 4)
(#x2C #x11 #x12 #x13 #x14 #x3C))
(cross-check " [ -2 -1 0 1 ] " '(-2 -1 0 1) (#xC4 #x1E #x1F #x10 #x11))
(cross-check "\"hello\"" "hello" (#x55 "hello"))
(cross-check "\"hello\""
(stream-of 'string (sequence->generator '(#"he" #"llo")))
"hello"
(#x25 #x62 "he" #x63 "llo" #x35))
(cross-check "\"hello\""
(stream-of 'string (sequence->generator '(#"he" #"ll" #"" #"" #"o")))
"hello"
(#x25 #x62 "he" #x62 "ll" #x60 #x60 #x61 "o" #x35))
(cross-check "#\"hello\""
(stream-of 'byte-string (sequence->generator '(#"he" #"ll" #"" #"" #"o")))
#"hello"
(#x26 #x62 "he" #x62 "ll" #x60 #x60 #x61 "o" #x36))
(cross-check "hello"
(stream-of 'symbol (sequence->generator '(#"he" #"ll" #"" #"" #"o")))
'hello
(#x27 #x62 "he" #x62 "ll" #x60 #x60 #x61 "o" #x37))
(cross-check "[\"hello\" there #\"world\" [] #set{} #true #false]"
`("hello" there #"world" () ,(set) #t #f)
(#xC7 #x55 "hello" #x75 "there" #x65 "world" #xC0 #xD0 #x01 #x00))
(cross-check "#\"ABC\"" #"ABC" (#x63 #x41 #x42 #x43))
(cross-check "#hex{414243}" #"ABC" (#x63 #x41 #x42 #x43))
(cross-check "#hex{ 41 4A 4e }" #"AJN" (#x63 #x41 #x4A #x4E))
(cross-check "#hex{ 41;re\n 42 43 }" #"ABC" (#x63 #x41 #x42 #x43))
(check-exn exn? (lambda () (string->preserve "#hex{414 243}"))) ;; bytes must be 2-digits entire
(cross-check "#base64{Y29yeW1i}" #"corymb" (#x66 "corymb"))
(cross-check "#base64{Y29 yeW 1i}" #"corymb" (#x66 "corymb"))
(cross-check ";; a comment\n#base64{\n;x\nY29 yeW 1i}" #"corymb" (#x66 "corymb"))
(cross-check "#base64{SGk=}" #"Hi" (#x62 "Hi"))
(cross-check "#base64{SGk}" #"Hi" (#x62 "Hi"))
(cross-check "#base64{ S G k }" #"Hi" (#x62 "Hi"))
(check-equal? (string->preserve "[]") '())
(check-equal? (string->preserve "{}") (hash))
(check-equal? (string->preserve "\"\"") "")
(check-equal? (string->preserve "||") (string->symbol ""))
(check-equal? (string->preserve "#set{}") (set))
(check-equal? (string->preserve "{1 2 3}") (set 1 2 3))
(check-equal? (string->preserve "#set{1 2 3}") (set 1 2 3))
(cross-check "\"abc\\u6c34\\u6C34\\\\\\/\\\"\\b\\f\\n\\r\\txyz\""
"abc\u6c34\u6c34\\/\"\b\f\n\r\txyz"
(#x5f #x14
#x61 #x62 #x63 #xe6 #xb0 #xb4 #xe6 #xb0
#xb4 #x5c #x2f #x22 #x08 #x0c #x0a #x0d
#x09 #x78 #x79 #x7a))
(cross-check "|abc\\u6c34\\u6C34\\\\\\/\\|\\b\\f\\n\\r\\txyz|"
(string->symbol "abc\u6c34\u6c34\\/|\b\f\n\r\txyz")
(#x7f #x14
#x61 #x62 #x63 #xe6 #xb0 #xb4 #xe6 #xb0
#xb4 #x5c #x2f #x7c #x08 #x0c #x0a #x0d
#x09 #x78 #x79 #x7a))
(check-exn #px"Invalid escape code \\\\u" (lambda () (string->preserve "#\"\\u6c34\"")))
(cross-check "#\"abc\\x6c\\x34\\xf0\\\\\\/\\\"\\b\\f\\n\\r\\txyz\""
#"abc\x6c\x34\xf0\\/\"\b\f\n\r\txyz"
(#x6f #x11
#x61 #x62 #x63 #x6c #x34 #xf0 #x5c #x2f
#x22 #x08 #x0c #x0a #x0d #x09 #x78 #x79 #x7a))
(cross-check "\"\\uD834\\uDD1E\"" "\U0001D11E" (#x54 #xF0 #x9D #x84 #x9E))
(cross-check "-257" -257 (#x42 #xFE #xFF))
(cross-check "-256" -256 (#x42 #xFF #x00))
(cross-check "-255" -255 (#x42 #xFF #x01))
(cross-check "-254" -254 (#x42 #xFF #x02))
(cross-check "-129" -129 (#x42 #xFF #x7F))
(cross-check "-128" -128 (#x41 #x80))
(cross-check "-127" -127 (#x41 #x81))
(cross-check "-4" -4 (#x41 #xFC))
(cross-check "-3" -3 (#x1D))
(cross-check "-2" -2 (#x1E))
(cross-check "-1" -1 (#x1F))
(cross-check "0" 0 (#x10))
(cross-check "1" 1 (#x11))
(cross-check "12" 12 (#x1C))
(cross-check "13" 13 (#x41 #x0D))
(cross-check "127" 127 (#x41 #x7F))
(cross-check "128" 128 (#x42 #x00 #x80))
(cross-check "255" 255 (#x42 #x00 #xFF))
(cross-check "256" 256 (#x42 #x01 #x00))
(cross-check "32767" 32767 (#x42 #x7F #xFF))
(cross-check "32768" 32768 (#x43 #x00 #x80 #x00))
(cross-check "65535" 65535 (#x43 #x00 #xFF #xFF))
(cross-check "65536" 65536 (#x43 #x01 #x00 #x00))
(cross-check "131072" 131072 (#x43 #x02 #x00 #x00))
(cross-check "1.0f" 1.0f0 (#b00000010 #b00111111 #b10000000 0 0))
(cross-check "1.0" 1.0 (#b00000011 #b00111111 #b11110000 0 0 0 0 0 0))
(cross-check "-1.202e300" -1.202e300 (#x03 #xFE #x3C #xB7 #xB7 #x59 #xBF #x04 #x26))
(check-equal? (d (expected #x25 #x51 "a" #x35)) (void)) ;; Bad chunk type: must be bytes
(check-equal? (d (expected #x25 #x71 "a" #x35)) (void)) ;; Bad chunk type: must be bytes
(check-equal? (d (expected #x26 #x51 "a" #x36)) (void)) ;; Bad chunk type: must be bytes
(check-equal? (d (expected #x26 #x71 "a" #x36)) (void)) ;; Bad chunk type: must be bytes
(check-equal? (d (expected #x27 #x51 "a" #x37)) (void)) ;; Bad chunk type: must be bytes
(check-equal? (d (expected #x27 #x71 "a" #x37)) (void)) ;; Bad chunk type: must be bytes
(check-equal? (d (expected #x25 #x61 "a" #x35)) "a")
(check-equal? (d (expected #x26 #x61 "a" #x36)) #"a")
(check-equal? (d (expected #x27 #x61 "a" #x37)) 'a)
(struct date (year month day) #:prefab)
(struct thing (id) #:prefab)
(struct person thing (name date-of-birth) #:prefab)
(struct titled person (title) #:prefab)
(cross-check
"[titled person 2 thing 1](101, \"Blackwell\", date(1821 2 3), \"Dr\")"
(titled 101 "Blackwell" (date 1821 2 3) "Dr")
(#xB5 ;; Record, generic, 4+1
#xC5 ;; Sequence, 5
#x76 #x74 #x69 #x74 #x6C #x65 #x64 ;; Symbol, "titled"
#x76 #x70 #x65 #x72 #x73 #x6F #x6E ;; Symbol, "person"
#x12 ;; SignedInteger, "2"
#x75 #x74 #x68 #x69 #x6E #x67 ;; Symbol, "thing"
#x11 ;; SignedInteger, "1"
#x41 #x65 ;; SignedInteger, "101"
#x59 #x42 #x6C #x61 #x63 #x6B #x77 #x65 #x6C #x6C ;; String, "Blackwell"
#xB4 ;; Record, generic, 3+1
#x74 #x64 #x61 #x74 #x65 ;; Symbol, "date"
#x42 #x07 #x1D ;; SignedInteger, "1821"
#x12 ;; SignedInteger, "2"
#x13 ;; SignedInteger, "3"
#x52 #x44 #x72 ;; String, "Dr"
))
(cross-check "discard()" (record 'discard '()) (discard) (#x80))
(cross-check "discard(surprise)"
(record 'discard '(surprise))
'#s(discard surprise)
(#x81 #x78 "surprise"))
(cross-check "capture(x)" (record 'capture '(x)) (capture 'x) (#x91 #x71 "x"))
(cross-check "observe(x)" (record 'observe '(x)) (observe 'x) (#xA1 #x71 "x"))
(cross-check "observe(x y)" (record 'observe '(x y)) '#s(observe x y) (#xA2 #x71 "x" #x71 "y"))
(cross-check "other(x y)"
(record 'other '(x y))
'#s(other x y)
(#xB3 #x75 "other" #x71 "x" #x71 "y"))
(cross-check "\"aString\"(3 4)"
(record "aString" '(3 4))
(#xB3 #x57 "aString" #x13 #x14))
(cross-check "discard()(3, 4)"
(record (discard) '(3 4))
(#xB3 #x80 #x13 #x14))
(check-equal? (d (expected #x2C #x00 #x00)) (void)) ;; missing end byte
(check-equal? (d (expected #xC3 #x00 #x00)) (void)) ;; missing element
(cross-check/nondeterministic
"{a: 1, \"b\": #true, [1 2 3]: #\"c\", {first-name:\"Elizabeth\"}:{surname:\"Blackwell\"}}"
(hash 'a 1
"b" #t
'(1 2 3) #"c"
(hash 'first-name "Elizabeth") (hash 'surname "Blackwell"))
(#xE8 #x71 "a" #x11
#x51 "b" #x01
#xC3 #x11 #x12 #x13 #x61 "c"
#xE2 #x7A "first-name" #x59 "Elizabeth"
#xE2 #x77 "surname" #x59 "Blackwell"
))
(let ()
(local-require json)
(define rfc8259-example1 (string->preserve #<<EOF
{
"Image": {
"Width": 800,
"Height": 600,
"Title": "View from 15th Floor",
"Thumbnail": {
"Url": "http://www.example.com/image/481989943",
"Height": 125,
"Width": 100
},
"Animated" : false,
"IDs": [116, 943, 234, 38793]
}
}
EOF
))
(define rfc8259-example2 (string->preserve #<<EOF
[
{
"precision": "zip",
"Latitude": 37.7668,
"Longitude": -122.3959,
"Address": "",
"City": "SAN FRANCISCO",
"State": "CA",
"Zip": "94107",
"Country": "US"
},
{
"precision": "zip",
"Latitude": 37.371991,
"Longitude": -122.026020,
"Address": "",
"City": "SUNNYVALE",
"State": "CA",
"Zip": "94085",
"Country": "US"
}
]
EOF
))
(cross-check/nondeterministic
"{\"Image\": {\"Width\": 800,\"Height\": 600,\"Title\": \"View from 15th Floor\",\"Thumbnail\": {\"Url\": \"http://www.example.com/image/481989943\",\"Height\": 125,\"Width\": 100},\"Animated\" : false,\"IDs\": [116, 943, 234, 38793]}}"
rfc8259-example1
(#xe2
#x55 "Image"
#xec
#x55 "Width" #x42 #x03 #x20
#x55 "Title" #x5f #x14 "View from 15th Floor"
#x58 "Animated" #x75 "false"
#x56 "Height" #x42 #x02 #x58
#x59 "Thumbnail"
#xe6
#x55 "Width" #x41 #x64
#x53 "Url" #x5f #x26 "http://www.example.com/image/481989943"
#x56 "Height" #x41 #x7d
#x53 "IDs" #xc4
#x41 #x74
#x42 #x03 #xaf
#x42 #x00 #xea
#x43 #x00 #x97 #x89
))
(cross-check/nondeterministic
"[{\"precision\": \"zip\",\"Latitude\": 37.7668,\"Longitude\": -122.3959,\"Address\": \"\",\"City\": \"SAN FRANCISCO\",\"State\": \"CA\",\"Zip\": \"94107\",\"Country\": \"US\"},{\"precision\": \"zip\",\"Latitude\": 37.371991,\"Longitude\": -122.026020,\"Address\": \"\",\"City\": \"SUNNYVALE\",\"State\": \"CA\",\"Zip\": \"94085\",\"Country\": \"US\"}]"
rfc8259-example2
(#xc2
#xef #x10
#x59 "precision" #x53 "zip"
#x58 "Latitude" #x03 #x40 #x42 #xe2 #x26 #x80 #x9d #x49 #x52
#x59 "Longitude" #x03 #xc0 #x5e #x99 #x56 #x6c #xf4 #x1f #x21
#x57 "Address" #x50
#x54 "City" #x5D "SAN FRANCISCO"
#x55 "State" #x52 "CA"
#x53 "Zip" #x55 "94107"
#x57 "Country" #x52 "US"
#xef #x10
#x59 "precision" #x53 "zip"
#x58 "Latitude" #x03 #x40 #x42 #xaf #x9d #x66 #xad #xb4 #x03
#x59 "Longitude" #x03 #xc0 #x5e #x81 #xaa #x4f #xca #x42 #xaf
#x57 "Address" #x50
#x54 "City" #x59 "SUNNYVALE"
#x55 "State" #x52 "CA"
#x53 "Zip" #x55 "94085"
#x57 "Country" #x52 "US"
))
)
)

View File

@ -1,171 +0,0 @@
from preserve import *
import unittest
if isinstance(chr(123), bytes):
def _byte(x):
return chr(x)
def _hex(x):
return x.encode('hex')
else:
def _byte(x):
return bytes([x])
def _hex(x):
return x.hex()
def _buf(*args):
result = []
for chunk in args:
if isinstance(chunk, bytes):
result.append(chunk)
elif isinstance(chunk, basestring):
result.append(chunk.encode('utf-8'))
elif isinstance(chunk, numbers.Number):
result.append(_byte(chunk))
else:
raise Exception('Invalid chunk in _buf %r' % (chunk,))
result = b''.join(result)
return result
def _varint(v):
e = Encoder()
e.varint(v)
return e.contents()
def _d(bs):
d = Decoder(bs)
return d.next()
def _e(v):
e = Encoder()
e.append(v)
return e.contents()
def _R(k, *args):
return Record(Symbol(k), args)
class CodecTests(unittest.TestCase):
def _roundtrip(self, forward, expected, back=None, nondeterministic=False):
if back is None: back = forward
self.assertEqual(_d(_e(forward)), back)
self.assertEqual(_d(_e(back)), back)
self.assertEqual(_d(expected), back)
if not nondeterministic:
actual = _e(forward)
self.assertEqual(actual, expected, '%s != %s' % (_hex(actual), _hex(expected)))
def test_decode_varint(self):
with self.assertRaises(DecodeError):
Decoder(_buf()).varint()
self.assertEqual(Decoder(_buf(0)).varint(), 0)
self.assertEqual(Decoder(_buf(10)).varint(), 10)
self.assertEqual(Decoder(_buf(100)).varint(), 100)
self.assertEqual(Decoder(_buf(200, 1)).varint(), 200)
self.assertEqual(Decoder(_buf(0b10101100, 0b00000010)).varint(), 300)
self.assertEqual(Decoder(_buf(128, 148, 235, 220, 3)).varint(), 1000000000)
def test_encode_varint(self):
self.assertEqual(_varint(0), _buf(0))
self.assertEqual(_varint(10), _buf(10))
self.assertEqual(_varint(100), _buf(100))
self.assertEqual(_varint(200), _buf(200, 1))
self.assertEqual(_varint(300), _buf(0b10101100, 0b00000010))
self.assertEqual(_varint(1000000000), _buf(128, 148, 235, 220, 3))
def test_shorts(self):
self._roundtrip(_R('capture', _R('discard')), _buf(0x91, 0x80))
self._roundtrip(_R('observe', _R('speak', _R('discard'), _R('capture', _R('discard')))),
_buf(0xA1, 0xB3, 0x75, "speak", 0x80, 0x91, 0x80))
def test_simple_seq(self):
self._roundtrip([1,2,3,4], _buf(0xC4, 0x11, 0x12, 0x13, 0x14), back=(1,2,3,4))
self._roundtrip(SequenceStream([1,2,3,4]), _buf(0x2C, 0x11, 0x12, 0x13, 0x14, 0x3C),
back=(1,2,3,4))
self._roundtrip((-2,-1,0,1), _buf(0xC4, 0x1E, 0x1F, 0x10, 0x11))
def test_str(self):
self._roundtrip(u'hello', _buf(0x55, 'hello'))
self._roundtrip(StringStream([b'he', b'llo']), _buf(0x25, 0x62, 'he', 0x63, 'llo', 0x35),
back=u'hello')
self._roundtrip(StringStream([b'he', b'll', b'', b'', b'o']),
_buf(0x25, 0x62, 'he', 0x62, 'll', 0x60, 0x60, 0x61, 'o', 0x35),
back=u'hello')
self._roundtrip(BinaryStream([b'he', b'll', b'', b'', b'o']),
_buf(0x26, 0x62, 'he', 0x62, 'll', 0x60, 0x60, 0x61, 'o', 0x36),
back=b'hello')
self._roundtrip(SymbolStream([b'he', b'll', b'', b'', b'o']),
_buf(0x27, 0x62, 'he', 0x62, 'll', 0x60, 0x60, 0x61, 'o', 0x37),
back=Symbol(u'hello'))
def test_mixed1(self):
self._roundtrip((u'hello', Symbol(u'there'), b'world', (), set(), True, False),
_buf(0xc7, 0x55, 'hello', 0x75, 'there', 0x65, 'world', 0xc0, 0xd0, 1, 0))
def test_signedinteger(self):
self._roundtrip(-257, _buf(0x42, 0xFE, 0xFF))
self._roundtrip(-256, _buf(0x42, 0xFF, 0x00))
self._roundtrip(-255, _buf(0x42, 0xFF, 0x01))
self._roundtrip(-254, _buf(0x42, 0xFF, 0x02))
self._roundtrip(-129, _buf(0x42, 0xFF, 0x7F))
self._roundtrip(-128, _buf(0x41, 0x80))
self._roundtrip(-127, _buf(0x41, 0x81))
self._roundtrip(-4, _buf(0x41, 0xFC))
self._roundtrip(-3, _buf(0x1D))
self._roundtrip(-2, _buf(0x1E))
self._roundtrip(-1, _buf(0x1F))
self._roundtrip(0, _buf(0x10))
self._roundtrip(1, _buf(0x11))
self._roundtrip(12, _buf(0x1C))
self._roundtrip(13, _buf(0x41, 0x0D))
self._roundtrip(127, _buf(0x41, 0x7F))
self._roundtrip(128, _buf(0x42, 0x00, 0x80))
self._roundtrip(255, _buf(0x42, 0x00, 0xFF))
self._roundtrip(256, _buf(0x42, 0x01, 0x00))
self._roundtrip(32767, _buf(0x42, 0x7F, 0xFF))
self._roundtrip(32768, _buf(0x43, 0x00, 0x80, 0x00))
self._roundtrip(65535, _buf(0x43, 0x00, 0xFF, 0xFF))
self._roundtrip(65536, _buf(0x43, 0x01, 0x00, 0x00))
self._roundtrip(131072, _buf(0x43, 0x02, 0x00, 0x00))
def test_floats(self):
self._roundtrip(Float(1.0), _buf(2, 0x3f, 0x80, 0, 0))
self._roundtrip(1.0, _buf(3, 0x3f, 0xf0, 0, 0, 0, 0, 0, 0))
self._roundtrip(-1.202e300, _buf(3, 0xfe, 0x3c, 0xb7, 0xb7, 0x59, 0xbf, 0x04, 0x26))
def test_badchunks(self):
self.assertEqual(_d(_buf(0x25, 0x61, 'a', 0x35)), u'a')
self.assertEqual(_d(_buf(0x26, 0x61, 'a', 0x36)), b'a')
self.assertEqual(_d(_buf(0x27, 0x61, 'a', 0x37)), Symbol(u'a'))
for a in [0x25, 0x26, 0x27]:
for b in [0x51, 0x71]:
with self.assertRaises(DecodeError, msg='Unexpected non-binary chunk') as cm:
_d(_buf(a, b, 'a', 0x10+a))
def test_person(self):
self._roundtrip(Record((Symbol(u'titled'), Symbol(u'person'), 2, Symbol(u'thing'), 1),
[
101,
u'Blackwell',
_R(u'date', 1821, 2, 3),
u'Dr'
]),
_buf(0xB5, 0xC5, 0x76, 0x74, 0x69, 0x74, 0x6C, 0x65,
0x64, 0x76, 0x70, 0x65, 0x72, 0x73, 0x6F, 0x6E,
0x12, 0x75, 0x74, 0x68, 0x69, 0x6E, 0x67, 0x11,
0x41, 0x65, 0x59, 0x42, 0x6C, 0x61, 0x63, 0x6B,
0x77, 0x65, 0x6C, 0x6C, 0xB4, 0x74, 0x64, 0x61,
0x74, 0x65, 0x42, 0x07, 0x1D, 0x12, 0x13, 0x52,
0x44, 0x72))
def test_dict(self):
self._roundtrip({ Symbol(u'a'): 1,
u'b': True,
(1, 2, 3): b'c',
ImmutableDict({ Symbol(u'first-name'): u'Elizabeth', }):
{ Symbol(u'surname'): u'Blackwell' } },
_buf(0xE8,
0x71, "a", 0x11,
0x51, "b", 0x01,
0xC3, 0x11, 0x12, 0x13, 0x61, "c",
0xE2, 0x7A, "first-name", 0x59, "Elizabeth",
0xE2, 0x77, "surname", 0x59, "Blackwell"),
nondeterministic = True)

View File

@ -8,7 +8,7 @@
(require racket/random file/sha1)
(require imperative-syndicate/skeleton)
(require imperative-syndicate/term)
(require "preserve.rkt")
(require preserves)
(define-logger mcds)