Split out preserves into its own repository
parent
2c03883b00
commit
d497d9e6d1
|
@ -1,2 +0,0 @@
|
|||
preserve.pdf: preserve.md preserve.css
|
||||
google-chrome --headless --disable-gpu --print-to-pdf=$@ http://localhost:4000/preserve.html
|
|
@ -1,662 +0,0 @@
|
|||
---
|
||||
---
|
||||
<style>
|
||||
body { font-size: 120%; margin-left: 2rem; }
|
||||
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; }
|
||||
h2 { border-bottom: solid black 1px; }
|
||||
</style>
|
||||
|
||||
2018-06-04 20:31:48 TODO: cwebber's email comments
|
||||
2018-06-04 20:31:51 TODO: look at https://github.com/imbal/rson and at clojure EDN
|
||||
2018-06-05 10:32:02 ... and at http://json-schema.org/latest/json-schema-core.html#rfc.section.4.2
|
||||
|
||||
# SPKI CAT: SPKI S-Expressions with Canonical Atom Tags
|
||||
|
||||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||
Christopher Lemmer Webber <cwebber@dustycloud.org>
|
||||
May 2018
|
||||
Version 0.0.1
|
||||
|
||||
________________
|
||||
/ \
|
||||
/\__/\ / boo! \
|
||||
/ \ \ i'm very spki! \
|
||||
\=\_^__^_/= ___/\__________________/
|
||||
|/ \
|
||||
\\ | | /
|
||||
<_|--|_>
|
||||
|
||||
## Introduction
|
||||
|
||||
This document proposes a language-neutral JSON-like *data type*, along
|
||||
with a robust equivalence relation ("semantics") and a total ordering
|
||||
over inhabitants of the type.[^tjson]
|
||||
|
||||
[^tjson]: [TJSON](https://www.tjson.org/) has a similar aim:
|
||||
“different on-the-wire representations of an object correspond to
|
||||
the same typed data object”
|
||||
([source](https://news.ycombinator.com/item?id=12860143)); “TJSON
|
||||
is defined as a serialization format on top of a JSON-like data
|
||||
model” ([source](https://news.ycombinator.com/item?id=12860401)).
|
||||
|
||||
It then suggests conventions for encoding common data formats in terms
|
||||
of the proposed data type.
|
||||
|
||||
Finally, it proposes concrete *syntax* for the data type, offering a
|
||||
language-neutral transfer syntax (based on
|
||||
[Rivest's S-Expressions][sexp.txt] as used in [SPKI/SDSI][spki]) and
|
||||
suggesting possible language-specific representations for the data
|
||||
type's inhabitants.
|
||||
|
||||
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
||||
[spki]: http://world.std.com/~cme/html/spki.html
|
||||
|
||||
### Why not Just Use JSON?
|
||||
|
||||
<!-- JSON lacks semantics: JSON syntax doesn't denote anything -->
|
||||
|
||||
JSON offers *syntax* for numbers, strings, booleans, null, arrays and
|
||||
string-keyed maps. However, it offers no *semantics* for the syntax:
|
||||
it is left to each implementation to determine how to treat each JSON
|
||||
term. This causes
|
||||
[interoperability](http://seriot.ch/parsing_json.php) and even
|
||||
[security](http://seriot.ch/parsing_json.php) issues.
|
||||
|
||||
Specifically, JSON does not:
|
||||
|
||||
- assign any meaning to numbers,[^meaning-ieee-double]
|
||||
- determine how strings are to be compared,[^string-key-comparison]
|
||||
- determine whether object key ordering is significant, or
|
||||
- determine whether duplicate object keys are permitted, what it
|
||||
would mean if they were, or how to determine a duplicate in the
|
||||
first place.
|
||||
|
||||
In short, JSON syntax doesn't *denote* anything.[^xml-infoset] [^other-formats]
|
||||
|
||||
[^meaning-ieee-double]:
|
||||
[Section 6 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-6)
|
||||
does go so far as to indicate “good interoperability can be
|
||||
achieved” by imagining that parsers are able reliably to
|
||||
understand the syntax of numbers as denoting an IEEE 754
|
||||
double-precision floating-point value.
|
||||
|
||||
[^string-key-comparison]:
|
||||
[Section 8.3 of RFC 7159](https://tools.ietf.org/html/rfc7159#section-8.3)
|
||||
suggests that *if* an implementation compares strings used as
|
||||
object keys “code unit by code unit”, then it will interoperate
|
||||
with *other such implementations*, but neither requires this
|
||||
behaviour nor discusses comparisons of strings used in other
|
||||
contexts.
|
||||
|
||||
[^xml-infoset]: The XML world has the concept of
|
||||
[XML infoset](https://www.w3.org/TR/xml-infoset/). Loosely
|
||||
speaking, XML infoset is the *denotation* of an XML document; the
|
||||
*meaning* of the document.
|
||||
|
||||
[^other-formats]: Most other recent data languages are like JSON in
|
||||
specifying only a syntax with no associated semantics. While some
|
||||
do make a sketch of a semantics, the result is often
|
||||
underspecified (e.g. in terms of how strings are to be compared),
|
||||
overly machine-oriented (e.g. treating 32-bit integers as
|
||||
fundamentally distinct from 64-bit integers and from
|
||||
floating-point numbers), overly fine (e.g. giving visibility to
|
||||
the order in which map entries are written), or all three.
|
||||
|
||||
Some examples:
|
||||
|
||||
- are the JSON values `1`, `1.0`, and `1e0` the same or different?
|
||||
- are the JSON values `1.0` and `1.0000000000000001` the same or different?
|
||||
- are the JSON strings `"päron"` (UTF-8 `70c3a4726f6e`) and `"päron"`
|
||||
(UTF-8 `7061cc88726f6e`) the same or different?
|
||||
- are the JSON objects `{"a":1, "b":2}` and `{"b":2, "a":1}` the same
|
||||
or different?
|
||||
- which, if any, of `{"a":1, "a":2}`, `{"a":1}` and `{"a":2}` are the
|
||||
same? Are all three legal?
|
||||
- are `{"päron":1}` and `{"päron":1}` the same or different?
|
||||
|
||||
Different JSON implementations give different answers to these
|
||||
questions. The JSON specifications are silent on these questions.
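
As an illustration only, the following sketch shows the answers that one particular implementation, Python's standard `json` module, happens to give. Other parsers may legitimately answer differently, which is exactly the problem.

```python
import json

# Integer and floating-point spellings parse to different Python types,
# although Python's == treats them as numerically equal.
one_int, one_float = json.loads("1"), json.loads("1.0")
print(type(one_int).__name__, type(one_float).__name__, one_int == one_float)

# The extra digits are lost when the text is read as an IEEE 754 double.
same_double = json.loads("1.0") == json.loads("1.0000000000000001")
print(same_double)

# Precomposed and decomposed spellings of the same text compare unequal.
precomposed = json.loads('"p\\u00e4ron"')
decomposed = json.loads('"pa\\u0308ron"')
print(precomposed == decomposed)

# Key order is ignored, and the last duplicate key silently wins.
reordered_equal = json.loads('{"a":1, "b":2}') == json.loads('{"b":2, "a":1}')
duplicate = json.loads('{"a":1, "a":2}')
print(reordered_equal, duplicate)
```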
|
||||
|
||||
There are other minor problems with JSON having to do with its syntax.
|
||||
Examples include its relative verbosity and its lack of support for
|
||||
binary data.
|
||||
|
||||
## Starting with Semantics
|
||||
|
||||
Taking inspiration from functional programming, we start with a
|
||||
definition of the *values* that we want to work with and give them
|
||||
meaning independent of their syntax. We will treat syntax separately,
|
||||
later in this document.
|
||||
|
||||
We will want our data type to accommodate *atoms* (numbers and text),
|
||||
*products* (both tuples and sequences), and *labelled
|
||||
sums*.[^zephyr-asdl] It should also include *keyed maps*. We should
|
||||
avoid unnecessary restrictions such as machine-oriented fixed-width
|
||||
integer or floating-point values where possible.
|
||||
|
||||
[^zephyr-asdl]: This design was loosely inspired by Zephyr ASDL (h/t
|
||||
[Darius Bacon](https://twitter.com/abecedarius/status/993545767884226561)),
|
||||
which doesn't offer much in the way of atoms, but offers
|
||||
general-purpose labelled sums and products. See D. C. Wang, A. W.
|
||||
Appel, J. L. Korn, and C. S. Serra, “The Zephyr Abstract Syntax
|
||||
Description Language,” in USENIX Conference on Domain-Specific
|
||||
Languages, 1997, pp. 213–228.
|
||||
[PDF available.](https://www.usenix.org/legacy/publications/library/proceedings/dsl97/full_papers/wang/wang.pdf)
|
||||
|
||||
### Values
|
||||
|
||||
A `Value` is one of:
|
||||
|
||||
- a `ByteString` for general-purpose non-numeric atomic data,[^byte-string-rationale]
|
||||
- a `Number` for integers and rational numbers,
|
||||
- a `List` for general-purpose variable-length sequences,
|
||||
- a `Map` for general-purpose variable-size key/value maps, or
|
||||
- a `Record` for tagging a tuple of values with an intended interpretation.
|
||||
|
||||
We define a total order over `Value`s: Every `ByteString` is less than
|
||||
the other kinds of `Value`; every `Number` is less than any `List`,
|
||||
`Map` or `Record`, but greater than any `ByteString`; and so on.
|
||||
|
||||
That is, `ByteString < Number < List < Map < Record`.
|
||||
|
||||
Two values of the same kind are compared using kind-specific rules,
|
||||
given below.
|
||||
|
||||
Two `Value`s are equal if neither is less than the other according to
|
||||
the total order.
|
||||
|
||||
[^byte-string-rationale]: Why include `ByteString`, when we could
|
||||
instead use a reserved `Record` along with a `List` of `Number`s?
|
||||
((TODO: Actually decide about this! Similarly, why include `Map`
|
||||
rather than a restricted form of `List` with a `Record`? I think
|
||||
the answer has to do with the arbitrariness of the label we'd
|
||||
pick: unless *extremely* carefully chosen (i.e. number 0 (ideally
|
||||
even `-Inf`!) for byte strings, number 1 for map, and have the
|
||||
order go `Number < Record < List`), they would mess up the
|
||||
prettiness of the ordering. Though... we could ultimately reduce
|
||||
this to `Number` and `Record`, and have a family of `#"list"` and
|
||||
`#"map"` `Record`s...))
|
||||
|
||||
### Byte strings
|
||||
|
||||
A `ByteString` is an ordered sequence of zero or more integers in the
|
||||
inclusive range [0..255].
|
||||
|
||||
`ByteString`s are compared lexicographically.
|
||||
|
||||
We will write examples of `ByteString`s that contain only ASCII
|
||||
characters using “`#"`” as an opening quote mark and “`"`” as a
|
||||
closing quote mark.
|
||||
|
||||
**Examples.** The `ByteString` containing the three ASCII characters
|
||||
`A`, `B` and `C` is written as `#"ABC"`. The empty `ByteString` is
|
||||
written as `#""`. **N.B.** Despite appearances, these are *binary*
|
||||
data.
|
||||
|
||||
### Numbers
|
||||
|
||||
A `Number` is a signed rational number of finite precision whose
|
||||
magnitude can be exactly represented in base two with a finite number
|
||||
of digits. This includes integers of arbitrary width as well as (for
|
||||
example) the non-infinite non-NaN IEEE 754 floating-point values.
|
||||
|
||||
`Number`s are compared as mathematical numbers.
|
||||
|
||||
We will write examples of `Number`s using standard mathematical
|
||||
notation.
|
||||
|
||||
**Examples.** 10, -6, 0.5, -3/2, 33/192, -1.202E4567.
|
||||
|
||||
**Non-examples.** NaN (the clue is in the name!), ∞ (not finite),
|
||||
0.2 (cannot be exactly represented with a finite number of binary
|
||||
digits), 1/7 (likewise), 2+*i*3 (not rational), √2 (likewise).
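
A minimal Python sketch of this membership test, assuming candidates are given as anything `fractions.Fraction` accepts. The function name `is_number` is made up for illustration. A rational number has a finite binary expansion exactly when its denominator, in lowest terms, is a power of two.

```python
from fractions import Fraction

def is_number(x):
    """True if x can be written exactly with finitely many binary digits."""
    try:
        q = Fraction(x)
    except (TypeError, ValueError, OverflowError):
        # Complex values, NaN and the infinities are not rational numbers.
        return False
    d = q.denominator
    return (d & (d - 1)) == 0  # d is a power of two
```

Note that the decimal *text* `"0.2"` is rejected, whereas the Python float `0.2` would be accepted, because the float is already the nearest binary approximation rather than one fifth.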
|
||||
|
||||
### Lists
|
||||
|
||||
A `List` is an ordered sequence of zero or more `Value`s.
|
||||
|
||||
`List`s are compared lexicographically, appealing to the ordering on
|
||||
`Value`s for comparisons at each position in the `List`s.
|
||||
|
||||
### Maps
|
||||
|
||||
A `Map` is an *unordered* collection of zero or more pairs of
|
||||
`Value`s. Each pair comprises a *key* and a *value*. Keys in a `Map`
|
||||
must be pairwise distinct.
|
||||
|
||||
Instances of `Map` are compared by lexicographic comparison of the
|
||||
sequences resulting from ordering each `Map`'s pairs in ascending
|
||||
order by key. ((TODO: Is this a good idea? Is it clearly-enough
|
||||
written? An alternative approach is to compare first by the *count* of
|
||||
pairs, and only if the count is the same, start comparing the pairs
|
||||
themselves.))
|
||||
|
||||
### Records
|
||||
|
||||
A `Record` is a tuple of one or more `Value`s. The first in the tuple
|
||||
is called the *label* of the `Record`, and the other elements of the
|
||||
tuple are called its *fields*.
|
||||
|
||||
`Record` labels are *usually* `ByteString`s, but can be any kind of
|
||||
`Value`.[^iri-labels]
|
||||
|
||||
[^iri-labels]: It is occasionally (but seldom) necessary to
|
||||
interpret such `ByteString` labels as UTF-8 encoded IRIs. Where a
|
||||
label can be read as a relative IRI, it is notionally interpreted
|
||||
with respect to the IRI `http://spki-cat.org/` ((TODO:
|
||||
placeholder)); where a label can be read as an absolute IRI, it
|
||||
stands for that IRI; and otherwise, it cannot be read as an IRI at
|
||||
all, and so the label simply stands for itself - for its own
|
||||
`Value`.
|
||||
|
||||
`Record`s are compared lexicographically as if they were just tuples;
|
||||
that is, first by their labels, and then by the remainder of their
|
||||
fields.
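
The ordering rules given so far can be collected into a single comparison function. The following Python sketch is one possible reading, not a reference implementation. The `Record` and `Map` classes are hypothetical holders, `Number`s are assumed to be `int` or `Fraction`, and each `Map` pair is assumed to compare by key and then by value.

```python
from fractions import Fraction
from functools import cmp_to_key

class Record:
    """Hypothetical holder: a label plus zero or more fields."""
    def __init__(self, label, *fields):
        self.label, self.fields = label, list(fields)

class Map:
    """Hypothetical holder: (key, value) pairs with pairwise-distinct keys."""
    def __init__(self, *pairs):
        self.pairs = list(pairs)

KINDS = (bytes, (int, Fraction), list, Map, Record)

def rank(v):
    # ByteString < Number < List < Map < Record
    for i, kind in enumerate(KINDS):
        if isinstance(v, kind):
            return i
    raise TypeError("not a Value: %r" % (v,))

def compare_seq(xs, ys):
    # Lexicographic: the first differing position decides; otherwise
    # the shorter sequence is the lesser.
    for x, y in zip(xs, ys):
        c = compare(x, y)
        if c:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))

def sorted_pairs(m):
    # Pairs in ascending key order, each turned into a two-element list.
    by_key = cmp_to_key(lambda p, q: compare(p[0], q[0]))
    return [[k, v] for k, v in sorted(m.pairs, key=by_key)]

def compare(a, b):
    """Return -1, 0 or 1 according to the total order on Values."""
    ra, rb = rank(a), rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra in (0, 1):  # ByteString or Number: Python's built-in order
        return (a > b) - (a < b)
    if ra == 2:
        return compare_seq(a, b)
    if ra == 3:
        return compare_seq(sorted_pairs(a), sorted_pairs(b))
    return compare_seq([a.label] + a.fields, [b.label] + b.fields)
```

Two `Value`s are then equal exactly when `compare` returns 0, matching the definition of equality given earlier.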
|
||||
|
||||
We will write examples of `Record`s with `ByteString` labels entirely
|
||||
composed of ASCII characters as their label followed by their
|
||||
parenthesised, comma-separated fields.
|
||||
|
||||
**Examples.** The `Record` with label `#"foo"` and fields 1, 2 and 3
|
||||
is written `#"foo"(1, 2, 3)`; the `Record` with label `#"void"` and no
|
||||
fields is written `#"void"()`.
|
||||
|
||||
## Conventions for Common Data Types
|
||||
|
||||
The `Value` data type is essentially an abstract S-Expression, able to
|
||||
represent semi-structured data over `ByteString` and `Number` atoms.
|
||||
|
||||
However, users need a wide variety of data types for representing
|
||||
domain-specific values such as text, calendrical values, machine
|
||||
words, IEEE 754 floating-point values, booleans, and so on.
|
||||
|
||||
We use appropriately-labelled `Record`s to denote these
|
||||
domain-specific data types.
|
||||
|
||||
All of these conventions are optional. They form a layer atop the core
|
||||
`Value` structure. Non-domain-specific tools do not in general need to
|
||||
treat them specially.
|
||||
|
||||
**Validity.** Many of the labels we will describe in this section come
|
||||
with side-conditions on the contents of labelled `Record`s. It is
|
||||
possible to construct an instance of `Value` that violates these
|
||||
side-conditions without ceasing to be a `Value` or becoming
|
||||
unrepresentable. However, we say that such a `Value` is *invalid*
|
||||
because it fails to honour the necessary side-conditions.
|
||||
Implementations *SHOULD* allow two modes of working: one which
|
||||
treats all `Value`s identically, without regard for side-conditions,
|
||||
and one which enforces validity (i.e. side-conditions) when reading,
|
||||
writing, or constructing `Value`s.
|
||||
|
||||
### Text
|
||||
|
||||
A `Text` is a `Record` labelled with the `ByteString` `#"utf-8"` and
|
||||
having a single field that is also a `ByteString`. The field *MUST* be
|
||||
valid UTF-8.
|
||||
|
||||
We will write examples of `Text`s that contain Unicode text using
|
||||
“`"`” as both an opening and closing quote mark.
|
||||
|
||||
**Examples.** The `Text` containing the three Unicode code points `z`
|
||||
(0x7A), `水` (0x6C34) and `𝄞` (0x1D11E) is written as `"z水𝄞"`.
|
||||
|
||||
**Normalization forms.** Unicode defines multiple
|
||||
[normalization forms](http://unicode.org/reports/tr15/) for text. The
|
||||
ordering and equivalence relations defined for `Value`s mean that, for
|
||||
Unicode text, the UTF-8 encoded byte-level form of a text is used in
|
||||
comparisons.[^utf8-is-awesome] In order for users to unambiguously
|
||||
signal or require a particular normalization form, we define a
|
||||
`NormalizedText`, which is a `Record` labelled with
|
||||
`#"unicode-normalization"` and having two fields, the first of which
|
||||
is a `Text` specifying the normalization form used (e.g. `"nfc"`,
|
||||
`"nfd"`, `"nfkc"`, `"nfkd"`), and the second of which is a `Text`
|
||||
whose underlying representation *MUST* be normalized according to the
|
||||
named normalization form.
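
As a sketch of what this means in practice, Python's standard `unicodedata` module can show that two normalization forms of the same text yield different byte strings, and can check the side-condition on a `NormalizedText`. The helper name `is_valid_normalized_text` is an assumption made for this example.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "p\u00e4ron")
nfd = unicodedata.normalize("NFD", "p\u00e4ron")
print(nfc.encode("utf-8").hex())  # 70c3a4726f6e
print(nfd.encode("utf-8").hex())  # 7061cc88726f6e

def is_valid_normalized_text(form, text):
    """True if `text` is already normalized according to `form` (e.g. "nfc")."""
    return unicodedata.normalize(form.upper(), text) == text
```

Under the `Value` ordering, the two encodings above are simply different `ByteString`s, so the corresponding `Text`s are unequal unless both sides agree on a normalization form.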
|
||||
|
||||
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
||||
gives the same result as a lexicographic code-point-by-code-point
|
||||
comparison!
|
||||
|
||||
**IRIs.** (URIs, URLs, URNs, etc.) An `IRI` is a `Record` labelled
|
||||
with `#"iri"` and having one field, a `Text` which is the IRI itself
|
||||
and which *MUST* be a valid absolute or relative IRI.
|
||||
|
||||
**Symbols.** Programming languages like Lisp and Prolog frequently use
|
||||
string-like values called *symbols*. A `Symbol` is a `Record` labelled
|
||||
with `#"symbol"` and having one field, a `Text`.
|
||||
|
||||
### Numbers
|
||||
|
||||
The definition of `Number` captures all integers and all
|
||||
finitely-representable floating-point values. However, in certain
|
||||
circumstances it can be valuable to assert that a number inhabits a
|
||||
particular range, such as a fixed-width machine word or an IEEE 754
|
||||
floating-point value.
|
||||
|
||||
**Fixed-width machine words.** (16-, 32- and 64-bit) A family of
|
||||
labels `i`*n* and `u`*n* for *n* ∈ {16,32,64} denote *n*-bit-wide
|
||||
signed and unsigned range restrictions, respectively. Records with
|
||||
these labels *MUST* have one field, a `Number`, which *MUST* fall
|
||||
within the appropriate range. That is, to be valid,
|
||||
- in `#"i16"(`*x*`)`, -32768 <= *x* <= 32767, and ⌊*x*⌋ = *x*.
|
||||
- in `#"u16"(`*x*`)`, 0 <= *x* <= 65535, and ⌊*x*⌋ = *x*.
|
||||
- in `#"i32"(`*x*`)`, -2147483648 <= *x* <= 2147483647, and ⌊*x*⌋ = *x*.
|
||||
- etc.
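
These side-conditions can be checked mechanically. The sketch below assumes labels are passed as plain Python strings such as `"i16"` rather than as `ByteString`s, and the name `is_valid_word` is made up for illustration.

```python
from fractions import Fraction

WORD_LABELS = {"i16", "u16", "i32", "u32", "i64", "u64"}

def is_valid_word(label, x):
    """Check the range and integrality conditions for a machine-word Record."""
    if label not in WORD_LABELS:
        raise ValueError("unknown machine-word label: %r" % (label,))
    bits = int(label[1:])
    if label[0] == "i":
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    x = Fraction(x)
    # An integer is a rational whose denominator is 1 in lowest terms.
    return x.denominator == 1 and low <= x <= high
```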
|
||||
|
||||
**IEEE 754 floating-point.** (single- and double-precision) The labels
|
||||
`f32` and `f64` denote single- and double-precision IEEE 754
|
||||
floating-point values, respectively. Records with these labels *MUST*
|
||||
have one field. This field *MUST* either be a `Number`, which *MUST*
|
||||
fall within the appropriate representable range, or one of the records
|
||||
`#"nan"()`, `#"+inf"()` or `#"-inf"()`.
|
||||
|
||||
### Anonymous Tuples and Unit
|
||||
|
||||
A `Tuple` is a `Record` with label `#"tuple"` and zero or more fields,
|
||||
denoting an anonymous tuple of values.
|
||||
|
||||
The 0-ary tuple, `#"tuple"()`, denotes the empty tuple, sometimes
|
||||
called "unit" or "void" (but *not* e.g. JavaScript's "undefined"
|
||||
value).
|
||||
|
||||
### Booleans, Null and Undefined
|
||||
|
||||
The two 0-ary `Record`s `#"true"()` and `#"false"()` denote the "true"
|
||||
and "false" Boolean values, respectively.
|
||||
|
||||
Tony Hoare's
|
||||
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
|
||||
can be represented with the 0-ary `Record` `#"null"()`. An "undefined"
|
||||
value can be represented as `#"undefined"()`.
|
||||
|
||||
### Dates and Times
|
||||
|
||||
Dates, times, moments, and timestamps can be represented with a
|
||||
`Record` with label `#"rfc3339"` having a single field, a `Text`,
|
||||
which *MUST* conform to one of the `full-date`, `partial-time`,
|
||||
`full-time`, or `date-time` productions of
|
||||
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
|
||||
|
||||
## Syntax
|
||||
|
||||
Now that we have discussed `Value`s and their meanings, we may turn to
|
||||
techniques for *representing* `Value`s for communication or storage.
|
||||
|
||||
The syntax we have used for the examples so far is inadequate in many
|
||||
ways, not least of which is that it cannot represent every `Value`.
|
||||
|
||||
Separation of the meaning of a piece of syntax from the syntax itself
|
||||
opens the door to domain-specific syntaxes, all equivalent and
|
||||
interconvertible.[^asn1] With a robust semantic foundation,
|
||||
connections to other data languages can also be made.
|
||||
|
||||
[^asn1]: Those who remember
|
||||
[ASN.1](https://www.itu.int/en/ITU-T/asn1/Pages/introduction.aspx)
|
||||
will recall BER, DER, PER, CER, XER and so on, each appropriate to
|
||||
a different setting. Similarly,
|
||||
[Rivest's S-Expression design][sexp.txt] offers a human-friendly
|
||||
syntax, a syntax robust to network-induced message corruption, and
|
||||
an unambiguous, simple and easily-parsed machine-friendly syntax
|
||||
for the same underlying values.
|
||||
|
||||
### Transfer syntax: S-Expressions
|
||||
|
||||
For now, we limit our attention to an easily-parsed, easily-produced
|
||||
machine-readable syntax by mapping our `Value`s to the canonical form
|
||||
of [Rivest's S-Expressions][sexp.txt].[^why-not-spki-sexps]
|
||||
|
||||
[^why-not-spki-sexps]: Why not just use Rivest's S-Expressions as
|
||||
they are? While they include binary data and sequences, and an
|
||||
obvious equivalence for them exists, they lack numbers *per se* as
|
||||
well as any kind of unordered structure such as sets or maps. In
|
||||
addition, while "display hints" allow labelling of binary data
|
||||
with an intended interpretation, they cannot be attached to any
|
||||
other kind of structure, and the "hint" itself can only be a
|
||||
binary blob.
|
||||
|
||||
#### Byte strings
|
||||
|
||||
`ByteString`s map to byte-string S-Expressions.
|
||||
|
||||
**Examples.**
|
||||
- What we have been writing above as `#"ABC"` would be represented as
|
||||
the S-Expression `3:ABC`.
|
||||
- The empty `ByteString` is represented by the S-Expression `0:`.
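
A byte-string S-Expression in canonical form is a decimal length, a colon, and then the raw bytes. A minimal Python sketch of writing and reading one follows. The function names are assumptions for this example, and the reader does no error handling.

```python
def encode_bytes(bs):
    """Canonical byte-string S-Expression: <decimal length>:<raw bytes>."""
    return str(len(bs)).encode("ascii") + b":" + bs

def decode_bytes(data, pos=0):
    """Read one byte-string starting at `pos`; return (bytes, next position)."""
    colon = data.index(b":", pos)
    length = int(data[pos:colon].decode("ascii"))
    start = colon + 1
    return data[start:start + length], start + length
```

Because the length comes first, a reader never needs to scan the payload for a terminator, which is what makes arbitrary binary data safe to embed.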
|
||||
|
||||
#### Numbers
|
||||
|
||||
Numbers are the most complicated values to represent as an
|
||||
S-Expression.
|
||||
|
||||
((TODO: Consider cutting complexity by e.g. representing a `Number` as
|
||||
a sign bit, a little-endian blob of the integer part of the number,
|
||||
and a little-endian blob of the fractional part of the number. Lots of
|
||||
trailing/leading zeros for very large/small numbers!))
|
||||
|
||||
We represent `Number`s using a sign-magnitude format, where the
|
||||
magnitude is written using a little-endian, unsigned binary
|
||||
[*significand*](https://en.wikipedia.org/wiki/Significand) and a
|
||||
(signed) *shift amount*.
|
||||
|
||||
In essence, we use a generalized, variable-width form of binary IEEE
|
||||
floating-point representation.
|
||||
|
||||
Let `N` be the `Number` to represent as an S-Expression.
|
||||
|
||||
The sign bit is 0 when `N` is zero or positive, and 1 when `N` is
|
||||
negative.
|
||||
|
||||
The magnitude of `N` can be viewed as an infinite sequence of bits
|
||||
with a fraction-separator mark placed somewhere in the sequence,
|
||||
|
||||
```
|
||||
···00.000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000000···
|
||||
···000000 b_0 b_1 ··· b_{k-1} . b_k ··· b_{n-1} 000000···
|
||||
···000000 b_0 b_1 ··· b_{k-1} b_k ··· b_{n-1} 000.00···
|
||||
```
|
||||
|
||||
where `b_0` is the leftmost (most significant) and `b_{n-1}` the
|
||||
rightmost (least significant) non-zero bit.
|
||||
|
||||
Let `k`, the position of the fraction-separator mark, be `i` when it
|
||||
is immediately to the left of `b_i` for some `i`, generalizing to
|
||||
negative values when it is to the left of `b_0` and values greater
|
||||
than `n-1` when it is to the right of `b_{n-1}`.
|
||||
|
||||
For example, `k` will be:
|
||||
- 0 when the fraction-separator is immediately (i.e. zero bits) to the left of `b_0`;
|
||||
- -3 (as in the first example above) when it is three bits left of `b_0`;
|
||||
- `n` when it is immediately (i.e. zero bits) to the right of `b_{n-1}`;
|
||||
- `n`+3 when it is three bits to the right of `b_{n-1}`.
|
||||
|
||||
The unpadded significand is `b_0 b_1 ··· b_{n-1}`.
|
||||
|
||||
When `k` < `n`, the shift `z`=`k-n` and the significand is:
|
||||
- the unpadded significand,
|
||||
- with the sign bit appended to it on the right, and then
|
||||
- padded on the left with zeroes until it is a whole number of octets wide.
|
||||
|
||||
When `k` ≥ `n`, the shift `z`=`8×⌊(k-n)/8⌋` and the significand is:
|
||||
- the unpadded significand,
|
||||
- padded on the right with `(k-n) mod 8` zeroes,
|
||||
- with the sign bit then appended on the right, and then
|
||||
- padded on the left with zeroes until it is a whole number of octets wide.
|
||||
|
||||
Now, let `s`=2`z` if `z` is zero or positive, or `s`=2|`z`|+1 if `z`
|
||||
is negative.
|
||||
|
||||
Finally, the S-Expression form of `N` is:
|
||||
- `(4:*num [SIGNIFICAND] [SHIFT])`, if `s`≠0; or
|
||||
- `(4:*num [SIGNIFICAND])`, if `s`=0 but the significand contains non-zero bits; or
|
||||
- `(4:*num)`, if `s`=0 and the significand contains no non-zero bits;
|
||||
|
||||
where
|
||||
- `[SIGNIFICAND]` stands for a byte-string S-Expression containing a little-endian representation of the significand, and
|
||||
- `[SHIFT]` stands for a byte-string S-Expression containing a little-endian representation of `s`.
|
||||
|
||||
**Examples.** (Shown using the hexadecimal representation of
|
||||
byte-strings from
|
||||
[section 4.4 of Rivest's S-Expression specification][sexp.txt] in
|
||||
places.)
|
||||
- `N`=0 → `(4:*num)`
|
||||
- `N`=1 → `(4:*num#02#)`
|
||||
- `N`=-1 → `(4:*num#03#)`
|
||||
- `N`=10₁₀=1010.0₂ → `n`=3, `k`=4, `z`=0, `s`=0 → `(4:*num#14#)`
|
||||
- `N`=2560₁₀=101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#14##10#)`
|
||||
- `N`=-2560₁₀=-101000000000.0₂ → `n`=3, `k`=12, `z`=8, `s`=16 → `(4:*num#15##10#)`
|
||||
- `N`=-6₁₀=-110.0₂ → `n`=2, `k`=3, `z`=0, `s`=0 → `(4:*num#0D#)`
|
||||
- `N`=0.5₁₀=0.1₂ → `n`=1, `k`=0, `z`=-1, `s`=3 → `(4:*num#02##03#)`
|
||||
- `N`=-3/2₁₀=-1.1₂ → `n`=2, `k`=1, `z`=-1, `s`=3 → `(4:*num#07##03#)`
|
||||
- `N`=33/192₁₀=0.001011₂ → `n`=4, `k`=-2, `z`=-6, `s`=13 → `(4:*num#16##0D#)`
|
||||
- `N`=-1.202E4567=-1011011001···000₂ (15172 binary digits, the last 4565 of which are zero) → `n`=10607, `k`=15172, `z`=4560, `s`=9120 → `(4:*num#41828E···24CD16##A023#)`
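
One possible Python rendering of the steps above, offered as a sketch rather than a reference encoder. It assumes `N` is given as anything `fractions.Fraction` accepts, and all function names are made up for illustration. It relies on two observations: when `k` < `n`, the shift `z` is minus the base-two logarithm of the denominator, and when `k` ≥ `n`, `k-n` is the count of trailing zero bits of the integer.

```python
from fractions import Fraction

def number_parts(n):
    """Return (significand, s) as non-negative integers for the Number n."""
    n = Fraction(n)
    den = n.denominator
    if den & (den - 1):
        raise ValueError("not exactly representable in binary: %s" % n)
    sign = 1 if n < 0 else 0
    mag = abs(n.numerator)
    if mag == 0:
        return 0, 0
    if den > 1:
        # k < n: the numerator is odd, so its bits are the unpadded significand.
        z = -(den.bit_length() - 1)
        bits = mag
    else:
        # k >= n: whole octets of trailing zeros move into the shift;
        # the remaining (k-n) mod 8 zeros stay as right padding.
        trailing = (mag & -mag).bit_length() - 1
        z = 8 * (trailing // 8)
        bits = mag >> z
    significand = (bits << 1) | sign  # sign bit appended on the right
    s = 2 * z if z >= 0 else 2 * -z + 1
    return significand, s

def le_bytes(i):
    """Minimal-width little-endian bytes of a non-negative integer."""
    return i.to_bytes((i.bit_length() + 7) // 8, "little")

def atom(bs):
    return str(len(bs)).encode("ascii") + b":" + bs

def encode_number(n):
    significand, s = number_parts(n)
    out = b"(" + atom(b"*num")
    if significand != 0 or s != 0:
        out += atom(le_bytes(significand))
    if s != 0:
        out += atom(le_bytes(s))
    return out + b")"
```

For instance, `encode_number(10)` returns `b"(4:*num1:\x14)"`, which is the canonical spelling of `(4:*num#14#)`, and `encode_number(0)` returns `b"(4:*num)"`.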
|
||||
|
||||
((TODO: figure out what this algorithm would actually look like in,
|
||||
say, C, Python and Racket.))
|
||||
|
||||
#### Lists
|
||||
|
||||
A `List` maps to an S-Expression list of representations of its
|
||||
elements, with the byte-string S-Expression `5:*list` prepended.
|
||||
|
||||
**Examples.**
|
||||
- The `List` containing the `ByteString`s `#"a"`, `#"b"`, and `#"c"`
|
||||
would be represented as the S-Expression `(5:*list1:a1:b1:c)`.
|
||||
- The empty `List` is represented by the S-Expression `(5:*list)`.
|
||||
|
||||
#### Maps
|
||||
|
||||
A `Map` is represented by an S-Expression list of representations of
|
||||
the `Map`'s key-value pairs, with the byte-string `4:*map` prepended.
|
||||
|
||||
Each key-value pair is represented by a two-element S-Expression list
|
||||
containing representations of the key and the value, in that order.
|
||||
|
||||
The key-value pairs *MUST* be ordered by `Value`-order of their keys.
|
||||
|
||||
**Examples.**
|
||||
- The `Map` containing entries mapping `#"a"` to `#"d"` and `#"c"` to
|
||||
`#"b"` is represented by `(4:*map(1:a1:d)(1:c1:b))`.
|
||||
- The `Map` containing an entry mapping the empty list to a "true"
|
||||
Boolean value is represented by `(4:*map((5:*list)(4:true)))`.
|
||||
- The empty `Map` is represented by `(4:*map)`.
|
||||
|
||||
**Non-examples.**
|
||||
- The S-Expression `(4:*map(1:c1:b)(1:a1:d))` is invalid, because its
|
||||
key-value pairs are not in `Value`-order by key: `#"c"` > `#"a"`.
|
||||
- The S-Expression `(4:*map1:a1:d1:c1:b)` is invalid, because its
|
||||
key-value pairs appear "flattened" in the outer list, rather than
|
||||
each appearing in a two-element list of its own.
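
A minimal Python sketch of the `Map` rule, restricted to the simple case where every key is a `ByteString`, so that `Value`-order on keys coincides with Python's ordering on `bytes`. The values are assumed to be already-encoded S-Expressions, and the function names are made up for illustration.

```python
def atom(bs):
    return str(len(bs)).encode("ascii") + b":" + bs

def encode_map(entries):
    """entries: dict from bytes keys to already-encoded value S-Expressions."""
    pairs = b"".join(
        b"(" + atom(key) + value + b")"
        for key, value in sorted(entries.items())  # ascending key order
    )
    return b"(" + atom(b"*map") + pairs + b")"

def keys_in_order(keys):
    """Reader-side check: keys must be strictly ascending (hence distinct)."""
    return all(a < b for a, b in zip(keys, keys[1:]))
```

Sorting in the writer and checking strict ascent in the reader together ensure that each `Map` has exactly one accepted representation.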
|
||||
|
||||
#### Records
|
||||
|
||||
A `Record` is represented by an S-Expression list of its fields,
|
||||
prepended by:
|
||||
|
||||
- the representation of its label, if its label is a `ByteString` and
|
||||
does not begin with byte 42 (ASCII "`*`"); or
|
||||
- the S-Expression `1:*` followed by the representation of the
|
||||
`Record`'s label, otherwise.
|
||||
|
||||
**Examples.**
|
||||
- The `Text` `"hello-world"` is represented by the S-Expression
|
||||
`(5:utf-811:hello-world)`.
|
||||
- The `IRI` denoting `http://www.w3.org/` is represented by the
|
||||
S-Expression `(3:iri(5:utf-818:http://www.w3.org/))`.
|
||||
- The `Record` `#"*"()` is represented by the S-Expression
|
||||
`(1:*1:*)`.
|
||||
- The `Record` `#"*foo"(#"*bar")` is represented by the S-Expression
|
||||
`(1:*4:*foo4:*bar)`.
|
||||
- The `Record` with the empty list as its label and no fields is
|
||||
represented by the S-Expression `(1:*(5:*list))`.
|
||||
- `(7:rfc3339(5:utf-83:foo))` represents a well-formed `Value` that
|
||||
is a `Record` with `#"rfc3339"` as its label, and a single `Text`
|
||||
field. While it is a perfectly reasonable `Value`, it does *not*
|
||||
represent a valid date or time, since the `Text` `"foo"` does not
|
||||
conform to any of the RFC 3339 productions enumerated above.
|
||||
|
||||
**Non-examples.**
|
||||
- `((5:*list))` is not a representation of the `Record` with the
|
||||
empty list as its label and no fields, because that `Record` has a
|
||||
non-`ByteString` as its label, mandating a `1:*` prefix on its
|
||||
S-Expression representation.
|
||||
- `(4:*foo4:*bar)` does not represent the `Record`
|
||||
`#"*foo"(#"*bar")`, because the label `#"*foo"` begins with "`*`",
|
||||
mandating a `1:*` prefix on the `Record`'s S-Expression
|
||||
representation.
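
The label rule can be sketched in Python as follows, covering only `ByteString`s, `List`s and `Record`s for brevity. The `Record` class and the `encode` function are hypothetical names chosen for this example.

```python
class Record:
    """Hypothetical holder: a label plus zero or more fields."""
    def __init__(self, label, *fields):
        self.label, self.fields = label, fields

def atom(bs):
    return str(len(bs)).encode("ascii") + b":" + bs

def encode(v):
    if isinstance(v, bytes):
        return atom(v)
    if isinstance(v, list):
        return b"(" + atom(b"*list") + b"".join(map(encode, v)) + b")"
    if isinstance(v, Record):
        # A label needs the 1:* escape unless it is a ByteString
        # that does not itself begin with "*".
        plain = isinstance(v.label, bytes) and not v.label.startswith(b"*")
        prefix = b"" if plain else atom(b"*")
        fields = b"".join(map(encode, v.fields))
        return b"(" + prefix + encode(v.label) + fields + b")"
    raise TypeError("unsupported Value in this sketch: %r" % (v,))
```

With these definitions, `encode(Record(b"*foo", b"*bar"))` returns `b"(1:*4:*foo4:*bar)"` and `encode(Record([]))` returns `b"(1:*(5:*list))"`. The escape keeps labels from colliding with the reserved `*list`, `*map` and `*num` markers.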
|
||||
|
||||
## Examples
|
||||
|
||||
((TODO: Give some examples of large and small SPKI-CAT documents,
|
||||
perhaps translated from various JSON blobs floating around the
|
||||
internet.))
|
||||
|
||||
## Representing Values in Programming Languages
|
||||
|
||||
We have given a definition of `Value` and its semantics, and proposed
|
||||
a concrete syntax for communicating and storing `Value`s. We now turn
|
||||
to **suggested** representations of `Value`s as *programming-language
|
||||
values* for various programming languages.
|
||||
|
||||
### JavaScript
|
||||
|
||||
- `ByteString` ↔ `Uint8Array`
|
||||
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
|
||||
- `List` ↔ `Array`
|
||||
- `Map` ↔ `Object`
|
||||
- `Record` ↔ an instance of something like `Record` below, unless the label is...
|
||||
- `#"utf-8"` ↔ `String`
|
||||
- `#"true"` ↔ `true`
|
||||
- `#"false"` ↔ `false`
|
||||
- `#"null"` ↔ `null`
|
||||
- `#"undefined"` ↔ the undefined value
|
||||
- `#"rfc3339"` ↔ `Date`, if the `Record`'s field matches the `date-time` RFC 3339 production
|
||||
|
||||
```javascript
|
||||
function Record(label, ...fields) {
|
||||
  this.label = label;
|
||||
  this.fields = fields;
|
||||
}
|
||||
```
|
||||
|
||||
### Scheme/Racket
|
||||
|
||||
- `ByteString` ↔ byte vector (Racket: "Bytes")
|
||||
- `Number` ↔ numbers
|
||||
- `List` ↔ (where possible, immutable) list
|
||||
- `Map` ↔ hash-table
|
||||
- `Record` ↔ a structure (Racket: a "prefab struct"), unless the label is...
|
||||
- `#"utf-8"` ↔ a string
|
||||
- `#"true"` ↔ `#t`
|
||||
- `#"false"` ↔ `#f`
|
||||
- `#"symbol"` ↔ a symbol
|
||||
|
||||
### Java
|
||||
|
||||
- `ByteString` ↔ `byte[]`
|
||||
- `Number` ↔ numbers, problematically; bignums, perhaps; other?? TODO
|
||||
- `List` ↔ `java.util.List`
|
||||
- `Map` ↔ `java.util.Map`
|
||||
- `Record` ↔ an instance of something like `Record` below, unless the label is...
|
||||
- `#"utf-8"` ↔ `java.lang.String`
|
||||
- `#"true"` ↔ `java.lang.Boolean.TRUE`
|
||||
- `#"false"` ↔ `java.lang.Boolean.FALSE`
|
||||
- `#"null"` ↔ a special singleton object, but *not* Java's `null`
|
||||
- `#"rfc3339"` ↔ `java.util.Date` or `java.sql.{Time,Timestamp}`, according to which RFC 3339 production the `Record`'s field matches
|
||||
|
||||
### Erlang
|
||||
|
||||
- `ByteString` ↔ a binary
|
||||
- `Number` ↔ numbers, probably; TODO
|
||||
- `List` ↔ a list
|
||||
- `Map` ↔ a [map](http://erlang.org/doc/reference_manual/data_types.html#id77432) (new in Erlang/OTP 17)
|
||||
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions, unless the label is...
|
||||
- `#"true"` ↔ `true`
|
||||
- `#"false"` ↔ `false`
|
||||
- `#"null"` ↔ `null`
|
||||
- `#"undefined"` ↔ `undefined`
|
||||
- `#"symbol"` ↔ the `Text` field converted to an Erlang atom, if
|
||||
some kind of an "unsafe" mode is set on the decoder (because
|
||||
Erlang atoms are not GC'd); otherwise like any other kind of
|
||||
`Record`
|
||||
|
||||
---
|
|
@ -1,61 +0,0 @@
|
|||
body {
|
||||
font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif;
|
||||
}
|
||||
@media screen {
|
||||
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
|
||||
hr { display: none; }
|
||||
}
|
||||
@media print {
|
||||
@page { size: letter; margin: 4rem 0rem 4.333rem 0rem; }
|
||||
body { margin-left: 4.5rem; margin-right: 4.5rem; }
|
||||
h1, h2 { page-break-before: always; margin-top: 0; }
|
||||
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
|
||||
hr+* { page-break-before: always; margin-top: 0; }
|
||||
hr { display: none; }
|
||||
}
|
||||
h1, h2, h3, h4, h5, h6 { color: #4f81bd; }
|
||||
h2 { border-bottom: solid #4f81bd 1px; }
|
||||
pre, code { background-color: #eee; font-family: "DejaVu Sans Mono", monospace; }
|
||||
code { font-size: 75%; }
|
||||
pre { padding: 0.33rem; }
|
||||
|
||||
body {
|
||||
counter-reset: section 0 subsection 0 appendix 0;
|
||||
}
|
||||
h2:before, h3:before {
|
||||
text-align: right;
|
||||
display: inline-block;
|
||||
position: relative;
|
||||
right: 2.33em;
|
||||
font-size: 75%;
|
||||
text-align: right;
|
||||
width: 2em;
|
||||
margin-right: -2em;
|
||||
height: 0;
|
||||
}
|
||||
h2:before {
|
||||
counter-increment: section;
|
||||
content: counter(section) ". ";
|
||||
}
|
||||
h2 {
|
||||
counter-reset: subsection 0;
|
||||
}
|
||||
h3:before {
|
||||
counter-increment: subsection;
|
||||
content: counter(section) "." counter(subsection) ". ";
|
||||
}
|
||||
|
||||
h2[id^="appendix-"]:before {
|
||||
counter-increment: appendix;
|
||||
content: counter(appendix,upper-latin) ". ";
|
||||
}
|
||||
h2[id^="appendix-"] ~ h3:before {
|
||||
counter-increment: subsection;
|
||||
content: counter(appendix,upper-latin) "." counter(subsection) ". ";
|
||||
}
|
||||
|
||||
h2#notes:before {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.footnotes > ol { padding: 0; font-size: 90%; }
|
File diff suppressed because it is too large
|
@ -1,385 +0,0 @@
|
|||
import sys
|
||||
import numbers
|
||||
import struct
|
||||
|
||||
try:
|
||||
basestring
|
||||
except NameError:
|
||||
basestring = str
|
||||
|
||||
if isinstance(chr(123), bytes):
|
||||
_ord = ord
|
||||
else:
|
||||
_ord = lambda x: x
|
||||
|
||||
class Float(object):
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Always return a bool; previously a class mismatch fell through to None.
        return other.__class__ is self.__class__ and self.value == other.value

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__ on Python 3.
        return hash(self.value)

    def __repr__(self):
        return 'Float(' + repr(self.value) + ')'

    def __preserve_on__(self, encoder):
        encoder.leadbyte(0, 0, 2)
        encoder.buffer.extend(struct.pack('>f', self.value))
|
||||
|
||||
class Symbol(object):
|
||||
def __init__(self, name):
|
||||
self.name = name
|
||||
|
||||
def __eq__(self, other):
|
||||
return isinstance(other, Symbol) and self.name == other.name
|
||||
|
||||
def __hash__(self):
|
||||
return hash(self.name)
|
||||
|
||||
def __repr__(self):
|
||||
return '#' + self.name
|
||||
|
||||
def __preserve_on__(self, encoder):
|
||||
bs = self.name.encode('utf-8')
|
||||
encoder.header(1, 3, len(bs))
|
||||
encoder.buffer.extend(bs)
|
||||
|
||||
class Record(object):
|
||||
def __init__(self, key, fields):
|
||||
self.key = key
|
||||
self.fields = tuple(fields)
|
||||
self.__hash = None
|
||||
|
||||
def __eq__(self, other):
|
||||
return isinstance(other, Record) and (self.key, self.fields) == (other.key, other.fields)
|
||||
|
||||
def __hash__(self):
|
||||
if self.__hash is None:
|
||||
self.__hash = hash((self.key, self.fields))
|
||||
return self.__hash
|
||||
|
||||
def __repr__(self):
|
||||
return str(self.key) + '(' + ', '.join((repr(f) for f in self.fields)) + ')'
|
||||
|
||||
def __preserve_on__(self, encoder):
|
||||
try:
|
||||
index = encoder.shortForms.index(self.key)
|
||||
except ValueError:
|
||||
index = None
|
||||
if index is None:
|
||||
encoder.header(2, 3, len(self.fields) + 1)
|
||||
encoder.append(self.key)
|
||||
else:
|
||||
encoder.header(2, index, len(self.fields))
|
||||
for f in self.fields:
|
||||
encoder.append(f)
|
||||
|
||||
# Blub blub blub
|
||||
class ImmutableDict(dict):
    def __init__(self, *args, **kwargs):
        # self.__hash is name-mangled, so test for the mangled name; a bare
        # '__hash' is never found and re-initialisation would go unnoticed.
        if hasattr(self, '_ImmutableDict__hash'): raise TypeError('Immutable')
        super(ImmutableDict, self).__init__(*args, **kwargs)
        self.__hash = None

    def __delitem__(self, key): raise TypeError('Immutable')
    def __setitem__(self, key, val): raise TypeError('Immutable')
    def clear(self): raise TypeError('Immutable')
    def pop(self, k, d=None): raise TypeError('Immutable')
    def popitem(self): raise TypeError('Immutable')
    def setdefault(self, k, d=None): raise TypeError('Immutable')
    def update(self, e, **f): raise TypeError('Immutable')

    def __hash__(self):
        if self.__hash is None:
            # Combine entries order-independently, so that equal dictionaries
            # hash equally whatever their insertion order.
            h = 0
            for k in self:
                h = h ^ hash((k, self[k]))
            self.__hash = h & sys.maxsize
        return self.__hash

    @staticmethod
    def from_kvs(kvs):
        # Build from a flat sequence: key, value, key, value, ...
        i = iter(kvs)
        result = ImmutableDict()
        result_proxy = super(ImmutableDict, result)
        try:
            while True:
                k = next(i)
                v = next(i)
                result_proxy.__setitem__(k, v)
        except StopIteration:
            pass
        return result
|
||||
|
||||
def dict_kvs(d):
|
||||
for k in d:
|
||||
yield k
|
||||
yield d[k]
|
||||
|
||||
class DecodeError(ValueError): pass
|
||||
class EncodeError(ValueError): pass
|
||||
|
||||
class Codec(object):
|
||||
def __init__(self):
|
||||
self.shortForms = [Symbol(u'discard'), Symbol(u'capture'), Symbol(u'observe')]
|
||||
|
||||
def set_shortform(self, index, v):
|
||||
if index >= 0 and index < 3:
|
||||
self.shortForms[index] = v
|
||||
else:
|
||||
raise ValueError('Invalid short form index %r' % (index,))
|
||||
|
||||
class Stream(object):
|
||||
def __init__(self, iterator):
|
||||
self._iterator = iterator
|
||||
|
||||
def __preserve_on__(self, encoder):
|
||||
arg = (self.major << 2) | self.minor
|
||||
encoder.leadbyte(0, 2, arg)
|
||||
self._emit(encoder)
|
||||
encoder.leadbyte(0, 3, arg)
|
||||
|
||||
def _emit(self, encoder):
|
||||
raise NotImplementedError('Should be implemented in subclasses')
|
||||
|
||||
class ValueStream(Stream):
|
||||
major = 3
|
||||
def _emit(self, encoder):
|
||||
for v in self._iterator:
|
||||
encoder.append(v)
|
||||
|
||||
class SequenceStream(ValueStream):
|
||||
minor = 0
|
||||
|
||||
class SetStream(ValueStream):
|
||||
minor = 1
|
||||
|
||||
class DictStream(ValueStream):
|
||||
minor = 2
|
||||
def _emit(self, encoder):
|
||||
for (k, v) in self._iterator:
|
||||
encoder.append(k)
|
||||
encoder.append(v)
|
||||
|
||||
class BinaryStream(Stream):
|
||||
major = 1
|
||||
minor = 2
|
||||
def _emit(self, encoder):
|
||||
for chunk in self._iterator:
|
||||
if not isinstance(chunk, bytes):
|
||||
raise EncodeError('Illegal chunk in BinaryStream %r' % (chunk,))
|
||||
encoder.append(chunk)
|
||||
|
||||
class StringStream(BinaryStream):
|
||||
minor = 1
|
||||
|
||||
class SymbolStream(BinaryStream):
|
||||
minor = 3
|
||||
|
||||
class Decoder(Codec):
|
||||
def __init__(self, packet):
|
||||
super(Decoder, self).__init__()
|
||||
self.packet = packet
|
||||
self.index = 0
|
||||
|
||||
def peekbyte(self):
|
||||
if self.index < len(self.packet):
|
||||
return _ord(self.packet[self.index])
|
||||
else:
|
||||
raise DecodeError('Short packet')
|
||||
|
||||
def advance(self, count=1):
|
||||
start = self.index
|
||||
self.index = self.index + count
|
||||
return start
|
||||
|
||||
def nextbyte(self):
|
||||
val = self.peekbyte()
|
||||
self.advance()
|
||||
return val
|
||||
|
||||
def wirelength(self, arg):
|
||||
if arg < 15:
|
||||
return arg
|
||||
return self.varint()
|
||||
|
||||
def varint(self):
|
||||
v = self.nextbyte()
|
||||
if v < 128:
|
||||
return v
|
||||
else:
|
||||
return self.varint() * 128 + (v - 128)
|
||||
|
||||
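    def nextbytes(self, n):
        start = self.advance(n)
        # Slicing past the end would silently return fewer than n bytes.
        if self.index > len(self.packet):
            raise DecodeError('Short packet')
        return self.packet[start : self.index]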
|
||||
|
||||
def nextvalues(self, n):
|
||||
result = []
|
||||
for i in range(n):
|
||||
result.append(self.next())
|
||||
return result
|
||||
|
||||
def peekop(self):
|
||||
b = self.peekbyte()
|
||||
major = b >> 6
|
||||
minor = (b >> 4) & 3
|
||||
arg = b & 15
|
||||
return (major, minor, arg)
|
||||
|
||||
def nextop(self):
|
||||
op = self.peekop()
|
||||
self.advance()
|
||||
return op
|
||||
|
||||
def peekend(self, arg):
|
||||
return self.peekop() == (0, 3, arg)
|
||||
|
||||
    def binarystream(self, arg, minor):
        result = []
        while not self.peekend(arg):
            chunk = self.next()
            if isinstance(chunk, bytes):
                result.append(chunk)
            else:
                raise DecodeError('Unexpected non-binary chunk')
        self.advance()  # consume the end byte that peekend only looked at
        return self.decodebinary(minor, b''.join(result))

    def valuestream(self, arg, minor, decoder):
        result = []
        while not self.peekend(arg):
            result.append(self.next())
        self.advance()  # consume the end byte that peekend only looked at
        return decoder(minor, result)
|
||||
|
||||
def decodeint(self, bs):
|
||||
if len(bs) == 0: return 0
|
||||
acc = _ord(bs[0])
|
||||
if acc & 0x80: acc = acc - 256
|
||||
for b in bs[1:]:
|
||||
acc = (acc << 8) | _ord(b)
|
||||
return acc
|
||||
|
||||
def decodebinary(self, minor, bs):
|
||||
if minor == 0: return self.decodeint(bs)
|
||||
if minor == 1: return bs.decode('utf-8')
|
||||
if minor == 2: return bs
|
||||
if minor == 3: return Symbol(bs.decode('utf-8'))
|
||||
|
||||
def decoderecord(self, minor, vs):
|
||||
if minor == 3:
|
||||
if not vs: raise DecodeError('Too few elements in encoded record')
|
||||
return Record(vs[0], vs[1:])
|
||||
else:
|
||||
return Record(self.shortForms[minor], vs)
|
||||
|
||||
    def decodecollection(self, minor, vs):
        if minor == 0: return tuple(vs)
        if minor == 1: return frozenset(vs)
        if minor == 2:
            # A dictionary needs an even number of items: key, value, ...
            if len(vs) % 2: raise DecodeError('Missing dictionary value')
            return ImmutableDict.from_kvs(vs)
        if minor == 3: raise DecodeError('Invalid collection type')
|
||||
|
||||
def next(self):
|
||||
(major, minor, arg) = self.nextop()
|
||||
if major == 0:
|
||||
if minor == 0:
|
||||
if arg == 0: return False
|
||||
if arg == 1: return True
|
||||
if arg == 2: return Float(struct.unpack('>f', self.nextbytes(4))[0])
|
||||
if arg == 3: return struct.unpack('>d', self.nextbytes(8))[0]
|
||||
raise DecodeError('Invalid format A encoding')
|
||||
elif minor == 1:
|
||||
return arg - 16 if arg > 12 else arg
|
||||
elif minor == 2:
|
||||
t = arg >> 2
|
||||
n = arg & 3
|
||||
if t == 0: raise DecodeError('Invalid format C start byte')
|
||||
if t == 1: return self.binarystream(arg, n)
|
||||
if t == 2: return self.valuestream(arg, n, self.decoderecord)
|
||||
if t == 3: return self.valuestream(arg, n, self.decodecollection)
|
||||
else: # minor == 3
|
||||
raise DecodeError('Unexpected format C end byte')
|
||||
elif major == 1:
|
||||
return self.decodebinary(minor, self.nextbytes(self.wirelength(arg)))
|
||||
elif major == 2:
|
||||
return self.decoderecord(minor, self.nextvalues(self.wirelength(arg)))
|
||||
else: # major == 3
|
||||
return self.decodecollection(minor, self.nextvalues(self.wirelength(arg)))
|
||||
|
||||
class Encoder(Codec):
|
||||
def __init__(self):
|
||||
super(Encoder, self).__init__()
|
||||
self.buffer = bytearray()
|
||||
|
||||
def contents(self):
|
||||
return bytes(self.buffer)
|
||||
|
||||
def varint(self, v):
|
||||
if v < 128:
|
||||
self.buffer.append(v)
|
||||
else:
|
||||
self.buffer.append((v % 128) + 128)
|
||||
self.varint(v // 128)
|
||||
|
||||
def leadbyte(self, major, minor, arg):
|
||||
self.buffer.append(((major & 3) << 6) | ((minor & 3) << 4) | (arg & 15))
|
||||
|
||||
def header(self, major, minor, wirelength):
|
||||
if wirelength < 15:
|
||||
self.leadbyte(major, minor, wirelength)
|
||||
else:
|
||||
self.leadbyte(major, minor, 15)
|
||||
self.varint(wirelength)
|
||||
|
||||
def encodeint(self, v):
|
||||
bitcount = (~v if v < 0 else v).bit_length() + 1
|
||||
bytecount = (bitcount + 7) // 8
|
||||
self.header(1, 0, bytecount)
|
||||
def enc(n,x):
|
||||
if n > 0:
|
||||
enc(n-1, x >> 8)
|
||||
self.buffer.append(x & 255)
|
||||
enc(bytecount, v)
|
||||
|
||||
def encodecollection(self, minor, items):
|
||||
self.header(3, minor, len(items))
|
||||
for i in items: self.append(i)
|
||||
|
||||
def append(self, v):
|
||||
if hasattr(v, '__preserve_on__'):
|
||||
v.__preserve_on__(self)
|
||||
elif v is False:
|
||||
self.leadbyte(0, 0, 0)
|
||||
elif v is True:
|
||||
self.leadbyte(0, 0, 1)
|
||||
elif isinstance(v, float):
|
||||
self.leadbyte(0, 0, 3)
|
||||
self.buffer.extend(struct.pack('>d', v))
|
||||
elif isinstance(v, numbers.Integral):
|
||||
if v >= -3 and v <= 12:
|
||||
self.leadbyte(0, 1, v if v >= 0 else v + 16)
|
||||
else:
|
||||
self.encodeint(v)
|
||||
elif isinstance(v, bytes):
|
||||
self.header(1, 2, len(v))
|
||||
self.buffer.extend(v)
|
||||
elif isinstance(v, basestring):
|
||||
bs = v.encode('utf-8')
|
||||
self.header(1, 1, len(bs))
|
||||
self.buffer.extend(bs)
|
||||
elif isinstance(v, list):
|
||||
self.encodecollection(0, v)
|
||||
elif isinstance(v, tuple):
|
||||
self.encodecollection(0, v)
|
||||
elif isinstance(v, set):
|
||||
self.encodecollection(1, v)
|
||||
elif isinstance(v, frozenset):
|
||||
self.encodecollection(1, v)
|
||||
elif isinstance(v, dict):
|
||||
self.encodecollection(2, list(dict_kvs(v)))
|
||||
        else:
            try:
                i = iter(v)
            except TypeError:
                raise EncodeError('Cannot encode %r' % (v,))
            # Encoder has no encodestream method; emit a streamed sequence instead.
            SequenceStream(i).__preserve_on__(self)
|
|
@ -1,856 +0,0 @@
|
|||
#lang racket/base
|
||||
;; Preserve, as in Fruit Preserve, as in a remarkably weak pun on pickling/dehydration etc
|
||||
|
||||
(provide (struct-out stream-of)
|
||||
(struct-out record)
|
||||
short-form-labels
|
||||
read-preserve
|
||||
string->preserve
|
||||
encode
|
||||
decode
|
||||
wire-value)
|
||||
|
||||
(require racket/bytes)
|
||||
(require racket/dict)
|
||||
(require racket/generator)
|
||||
(require racket/match)
|
||||
(require racket/set)
|
||||
(require bitsyntax)
|
||||
(require syndicate/support/struct)
|
||||
(require (only-in syntax/readerr raise-read-error))
|
||||
|
||||
(require imperative-syndicate/assertions)
|
||||
(require imperative-syndicate/pattern)
|
||||
|
||||
(struct stream-of (kind generator) #:transparent)
|
||||
|
||||
(struct record (label fields) #:transparent)
|
||||
|
||||
(define short-form-labels
|
||||
(make-parameter (vector 'discard 'capture 'observe)))
|
||||
|
||||
(define (encode v)
|
||||
(bit-string->bytes (bit-string (v :: (wire-value)))))
|
||||
|
||||
(define (decode bs [on-fail (lambda () (error 'decode "Invalid encoding: ~v" bs))])
|
||||
(bit-string-case bs
|
||||
([ (v :: (wire-value)) ] v)
|
||||
(else (on-fail))))
|
||||
|
||||
(define-syntax wire-value
|
||||
(syntax-rules ()
|
||||
[(_ #t input ks kf) (decode-value input ks kf)]
|
||||
[(_ #f v) (encode-value v)]))
|
||||
|
||||
(define-syntax wire-length
|
||||
(syntax-rules ()
|
||||
[(_ #t input ks kf) (decode-wire-length input ks kf)]
|
||||
[(_ #f v) (encode-wire-length v)]))
|
||||
|
||||
(define (encode-wire-length v)
|
||||
(when (negative? v) (error 'encode-wire-length "Cannot encode negative wire-length ~v" v))
|
||||
(if (< v #b1111)
|
||||
(bit-string (v :: bits 4))
|
||||
(bit-string (#b1111 :: bits 4) ((encode-varint v) :: binary))))
|
||||
|
||||
(define (encode-varint v)
|
||||
(if (< v 128)
|
||||
(bytes v)
|
||||
(bit-string ((+ (modulo v 128) 128) :: bits 8)
|
||||
((encode-varint (quotient v 128)) :: binary))))
|
||||
|
||||
(define (encode-array-like major minor fields)
|
||||
(bit-string (major :: bits 2)
|
||||
(minor :: bits 2)
|
||||
((length fields) :: (wire-length))
|
||||
((apply bit-string-append (map encode-value fields)) :: binary)))
|
||||
|
||||
(define (encode-binary-like major minor bs)
|
||||
(bit-string (major :: bits 2)
|
||||
(minor :: bits 2)
|
||||
((bytes-length bs) :: (wire-length))
|
||||
(bs :: binary)))
|
||||
|
||||
(define (encode-start-byte major minor)
|
||||
(bit-string (#b0010 :: bits 4) (major :: bits 2) (minor :: bits 2)))
|
||||
|
||||
(define (encode-end-byte major minor)
|
||||
(bit-string (#b0011 :: bits 4) (major :: bits 2) (minor :: bits 2)))
|
||||
|
||||
(define (encode-stream major minor chunk-ok? generator)
|
||||
(bit-string-append (encode-start-byte major minor)
|
||||
(let loop ()
|
||||
(match (generator)
|
||||
[(? void?) #""]
|
||||
[(? chunk-ok? v) (bit-string-append (encode-value v) (loop))]
|
||||
[bad (error 'encode-stream "Cannot encode chunk: ~v" bad)]))
|
||||
(encode-end-byte major minor)))
|
||||
|
||||
(define (dict-keys-and-values d)
|
||||
(reverse (for/fold [(acc '())] [((k v) (in-dict d))] (cons v (cons k acc)))))
|
||||
|
||||
(define (short-form-for-label key)
|
||||
(let ((labels (short-form-labels)))
|
||||
(let loop ((i 0))
|
||||
(cond [(= i 3) #f]
|
||||
[(equal? (vector-ref labels i) key) i]
|
||||
[else (loop (+ i 1))]))))
|
||||
|
||||
(define (encode-record key fields)
|
||||
(define short (short-form-for-label key))
|
||||
(if short
|
||||
(encode-array-like 2 short fields)
|
||||
(encode-array-like 2 3 (cons key fields))))
|
||||
|
||||
(define (encode-value v)
|
||||
(match v
|
||||
[#f (bytes #b00000000)]
|
||||
[#t (bytes #b00000001)]
|
||||
[(? single-flonum?) (bit-string #b00000010 (v :: float bits 32))]
|
||||
[(? double-flonum?) (bit-string #b00000011 (v :: float bits 64))]
|
||||
[(? integer? x) #:when (<= -3 x 12) (bit-string (#b0001 :: bits 4) (x :: bits 4))]
|
||||
[(stream-of 'string p) (encode-stream 1 1 bytes? p)]
|
||||
[(stream-of 'byte-string p) (encode-stream 1 2 bytes? p)]
|
||||
[(stream-of 'symbol p) (encode-stream 1 3 bytes? p)]
|
||||
[(stream-of 'sequence p) (encode-stream 3 0 (lambda (x) #t) p)]
|
||||
[(stream-of 'set p) (encode-stream 3 1 (lambda (x) #t) p)]
|
||||
[(stream-of 'dictionary p) (encode-stream 3 2 (lambda (x) #t) p)]
|
||||
|
||||
;; [0 (bytes #b10000000)]
|
||||
[(? integer?)
|
||||
(define raw-bit-count (+ (integer-length v) 1)) ;; at least one sign bit
|
||||
(define byte-count (quotient (+ raw-bit-count 7) 8))
|
||||
(bit-string (#b0100 :: bits 4) (byte-count :: (wire-length)) (v :: integer bytes byte-count))]
|
||||
[(? string?) (encode-binary-like 1 1 (string->bytes/utf-8 v))]
|
||||
[(? bytes?) (encode-binary-like 1 2 v)]
|
||||
[(? symbol?) (encode-binary-like 1 3 (string->bytes/utf-8 (symbol->string v)))]
|
||||
|
||||
[(record label fields) (encode-record label fields)]
|
||||
[(? non-object-struct?)
|
||||
(define key (prefab-struct-key v))
|
||||
(when (not key) (error 'encode-value "Cannot encode non-prefab struct ~v" v))
|
||||
(encode-record key (cdr (vector->list (struct->vector v))))]
|
||||
|
||||
[(? list?) (encode-array-like 3 0 v)]
|
||||
[(? set?) (encode-array-like 3 1 (set->list v))]
|
||||
[(? dict?) (encode-array-like 3 2 (dict-keys-and-values v))]
|
||||
|
||||
[_ (error 'encode-value "Cannot encode value ~v" v)]))
|
||||
|
||||
;;---------------------------------------------------------------------------
|
||||
|
||||
(define (decode-wire-length bs ks kf)
|
||||
(bit-string-case bs
|
||||
([ (= #b1111 :: bits 4) (rest :: binary) ]
|
||||
(decode-varint rest
|
||||
(lambda (v tail)
|
||||
(if (< v #b1111)
|
||||
(kf)
|
||||
(ks v tail)))
|
||||
kf))
|
||||
([ (v :: bits 4) (rest :: binary) ] (ks v rest))
|
||||
(else (kf))))
|
||||
|
||||
(define (decode-varint bs ks kf)
|
||||
(bit-string-case bs
|
||||
([ (= 1 :: bits 1) (v :: bits 7) (rest :: binary) ]
|
||||
(decode-varint rest (lambda (acc tail) (ks (+ (* acc 128) v) tail)) kf))
|
||||
([ (= 0 :: bits 1) (v :: bits 7) (rest :: binary) ]
|
||||
(ks v rest))
|
||||
(else
|
||||
(kf))))
|
||||
|
||||
(define (decode-values n acc-rev bs ks kf)
|
||||
(if (zero? n)
|
||||
(ks (reverse acc-rev) bs)
|
||||
(bit-string-case bs
|
||||
([ (v :: (wire-value)) (rest :: binary) ]
|
||||
(decode-values (- n 1) (cons v acc-rev) rest ks kf))
|
||||
(else (kf)))))
|
||||
|
||||
(define (decode-binary minor bs rest ks kf)
|
||||
(match minor
|
||||
[0 (if (positive? (bit-string-length bs))
|
||||
(ks (bit-string->signed-integer bs #t) rest)
|
||||
(ks 0 rest))]
|
||||
[2 (ks bs rest)]
|
||||
[(or 1 3)
|
||||
((with-handlers [(exn:fail:contract? (lambda (e) kf))]
|
||||
(define s (bytes->string/utf-8 bs))
|
||||
(lambda () (ks (if (= minor 3) (string->symbol s) s) rest))))]))
|
||||
|
||||
(define (build-record label fields)
|
||||
(with-handlers [(exn:fail:contract? (lambda (e) (record label fields)))]
|
||||
(apply make-prefab-struct label fields)))
|
||||
|
||||
(define (decode-record minor fields rest ks kf)
|
||||
(match* (minor fields)
|
||||
[(3 (list* key fs)) (ks (build-record key fs) rest)]
|
||||
[(3 '()) (kf)]
|
||||
[(n fs) (ks (build-record (vector-ref (short-form-labels) n) fs) rest)]))
|
||||
|
||||
(define (decode-collection minor vs rest ks kf)
|
||||
(match minor
|
||||
[0 (ks vs rest)]
|
||||
[1 (ks (list->set vs) rest)]
|
||||
[2 (if (even? (length vs))
|
||||
(ks (apply hash vs) rest)
|
||||
(kf))]
|
||||
[_ (kf)]))
|
||||
|
||||
(define (decode-stream major minor chunk-ok? join-chunks decode rest ks kf)
|
||||
(let loop ((acc-rev '()) (rest rest))
|
||||
(bit-string-case rest
|
||||
([ (= #b0011 :: bits 4) (emajor :: bits 2) (eminor :: bits 2) (rest :: binary) ]
|
||||
(if (and (= major emajor) (= minor eminor))
|
||||
(decode minor (join-chunks (reverse acc-rev)) rest ks kf)
|
||||
(kf)))
|
||||
(else
|
||||
(decode-value rest
|
||||
(lambda (chunk rest)
|
||||
(if (chunk-ok? chunk)
|
||||
(loop (cons chunk acc-rev) rest)
|
||||
(kf)))
|
||||
kf)))))
|
||||
|
||||
(define (decode-value bs ks kf)
|
||||
(bit-string-case bs
|
||||
([ (= #b00000000 :: bits 8) (rest :: binary) ] (ks #f rest))
|
||||
([ (= #b00000001 :: bits 8) (rest :: binary) ] (ks #t rest))
|
||||
([ (= #b00000010 :: bits 8) (v :: float bits 32) (rest :: binary) ] (ks (real->single-flonum v) rest))
|
||||
([ (= #b00000011 :: bits 8) (v :: float bits 64) (rest :: binary) ] (ks v rest))
|
||||
([ (= #b0001 :: bits 4) (x :: bits 4) (rest :: binary) ] (ks (if (> x 12) (- x 16) x) rest))
|
||||
|
||||
([ (= #b001001 :: bits 6) (minor :: bits 2) (rest :: binary) ]
|
||||
(decode-stream 1 minor bytes? bytes-append* decode-binary rest ks kf))
|
||||
([ (= #b001010 :: bits 6) (minor :: bits 2) (rest :: binary) ]
|
||||
(decode-stream 2 minor (lambda (x) #t) values decode-record rest ks kf))
|
||||
([ (= #b001011 :: bits 6) (minor :: bits 2) (rest :: binary) ]
|
||||
(decode-stream 3 minor (lambda (x) #t) values decode-collection rest ks kf))
|
||||
|
||||
([ (= #b01 :: bits 2) (minor :: bits 2) (byte-count :: (wire-length))
|
||||
(bits :: binary bytes byte-count)
|
||||
(rest :: binary) ]
|
||||
(decode-binary minor (bit-string->bytes bits) rest ks kf))
|
||||
|
||||
([ (= #b10 :: bits 2) (minor :: bits 2) (field-count :: (wire-length)) (rest :: binary) ]
|
||||
(decode-values field-count '() rest
|
||||
(lambda (fields rest) (decode-record minor fields rest ks kf))
|
||||
kf))
|
||||
|
||||
([ (= #b11 :: bits 2) (minor :: bits 2) (count :: (wire-length)) (rest :: binary) ]
|
||||
(decode-values count '() rest
|
||||
(lambda (vs rest) (decode-collection minor vs rest ks kf))
|
||||
kf))
|
||||
|
||||
(else (kf))))
|
||||
|
||||
;;---------------------------------------------------------------------------
|
||||
|
||||
(define (skip-whitespace* i)
|
||||
(regexp-match? #px#"^(\\s|,)*" i)
|
||||
(match (peek-char i)
|
||||
[#\; (regexp-match? #px#"[^\r\n]*[\r\n]" i) (skip-whitespace* i)]
|
||||
[_ #t]))
|
||||
|
||||
(define (parse-error* i fmt . args)
|
||||
(define-values [line column pos] (port-next-location i))
|
||||
(raise-read-error (format "read-preserve: ~a" (apply format fmt args))
|
||||
(object-name i)
|
||||
line
|
||||
column
|
||||
pos
|
||||
#f))
|
||||
|
||||
(define (read-preserve [i (current-input-port)])
|
||||
(local-require net/base64)
|
||||
(local-require file/sha1)
|
||||
|
||||
(define-match-expander px
|
||||
(syntax-rules ()
|
||||
[(_ re pat) (app (lambda (v) (regexp-try-match re v)) pat)]))
|
||||
|
||||
(define (parse-error fmt . args)
|
||||
(apply parse-error* i fmt args))
|
||||
|
||||
(define (eof-guard ch)
|
||||
(match ch
|
||||
[(? eof-object?) (parse-error "Unexpected end of input")]
|
||||
[ch ch]))
|
||||
|
||||
(define (peek/no-eof) (eof-guard (peek-char i)))
|
||||
(define (read/no-eof) (eof-guard (read-char i)))
|
||||
|
||||
(define (skip-whitespace) (skip-whitespace* i))
|
||||
|
||||
(define (read-sequence terminator)
|
||||
(sequence-fold '() (lambda (acc) (cons (read-value) acc)) reverse terminator))
|
||||
|
||||
(define (read-dictionary-or-set)
|
||||
(sequence-fold #f
|
||||
(lambda (acc)
|
||||
(define k (read-value))
|
||||
(skip-whitespace)
|
||||
(match (peek-char i)
|
||||
[#\: (when (set? acc) (parse-error "Unexpected key/value separator in set"))
|
||||
(read-char i)
|
||||
(define v (read-value))
|
||||
(hash-set (or acc (hash)) k v)]
|
||||
[_ (when (hash? acc) (parse-error "Missing expected key/value separator"))
|
||||
(set-add (or acc (set)) k)]))
|
||||
(lambda (acc) (or acc (hash)))
|
||||
#\}))
|
||||
|
||||
(define PIPE #\|)
|
||||
|
||||
(define (read-raw-symbol acc)
|
||||
(match (peek-char i)
|
||||
[(or (? eof-object?)
|
||||
(? char? (or #\( #\) #\{ #\} #\[ #\]
|
||||
#\" #\; #\, #\# #\: (== PIPE)
|
||||
(? char-whitespace?))))
|
||||
(string->symbol (list->string (reverse acc)))]
|
||||
[_ (read-raw-symbol (cons (read-char i) acc))]))
|
||||
|
||||
(define (read-base64-binary acc)
|
||||
(skip-whitespace)
|
||||
(define ch (read/no-eof))
|
||||
(cond [(eqv? ch #\})
|
||||
(base64-decode (string->bytes/latin-1 (list->string (reverse acc))))]
|
||||
[(or (and (char>=? ch #\A) (char<=? ch #\Z))
|
||||
(and (char>=? ch #\a) (char<=? ch #\z))
|
||||
(and (char>=? ch #\0) (char<=? ch #\9))
|
||||
(memv ch '(#\+ #\/ #\- #\_ #\=)))
|
||||
(read-base64-binary (cons ch acc))]
|
||||
[else
|
||||
(parse-error "Invalid base64 character")]))
|
||||
|
||||
(define (hexdigit? ch)
|
||||
(or (and (char>=? ch #\A) (char<=? ch #\F))
|
||||
(and (char>=? ch #\a) (char<=? ch #\f))
|
||||
(and (char>=? ch #\0) (char<=? ch #\9))))
|
||||
|
||||
(define (read-hex-binary acc)
|
||||
(skip-whitespace)
|
||||
(define ch (read/no-eof))
|
||||
(cond [(eqv? ch #\})
|
||||
(hex-string->bytes (list->string (reverse acc)))]
|
||||
[(hexdigit? ch)
|
||||
(define ch2 (read/no-eof))
|
||||
(when (not (hexdigit? ch2))
|
||||
(parse-error "Hex-encoded binary digits must come in pairs"))
|
||||
(read-hex-binary (cons ch2 (cons ch acc)))]
|
||||
[else
|
||||
(parse-error "Invalid hex character")]))
|
||||
|
||||
(define (read-stringlike xform-item finish terminator-char hexescape-char hexescape-proc)
|
||||
(let loop ((acc '()))
|
||||
(match (read/no-eof)
|
||||
[(== terminator-char) (finish (reverse acc))]
|
||||
[#\\ (match (read/no-eof)
|
||||
[(== hexescape-char) (loop (cons (hexescape-proc) acc))]
|
||||
[(and c (or (== terminator-char) #\\ #\/)) (loop (cons (xform-item c) acc))]
|
||||
[#\b (loop (cons (xform-item #\u08) acc))]
|
||||
[#\f (loop (cons (xform-item #\u0C) acc))]
|
||||
[#\n (loop (cons (xform-item #\u0A) acc))]
|
||||
[#\r (loop (cons (xform-item #\u0D) acc))]
|
||||
[#\t (loop (cons (xform-item #\u09) acc))]
|
||||
[c (parse-error "Invalid escape code \\~a" c)])]
|
||||
[c (loop (cons (xform-item c) acc))])))
|
||||
|
||||
(define (read-string terminator-char)
|
||||
(read-stringlike values
|
||||
list->string
|
||||
terminator-char
|
||||
#\u
|
||||
(lambda ()
|
||||
(integer->char
|
||||
(match i
|
||||
[(px #px#"^[a-fA-F0-9]{4}" (list hexdigits))
|
||||
(define n1 (string->number (bytes->string/utf-8 hexdigits) 16))
|
||||
(if (<= #xd800 n1 #xdfff) ;; surrogate pair first half
|
||||
(match i
|
||||
[(px #px#"^\\\\u([a-fA-F0-9]{4})" (list _ hexdigits2))
|
||||
(define n2 (string->number (bytes->string/utf-8 hexdigits2) 16))
|
||||
(if (<= #xdc00 n2 #xdfff)
|
||||
(+ (arithmetic-shift (- n1 #xd800) 10)
|
||||
(- n2 #xdc00)
|
||||
#x10000)
|
||||
(parse-error "Bad second half of surrogate pair"))]
|
||||
[_ (parse-error "Missing second half of surrogate pair")])
|
||||
n1)]
|
||||
[_ (parse-error "Bad string \\u escape")])))))
|
||||
|
||||
(define (read-literal-binary)
|
||||
(read-stringlike (lambda (c)
|
||||
(define b (char->integer c))
|
||||
(when (>= b 256)
|
||||
(parse-error "Invalid code point ~a (~v) in literal binary" b c))
|
||||
b)
|
||||
list->bytes
|
||||
#\"
|
||||
#\x
|
||||
(lambda ()
|
||||
(match i
|
||||
[(px #px#"^[a-fA-F0-9]{2}" (list hexdigits))
|
||||
(string->number (bytes->string/utf-8 hexdigits) 16)]
|
||||
[_ (parse-error "Bad binary \\x escape")]))))
|
||||
|
||||
(define (read-intpart acc-rev)
|
||||
(match (peek-char i)
|
||||
[#\0 (read-fracexp (cons (read-char i) acc-rev))]
|
||||
[_ (read-digit+ acc-rev read-fracexp)]))
|
||||
|
||||
(define (read-digit* acc-rev k)
|
||||
(match (peek-char i)
|
||||
[(? char? (? char-numeric?)) (read-digit* (cons (read-char i) acc-rev) k)]
|
||||
[_ (k acc-rev)]))
|
||||
|
||||
(define (read-digit+ acc-rev k)
|
||||
(match (peek-char i)
|
||||
[(? char? (? char-numeric?)) (read-digit* (cons (read-char i) acc-rev) k)]
|
||||
[_ (parse-error "Incomplete number")]))
|
||||
|
||||
(define (read-fracexp acc-rev)
|
||||
(match (peek-char i)
|
||||
[#\. (read-digit+ (cons (read-char i) acc-rev) read-exp)]
|
||||
[_ (read-exp acc-rev)]))
|
||||
|
||||
(define (read-exp acc-rev)
|
||||
(match (peek-char i)
|
||||
[(or #\e #\E) (read-sign-and-exp (cons (read-char i) acc-rev))]
|
||||
[_ (finish-number acc-rev)]))
|
||||
|
||||
(define (read-sign-and-exp acc-rev)
|
||||
(match (peek-char i)
|
||||
[(or #\+ #\-) (read-digit+ (cons (read-char i) acc-rev) finish-number)]
|
||||
[_ (read-digit+ acc-rev finish-number)]))
|
||||
|
||||
(define (finish-number acc-rev)
|
||||
(define s (list->string (reverse acc-rev)))
|
||||
(define n (string->number s))
|
||||
(when (not n) (parse-error "Invalid number: ~v" s))
|
||||
(if (flonum? n)
|
||||
(match (peek-char i)
|
||||
[(or #\f #\F) (read-char i) (real->single-flonum n)]
|
||||
[_ n])
|
||||
n))
|
||||
|
||||
(define (read-number)
|
||||
(match (peek/no-eof)
|
||||
[#\- (read-intpart (list (read-char i)))]
|
||||
[_ (read-intpart (list))]))
|
||||
|
||||
(define (sequence-fold acc accumulate-one finish terminator-char)
|
||||
(let loop ((acc acc))
|
||||
(skip-whitespace)
|
||||
(match (peek/no-eof)
|
||||
[(== terminator-char) (read-char i) (finish acc)]
|
||||
[_ (loop (accumulate-one acc))])))
|
||||
|
||||
(define (collect-fields head)
|
||||
(match (peek-char i)
|
||||
[#\(
|
||||
(read-char i)
|
||||
(collect-fields (build-record head (read-sequence #\))))]
|
||||
[_
|
||||
head]))
|
||||
|
||||
(define (read-value)
|
||||
(skip-whitespace)
|
||||
(collect-fields
|
||||
(match (peek-char i)
|
||||
[(? eof-object? o) o]
|
||||
[#\{ (read-char i) (read-dictionary-or-set)]
|
||||
[#\[ (read-char i) (read-sequence #\])]
|
||||
[(or #\- #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) (read-number)]
|
||||
[#\" (read-char i) (read-string #\")]
|
||||
[(== PIPE) (read-char i) (string->symbol (read-string PIPE))]
|
||||
[#\# (match i
|
||||
[(px #px#"^#set\\{" (list _))
|
||||
(sequence-fold (set) (lambda (acc) (set-add acc (read-value))) values #\})]
|
||||
[(px #px#"^#hexvalue\\{" (list _))
|
||||
(decode (read-hex-binary '()) (lambda () (parse-error "Invalid #hexvalue encoding")))]
|
||||
[(px #px#"^#true" (list _))
|
||||
#t]
|
||||
[(px #px#"^#false" (list _))
|
||||
#f]
|
||||
[(px #px#"^#\"" (list _))
|
||||
(read-literal-binary)]
|
||||
[(px #px#"^#hex\\{" (list _))
|
||||
(read-hex-binary '())]
|
||||
[(px #px#"^#base64\\{" (list _))
|
||||
(read-base64-binary '())]
|
||||
[_
|
||||
(parse-error "Invalid preserve value")])]
|
||||
[#\: (parse-error "Unexpected key/value separator between items")]
|
||||
[_ (read-raw-symbol '())])))
|
||||
|
||||
(read-value))
|
||||
|
||||
(define (string->preserve s)
|
||||
(define p (open-input-string s))
|
||||
(define v (read-preserve p))
|
||||
(skip-whitespace* p)
|
||||
(when (not (eof-object? (peek-char p)))
|
||||
(parse-error* p "Unexpected text following preserve"))
|
||||
v)
|
||||
|
||||
;;---------------------------------------------------------------------------
|
||||
|
||||
(module+ test
|
||||
(require rackunit)
|
||||
(require (for-syntax racket syntax/srcloc))
|
||||
|
||||
(check-equal? (bit-string->bytes (encode-varint 0)) (bytes 0))
|
||||
(check-equal? (bit-string->bytes (encode-varint 1)) (bytes 1))
|
||||
(check-equal? (bit-string->bytes (encode-varint 127)) (bytes 127))
|
||||
(check-equal? (bit-string->bytes (encode-varint 128)) (bytes 128 1))
|
||||
(check-equal? (bit-string->bytes (encode-varint 255)) (bytes 255 1))
|
||||
(check-equal? (bit-string->bytes (encode-varint 256)) (bytes 128 2))
|
||||
(check-equal? (bit-string->bytes (encode-varint 300)) (bytes #b10101100 #b00000010))
|
||||
(check-equal? (bit-string->bytes (encode-varint 1000000000)) (bytes 128 148 235 220 3))
|
||||
|
||||
(define (ks* v rest) (list v (bit-string->bytes rest)))
|
||||
(define (kf*) (void))
|
||||
|
||||
(check-equal? (decode-varint (bytes 0) ks* kf*) (list 0 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 0 99) ks* kf*) (list 0 (bytes 99)))
|
||||
(check-equal? (decode-varint (bytes 1) ks* kf*) (list 1 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 127) ks* kf*) (list 127 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 128 1) ks* kf*) (list 128 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 128 1 99) ks* kf*) (list 128 (bytes 99)))
|
||||
(check-equal? (decode-varint (bytes 255 1) ks* kf*) (list 255 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 128 2) ks* kf*) (list 256 (bytes)))
|
||||
(check-equal? (decode-varint (bytes #b10101100 #b00000010) ks* kf*) (list 300 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 128 148 235 220 3) ks* kf*) (list 1000000000 (bytes)))
|
||||
(check-equal? (decode-varint (bytes 128 148 235 220 3 99) ks* kf*) (list 1000000000 (bytes 99)))
|
||||
|
||||
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (0 :: (wire-length)))) (bytes 0))
|
||||
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (3 :: (wire-length)))) (bytes 3))
|
||||
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (14 :: (wire-length)))) (bytes 14))
|
||||
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (15 :: (wire-length)))) (bytes 15 15))
|
||||
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (100 :: (wire-length))))
|
||||
(bytes 15 100))
|
||||
(check-equal? (bit-string->bytes (bit-string (0 :: bits 4) (300 :: (wire-length))))
|
||||
(bytes 15 #b10101100 #b00000010))
|
||||
|
||||
(define (dwl bs)
|
||||
(bit-string-case bs
|
||||
([ (= 0 :: bits 4) (w :: (wire-length)) ] w)
|
||||
(else (void))))
|
||||
|
||||
(check-equal? (dwl (bytes 0)) 0)
|
||||
(check-equal? (dwl (bytes 3)) 3)
|
||||
(check-equal? (dwl (bytes 14)) 14)
|
||||
(check-equal? (dwl (bytes 15)) (void))
|
||||
(check-equal? (dwl (bytes 15 9)) (void)) ;; not canonical
|
||||
(check-equal? (dwl (bytes 15 15)) 15)
|
||||
(check-equal? (dwl (bytes 15 100)) 100)
|
||||
(check-equal? (dwl (bytes 15 #b10101100 #b00000010)) 300)
|
||||
|
||||
(struct speak (who what) #:prefab)
|
||||
|
||||
(define (expected . pieces)
|
||||
(bit-string->bytes
|
||||
(apply bit-string-append
|
||||
(map (match-lambda
|
||||
[(? byte? b) (bytes b)]
|
||||
[(? bytes? bs) bs]
|
||||
[(? string? s) (string->bytes/utf-8 s)])
|
||||
pieces))))
|
||||
|
||||
(define (d bs) (decode bs void))
|
||||
|
||||
(define-syntax (cross-check stx)
|
||||
(syntax-case stx ()
|
||||
((_ text v (b ...))
|
||||
#'(cross-check text v v (b ...)))
|
||||
((_ text forward back (b ...))
|
||||
#`(let ((loc #,(source-location->string #'forward)))
|
||||
(check-equal? (string->preserve text) back loc)
|
||||
(check-equal? (d (encode forward)) back loc)
|
||||
(check-equal? (d (encode back)) back loc)
|
||||
(check-equal? (d (expected b ...)) back loc)
|
||||
(check-equal? (encode forward) (expected b ...) loc)
|
||||
))))
|
||||
|
||||
(define-syntax (cross-check/nondeterministic stx)
|
||||
(syntax-case stx ()
|
||||
((_ text v (b ...))
|
||||
#'(cross-check/nondeterministic text v v (b ...)))
|
||||
((_ text forward back (b ...))
|
||||
#`(let ((loc #,(source-location->string #'forward)))
|
||||
(check-equal? (string->preserve text) back loc)
|
||||
(check-equal? (d (encode forward)) back loc)
|
||||
(check-equal? (d (encode back)) back loc)
|
||||
(check-equal? (d (expected b ...)) back loc)
|
||||
))))
|
||||
|
||||
(cross-check "capture(discard())" (capture (discard)) (#x91 #x80))
|
||||
(cross-check "observe(speak(discard(), capture(discard())))"
|
||||
(observe (speak (discard) (capture (discard))))
|
||||
(#xA1 #xB3 #x75 "speak" #x80 #x91 #x80))
|
||||
(cross-check "[1, 2, 3, 4]" '(1 2 3 4) (#xC4 #x11 #x12 #x13 #x14))
|
||||
(cross-check "[1 2 3 4]"
|
||||
(stream-of 'sequence (sequence->generator '(1 2 3 4)))
|
||||
'(1 2 3 4)
|
||||
(#x2C #x11 #x12 #x13 #x14 #x3C))
|
||||
(cross-check " [ -2 -1 0 1 ] " '(-2 -1 0 1) (#xC4 #x1E #x1F #x10 #x11))
|
||||
(cross-check "\"hello\"" "hello" (#x55 "hello"))
|
||||
(cross-check "\"hello\""
|
||||
(stream-of 'string (sequence->generator '(#"he" #"llo")))
|
||||
"hello"
|
||||
(#x25 #x62 "he" #x63 "llo" #x35))
|
||||
(cross-check "\"hello\""
|
||||
(stream-of 'string (sequence->generator '(#"he" #"ll" #"" #"" #"o")))
|
||||
"hello"
|
||||
(#x25 #x62 "he" #x62 "ll" #x60 #x60 #x61 "o" #x35))
|
||||
(cross-check "#\"hello\""
|
||||
(stream-of 'byte-string (sequence->generator '(#"he" #"ll" #"" #"" #"o")))
|
||||
#"hello"
|
||||
(#x26 #x62 "he" #x62 "ll" #x60 #x60 #x61 "o" #x36))
|
||||
(cross-check "hello"
|
||||
(stream-of 'symbol (sequence->generator '(#"he" #"ll" #"" #"" #"o")))
|
||||
'hello
|
||||
(#x27 #x62 "he" #x62 "ll" #x60 #x60 #x61 "o" #x37))
|
||||
(cross-check "[\"hello\" there #\"world\" [] #set{} #true #false]"
|
||||
`("hello" there #"world" () ,(set) #t #f)
|
||||
(#xC7 #x55 "hello" #x75 "there" #x65 "world" #xC0 #xD0 #x01 #x00))
|
||||
|
||||
(cross-check "#\"ABC\"" #"ABC" (#x63 #x41 #x42 #x43))
|
||||
(cross-check "#hex{414243}" #"ABC" (#x63 #x41 #x42 #x43))
|
||||
(cross-check "#hex{ 41 4A 4e }" #"AJN" (#x63 #x41 #x4A #x4E))
|
||||
(cross-check "#hex{ 41;re\n 42 43 }" #"ABC" (#x63 #x41 #x42 #x43))
|
||||
(check-exn exn? (lambda () (string->preserve "#hex{414 243}"))) ;; each byte must be written as two adjacent hex digits
|
||||
(cross-check "#base64{Y29yeW1i}" #"corymb" (#x66 "corymb"))
|
||||
(cross-check "#base64{Y29 yeW 1i}" #"corymb" (#x66 "corymb"))
|
||||
(cross-check ";; a comment\n#base64{\n;x\nY29 yeW 1i}" #"corymb" (#x66 "corymb"))
|
||||
(cross-check "#base64{SGk=}" #"Hi" (#x62 "Hi"))
|
||||
(cross-check "#base64{SGk}" #"Hi" (#x62 "Hi"))
|
||||
(cross-check "#base64{ S G k }" #"Hi" (#x62 "Hi"))
|
||||
|
||||
(check-equal? (string->preserve "[]") '())
|
||||
(check-equal? (string->preserve "{}") (hash))
|
||||
(check-equal? (string->preserve "\"\"") "")
|
||||
(check-equal? (string->preserve "||") (string->symbol ""))
|
||||
(check-equal? (string->preserve "#set{}") (set))
|
||||
|
||||
(check-equal? (string->preserve "{1 2 3}") (set 1 2 3))
|
||||
(check-equal? (string->preserve "#set{1 2 3}") (set 1 2 3))
|
||||
|
||||
(cross-check "\"abc\\u6c34\\u6C34\\\\\\/\\\"\\b\\f\\n\\r\\txyz\""
|
||||
"abc\u6c34\u6c34\\/\"\b\f\n\r\txyz"
|
||||
(#x5f #x14
|
||||
#x61 #x62 #x63 #xe6 #xb0 #xb4 #xe6 #xb0
|
||||
#xb4 #x5c #x2f #x22 #x08 #x0c #x0a #x0d
|
||||
#x09 #x78 #x79 #x7a))
|
||||
|
||||
(cross-check "|abc\\u6c34\\u6C34\\\\\\/\\|\\b\\f\\n\\r\\txyz|"
|
||||
(string->symbol "abc\u6c34\u6c34\\/|\b\f\n\r\txyz")
|
||||
(#x7f #x14
|
||||
#x61 #x62 #x63 #xe6 #xb0 #xb4 #xe6 #xb0
|
||||
#xb4 #x5c #x2f #x7c #x08 #x0c #x0a #x0d
|
||||
#x09 #x78 #x79 #x7a))
|
||||
|
||||
(check-exn #px"Invalid escape code \\\\u" (lambda () (string->preserve "#\"\\u6c34\"")))
|
||||
|
||||
(cross-check "#\"abc\\x6c\\x34\\xf0\\\\\\/\\\"\\b\\f\\n\\r\\txyz\""
|
||||
#"abc\x6c\x34\xf0\\/\"\b\f\n\r\txyz"
|
||||
(#x6f #x11
|
||||
#x61 #x62 #x63 #x6c #x34 #xf0 #x5c #x2f
|
||||
#x22 #x08 #x0c #x0a #x0d #x09 #x78 #x79 #x7a))
|
||||
|
||||
(cross-check "\"\\uD834\\uDD1E\"" "\U0001D11E" (#x54 #xF0 #x9D #x84 #x9E))
|
||||
|
||||
(cross-check "-257" -257 (#x42 #xFE #xFF))
|
||||
(cross-check "-256" -256 (#x42 #xFF #x00))
|
||||
(cross-check "-255" -255 (#x42 #xFF #x01))
|
||||
(cross-check "-254" -254 (#x42 #xFF #x02))
|
||||
(cross-check "-129" -129 (#x42 #xFF #x7F))
|
||||
(cross-check "-128" -128 (#x41 #x80))
|
||||
(cross-check "-127" -127 (#x41 #x81))
|
||||
(cross-check "-4" -4 (#x41 #xFC))
|
||||
(cross-check "-3" -3 (#x1D))
|
||||
(cross-check "-2" -2 (#x1E))
|
||||
(cross-check "-1" -1 (#x1F))
|
||||
(cross-check "0" 0 (#x10))
|
||||
(cross-check "1" 1 (#x11))
|
||||
(cross-check "12" 12 (#x1C))
|
||||
(cross-check "13" 13 (#x41 #x0D))
|
||||
(cross-check "127" 127 (#x41 #x7F))
|
||||
(cross-check "128" 128 (#x42 #x00 #x80))
|
||||
(cross-check "255" 255 (#x42 #x00 #xFF))
|
||||
(cross-check "256" 256 (#x42 #x01 #x00))
|
||||
(cross-check "32767" 32767 (#x42 #x7F #xFF))
|
||||
(cross-check "32768" 32768 (#x43 #x00 #x80 #x00))
|
||||
(cross-check "65535" 65535 (#x43 #x00 #xFF #xFF))
|
||||
(cross-check "65536" 65536 (#x43 #x01 #x00 #x00))
|
||||
(cross-check "131072" 131072 (#x43 #x02 #x00 #x00))
|
||||
|
||||
(cross-check "1.0f" 1.0f0 (#b00000010 #b00111111 #b10000000 0 0))
|
||||
(cross-check "1.0" 1.0 (#b00000011 #b00111111 #b11110000 0 0 0 0 0 0))
|
||||
(cross-check "-1.202e300" -1.202e300 (#x03 #xFE #x3C #xB7 #xB7 #x59 #xBF #x04 #x26))
|
||||
|
||||
(check-equal? (d (expected #x25 #x51 "a" #x35)) (void)) ;; Bad chunk type: must be bytes
|
||||
(check-equal? (d (expected #x25 #x71 "a" #x35)) (void)) ;; Bad chunk type: must be bytes
|
||||
(check-equal? (d (expected #x26 #x51 "a" #x36)) (void)) ;; Bad chunk type: must be bytes
|
||||
(check-equal? (d (expected #x26 #x71 "a" #x36)) (void)) ;; Bad chunk type: must be bytes
|
||||
(check-equal? (d (expected #x27 #x51 "a" #x37)) (void)) ;; Bad chunk type: must be bytes
|
||||
(check-equal? (d (expected #x27 #x71 "a" #x37)) (void)) ;; Bad chunk type: must be bytes
|
||||
(check-equal? (d (expected #x25 #x61 "a" #x35)) "a")
|
||||
(check-equal? (d (expected #x26 #x61 "a" #x36)) #"a")
|
||||
(check-equal? (d (expected #x27 #x61 "a" #x37)) 'a)
|
||||
|
||||
(struct date (year month day) #:prefab)
|
||||
(struct thing (id) #:prefab)
|
||||
(struct person thing (name date-of-birth) #:prefab)
|
||||
(struct titled person (title) #:prefab)
|
||||
|
||||
(cross-check
|
||||
"[titled person 2 thing 1](101, \"Blackwell\", date(1821 2 3), \"Dr\")"
|
||||
(titled 101 "Blackwell" (date 1821 2 3) "Dr")
|
||||
(#xB5 ;; Record, generic, 4+1
|
||||
#xC5 ;; Sequence, 5
|
||||
#x76 #x74 #x69 #x74 #x6C #x65 #x64 ;; Symbol, "titled"
|
||||
#x76 #x70 #x65 #x72 #x73 #x6F #x6E ;; Symbol, "person"
|
||||
#x12 ;; SignedInteger, "2"
|
||||
#x75 #x74 #x68 #x69 #x6E #x67 ;; Symbol, "thing"
|
||||
#x11 ;; SignedInteger, "1"
|
||||
#x41 #x65 ;; SignedInteger, "101"
|
||||
#x59 #x42 #x6C #x61 #x63 #x6B #x77 #x65 #x6C #x6C ;; String, "Blackwell"
|
||||
#xB4 ;; Record, generic, 3+1
|
||||
#x74 #x64 #x61 #x74 #x65 ;; Symbol, "date"
|
||||
#x42 #x07 #x1D ;; SignedInteger, "1821"
|
||||
#x12 ;; SignedInteger, "2"
|
||||
#x13 ;; SignedInteger, "3"
|
||||
#x52 #x44 #x72 ;; String, "Dr"
|
||||
))
|
||||
|
||||
(cross-check "discard()" (record 'discard '()) (discard) (#x80))
|
||||
(cross-check "discard(surprise)"
|
||||
(record 'discard '(surprise))
|
||||
'#s(discard surprise)
|
||||
(#x81 #x78 "surprise"))
|
||||
(cross-check "capture(x)" (record 'capture '(x)) (capture 'x) (#x91 #x71 "x"))
|
||||
(cross-check "observe(x)" (record 'observe '(x)) (observe 'x) (#xA1 #x71 "x"))
|
||||
(cross-check "observe(x y)" (record 'observe '(x y)) '#s(observe x y) (#xA2 #x71 "x" #x71 "y"))
|
||||
(cross-check "other(x y)"
|
||||
(record 'other '(x y))
|
||||
'#s(other x y)
|
||||
(#xB3 #x75 "other" #x71 "x" #x71 "y"))
|
||||
(cross-check "\"aString\"(3 4)"
|
||||
(record "aString" '(3 4))
|
||||
(#xB3 #x57 "aString" #x13 #x14))
|
||||
(cross-check "discard()(3, 4)"
|
||||
(record (discard) '(3 4))
|
||||
(#xB3 #x80 #x13 #x14))
|
||||
|
||||
(check-equal? (d (expected #x2C #x00 #x00)) (void)) ;; missing end byte
|
||||
(check-equal? (d (expected #xC3 #x00 #x00)) (void)) ;; missing element
|
||||
|
||||
(cross-check/nondeterministic
|
||||
"{a: 1, \"b\": #true, [1 2 3]: #\"c\", {first-name:\"Elizabeth\"}:{surname:\"Blackwell\"}}"
|
||||
(hash 'a 1
|
||||
"b" #t
|
||||
'(1 2 3) #"c"
|
||||
(hash 'first-name "Elizabeth") (hash 'surname "Blackwell"))
|
||||
(#xE8 #x71 "a" #x11
|
||||
#x51 "b" #x01
|
||||
#xC3 #x11 #x12 #x13 #x61 "c"
|
||||
#xE2 #x7A "first-name" #x59 "Elizabeth"
|
||||
#xE2 #x77 "surname" #x59 "Blackwell"
|
||||
))
|
||||
|
||||
(let ()
|
||||
(local-require json)
|
||||
(define rfc8259-example1 (string->preserve #<<EOF
|
||||
{
|
||||
"Image": {
|
||||
"Width": 800,
|
||||
"Height": 600,
|
||||
"Title": "View from 15th Floor",
|
||||
"Thumbnail": {
|
||||
"Url": "http://www.example.com/image/481989943",
|
||||
"Height": 125,
|
||||
"Width": 100
|
||||
},
|
||||
"Animated" : false,
|
||||
"IDs": [116, 943, 234, 38793]
|
||||
}
|
||||
}
|
||||
EOF
|
||||
))
|
||||
(define rfc8259-example2 (string->preserve #<<EOF
|
||||
[
|
||||
{
|
||||
"precision": "zip",
|
||||
"Latitude": 37.7668,
|
||||
"Longitude": -122.3959,
|
||||
"Address": "",
|
||||
"City": "SAN FRANCISCO",
|
||||
"State": "CA",
|
||||
"Zip": "94107",
|
||||
"Country": "US"
|
||||
},
|
||||
{
|
||||
"precision": "zip",
|
||||
"Latitude": 37.371991,
|
||||
"Longitude": -122.026020,
|
||||
"Address": "",
|
||||
"City": "SUNNYVALE",
|
||||
"State": "CA",
|
||||
"Zip": "94085",
|
||||
"Country": "US"
|
||||
}
|
||||
]
|
||||
EOF
|
||||
))
|
||||
|
||||
(cross-check/nondeterministic
|
||||
"{\"Image\": {\"Width\": 800,\"Height\": 600,\"Title\": \"View from 15th Floor\",\"Thumbnail\": {\"Url\": \"http://www.example.com/image/481989943\",\"Height\": 125,\"Width\": 100},\"Animated\" : false,\"IDs\": [116, 943, 234, 38793]}}"
|
||||
rfc8259-example1
|
||||
(#xe2
|
||||
#x55 "Image"
|
||||
#xec
|
||||
#x55 "Width" #x42 #x03 #x20
|
||||
#x55 "Title" #x5f #x14 "View from 15th Floor"
|
||||
#x58 "Animated" #x75 "false"
|
||||
#x56 "Height" #x42 #x02 #x58
|
||||
#x59 "Thumbnail"
|
||||
#xe6
|
||||
#x55 "Width" #x41 #x64
|
||||
#x53 "Url" #x5f #x26 "http://www.example.com/image/481989943"
|
||||
#x56 "Height" #x41 #x7d
|
||||
#x53 "IDs" #xc4
|
||||
#x41 #x74
|
||||
#x42 #x03 #xaf
|
||||
#x42 #x00 #xea
|
||||
#x43 #x00 #x97 #x89
|
||||
))
|
||||
|
||||
(cross-check/nondeterministic
|
||||
"[{\"precision\": \"zip\",\"Latitude\": 37.7668,\"Longitude\": -122.3959,\"Address\": \"\",\"City\": \"SAN FRANCISCO\",\"State\": \"CA\",\"Zip\": \"94107\",\"Country\": \"US\"},{\"precision\": \"zip\",\"Latitude\": 37.371991,\"Longitude\": -122.026020,\"Address\": \"\",\"City\": \"SUNNYVALE\",\"State\": \"CA\",\"Zip\": \"94085\",\"Country\": \"US\"}]"
|
||||
rfc8259-example2
|
||||
(#xc2
|
||||
#xef #x10
|
||||
#x59 "precision" #x53 "zip"
|
||||
#x58 "Latitude" #x03 #x40 #x42 #xe2 #x26 #x80 #x9d #x49 #x52
|
||||
#x59 "Longitude" #x03 #xc0 #x5e #x99 #x56 #x6c #xf4 #x1f #x21
|
||||
#x57 "Address" #x50
|
||||
#x54 "City" #x5D "SAN FRANCISCO"
|
||||
#x55 "State" #x52 "CA"
|
||||
#x53 "Zip" #x55 "94107"
|
||||
#x57 "Country" #x52 "US"
|
||||
#xef #x10
|
||||
#x59 "precision" #x53 "zip"
|
||||
#x58 "Latitude" #x03 #x40 #x42 #xaf #x9d #x66 #xad #xb4 #x03
|
||||
#x59 "Longitude" #x03 #xc0 #x5e #x81 #xaa #x4f #xca #x42 #xaf
|
||||
#x57 "Address" #x50
|
||||
#x54 "City" #x59 "SUNNYVALE"
|
||||
#x55 "State" #x52 "CA"
|
||||
#x53 "Zip" #x55 "94085"
|
||||
#x57 "Country" #x52 "US"
|
||||
))
|
||||
)
|
||||
)
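
The `encode-varint` and `decode-varint` checks in the test module above are consistent with a little-endian base-128 encoding: seven payload bits per byte, with the high bit set on every byte except the last. The following is a minimal Python sketch of that scheme, inferred only from the expected byte sequences in those tests. The function names `encode_varint` and `decode_varint` are hypothetical and are not taken from the library.

```python
def encode_varint(n):
    """Encode a non-negative integer as little-endian base-128 bytes.

    Hypothetical helper, inferred from the test vectors above.
    """
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while n >= 128:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)  # final byte has the continuation bit clear
    return bytes(out)


def decode_varint(data):
    """Decode a varint from the front of `data`.

    Returns (value, remaining_bytes); raises ValueError if the input
    ends before a byte with a clear continuation bit is seen.
    """
    result = 0
    shift = 0
    for i, b in enumerate(bytearray(data)):
        result |= (b & 0x7F) << shift
        if b < 128:
            return result, bytes(data[i + 1:])
        shift += 7
    raise ValueError("truncated varint")
```

Returning the unconsumed remainder mirrors the `ks*` success continuation in the tests, which receives both the decoded value and the rest of the input.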
|
|
@ -1,171 +0,0 @@
|
|||
from preserve import *
|
||||
import unittest
|
||||
|
||||
if isinstance(chr(123), bytes):
|
||||
def _byte(x):
|
||||
return chr(x)
|
||||
def _hex(x):
|
||||
return x.encode('hex')
|
||||
else:
|
||||
def _byte(x):
|
||||
return bytes([x])
|
||||
def _hex(x):
|
||||
return x.hex()
|
||||
|
||||
def _buf(*args):
|
||||
result = []
|
||||
for chunk in args:
|
||||
if isinstance(chunk, bytes):
|
||||
result.append(chunk)
|
||||
elif isinstance(chunk, basestring):
|
||||
result.append(chunk.encode('utf-8'))
|
||||
elif isinstance(chunk, numbers.Number):
|
||||
result.append(_byte(chunk))
|
||||
else:
|
||||
raise Exception('Invalid chunk in _buf %r' % (chunk,))
|
||||
result = b''.join(result)
|
||||
return result
|
||||
|
||||
def _varint(v):
|
||||
e = Encoder()
|
||||
e.varint(v)
|
||||
return e.contents()
|
||||
|
||||
def _d(bs):
|
||||
d = Decoder(bs)
|
||||
return d.next()
|
||||
|
||||
def _e(v):
|
||||
e = Encoder()
|
||||
e.append(v)
|
||||
return e.contents()
|
||||
|
||||
def _R(k, *args):
|
||||
return Record(Symbol(k), args)
|
||||
|
||||
class CodecTests(unittest.TestCase):
|
||||
def _roundtrip(self, forward, expected, back=None, nondeterministic=False):
|
||||
if back is None: back = forward
|
||||
self.assertEqual(_d(_e(forward)), back)
|
||||
self.assertEqual(_d(_e(back)), back)
|
||||
self.assertEqual(_d(expected), back)
|
||||
if not nondeterministic:
|
||||
actual = _e(forward)
|
||||
self.assertEqual(actual, expected, '%s != %s' % (_hex(actual), _hex(expected)))
|
||||
|
||||
def test_decode_varint(self):
|
||||
with self.assertRaises(DecodeError):
|
||||
Decoder(_buf()).varint()
|
||||
self.assertEqual(Decoder(_buf(0)).varint(), 0)
|
||||
self.assertEqual(Decoder(_buf(10)).varint(), 10)
|
||||
self.assertEqual(Decoder(_buf(100)).varint(), 100)
|
||||
self.assertEqual(Decoder(_buf(200, 1)).varint(), 200)
|
||||
self.assertEqual(Decoder(_buf(0b10101100, 0b00000010)).varint(), 300)
|
||||
self.assertEqual(Decoder(_buf(128, 148, 235, 220, 3)).varint(), 1000000000)
|
||||
|
||||
def test_encode_varint(self):
|
||||
self.assertEqual(_varint(0), _buf(0))
|
||||
self.assertEqual(_varint(10), _buf(10))
|
||||
self.assertEqual(_varint(100), _buf(100))
|
||||
self.assertEqual(_varint(200), _buf(200, 1))
|
||||
self.assertEqual(_varint(300), _buf(0b10101100, 0b00000010))
|
||||
self.assertEqual(_varint(1000000000), _buf(128, 148, 235, 220, 3))
|
||||
|
||||
def test_shorts(self):
|
||||
self._roundtrip(_R('capture', _R('discard')), _buf(0x91, 0x80))
|
||||
self._roundtrip(_R('observe', _R('speak', _R('discard'), _R('capture', _R('discard')))),
|
||||
_buf(0xA1, 0xB3, 0x75, "speak", 0x80, 0x91, 0x80))
|
||||
|
||||
def test_simple_seq(self):
|
||||
self._roundtrip([1,2,3,4], _buf(0xC4, 0x11, 0x12, 0x13, 0x14), back=(1,2,3,4))
|
||||
self._roundtrip(SequenceStream([1,2,3,4]), _buf(0x2C, 0x11, 0x12, 0x13, 0x14, 0x3C),
|
||||
back=(1,2,3,4))
|
||||
self._roundtrip((-2,-1,0,1), _buf(0xC4, 0x1E, 0x1F, 0x10, 0x11))
|
||||
|
||||
def test_str(self):
|
||||
self._roundtrip(u'hello', _buf(0x55, 'hello'))
|
||||
self._roundtrip(StringStream([b'he', b'llo']), _buf(0x25, 0x62, 'he', 0x63, 'llo', 0x35),
|
||||
back=u'hello')
|
||||
self._roundtrip(StringStream([b'he', b'll', b'', b'', b'o']),
|
||||
_buf(0x25, 0x62, 'he', 0x62, 'll', 0x60, 0x60, 0x61, 'o', 0x35),
|
||||
back=u'hello')
|
||||
self._roundtrip(BinaryStream([b'he', b'll', b'', b'', b'o']),
|
||||
_buf(0x26, 0x62, 'he', 0x62, 'll', 0x60, 0x60, 0x61, 'o', 0x36),
|
||||
back=b'hello')
|
||||
self._roundtrip(SymbolStream([b'he', b'll', b'', b'', b'o']),
|
||||
_buf(0x27, 0x62, 'he', 0x62, 'll', 0x60, 0x60, 0x61, 'o', 0x37),
|
||||
back=Symbol(u'hello'))
|
||||
|
||||
def test_mixed1(self):
|
||||
self._roundtrip((u'hello', Symbol(u'there'), b'world', (), set(), True, False),
|
||||
_buf(0xc7, 0x55, 'hello', 0x75, 'there', 0x65, 'world', 0xc0, 0xd0, 1, 0))
|
||||
|
||||
def test_signedinteger(self):
|
||||
self._roundtrip(-257, _buf(0x42, 0xFE, 0xFF))
|
||||
self._roundtrip(-256, _buf(0x42, 0xFF, 0x00))
|
||||
self._roundtrip(-255, _buf(0x42, 0xFF, 0x01))
|
||||
self._roundtrip(-254, _buf(0x42, 0xFF, 0x02))
|
||||
self._roundtrip(-129, _buf(0x42, 0xFF, 0x7F))
|
||||
self._roundtrip(-128, _buf(0x41, 0x80))
|
||||
self._roundtrip(-127, _buf(0x41, 0x81))
|
||||
self._roundtrip(-4, _buf(0x41, 0xFC))
|
||||
self._roundtrip(-3, _buf(0x1D))
|
||||
self._roundtrip(-2, _buf(0x1E))
|
||||
self._roundtrip(-1, _buf(0x1F))
|
||||
self._roundtrip(0, _buf(0x10))
|
||||
self._roundtrip(1, _buf(0x11))
|
||||
self._roundtrip(12, _buf(0x1C))
|
||||
self._roundtrip(13, _buf(0x41, 0x0D))
|
||||
self._roundtrip(127, _buf(0x41, 0x7F))
|
||||
self._roundtrip(128, _buf(0x42, 0x00, 0x80))
|
||||
self._roundtrip(255, _buf(0x42, 0x00, 0xFF))
|
||||
self._roundtrip(256, _buf(0x42, 0x01, 0x00))
|
||||
self._roundtrip(32767, _buf(0x42, 0x7F, 0xFF))
|
||||
self._roundtrip(32768, _buf(0x43, 0x00, 0x80, 0x00))
|
||||
self._roundtrip(65535, _buf(0x43, 0x00, 0xFF, 0xFF))
|
||||
self._roundtrip(65536, _buf(0x43, 0x01, 0x00, 0x00))
|
||||
self._roundtrip(131072, _buf(0x43, 0x02, 0x00, 0x00))
|
||||
|
||||
def test_floats(self):
|
||||
self._roundtrip(Float(1.0), _buf(2, 0x3f, 0x80, 0, 0))
|
||||
self._roundtrip(1.0, _buf(3, 0x3f, 0xf0, 0, 0, 0, 0, 0, 0))
|
||||
self._roundtrip(-1.202e300, _buf(3, 0xfe, 0x3c, 0xb7, 0xb7, 0x59, 0xbf, 0x04, 0x26))
|
||||
|
||||
def test_badchunks(self):
|
||||
self.assertEqual(_d(_buf(0x25, 0x61, 'a', 0x35)), u'a')
|
||||
self.assertEqual(_d(_buf(0x26, 0x61, 'a', 0x36)), b'a')
|
||||
self.assertEqual(_d(_buf(0x27, 0x61, 'a', 0x37)), Symbol(u'a'))
|
||||
for a in [0x25, 0x26, 0x27]:
|
||||
for b in [0x51, 0x71]:
|
||||
with self.assertRaises(DecodeError, msg='Unexpected non-binary chunk') as cm:
|
||||
_d(_buf(a, b, 'a', 0x10+a))
|
||||
|
||||
def test_person(self):
|
||||
self._roundtrip(Record((Symbol(u'titled'), Symbol(u'person'), 2, Symbol(u'thing'), 1),
|
||||
[
|
||||
101,
|
||||
u'Blackwell',
|
||||
_R(u'date', 1821, 2, 3),
|
||||
u'Dr'
|
||||
]),
|
||||
_buf(0xB5, 0xC5, 0x76, 0x74, 0x69, 0x74, 0x6C, 0x65,
|
||||
0x64, 0x76, 0x70, 0x65, 0x72, 0x73, 0x6F, 0x6E,
|
||||
0x12, 0x75, 0x74, 0x68, 0x69, 0x6E, 0x67, 0x11,
|
||||
0x41, 0x65, 0x59, 0x42, 0x6C, 0x61, 0x63, 0x6B,
|
||||
0x77, 0x65, 0x6C, 0x6C, 0xB4, 0x74, 0x64, 0x61,
|
||||
0x74, 0x65, 0x42, 0x07, 0x1D, 0x12, 0x13, 0x52,
|
||||
0x44, 0x72))
|
||||
|
||||
def test_dict(self):
|
||||
self._roundtrip({ Symbol(u'a'): 1,
|
||||
u'b': True,
|
||||
(1, 2, 3): b'c',
|
||||
ImmutableDict({ Symbol(u'first-name'): u'Elizabeth', }):
|
||||
{ Symbol(u'surname'): u'Blackwell' } },
|
||||
_buf(0xE8,
|
||||
0x71, "a", 0x11,
|
||||
0x51, "b", 0x01,
|
||||
0xC3, 0x11, 0x12, 0x13, 0x61, "c",
|
||||
0xE2, 0x7A, "first-name", 0x59, "Elizabeth",
|
||||
0xE2, 0x77, "surname", 0x59, "Blackwell"),
|
||||
nondeterministic = True)
|
|
@ -8,7 +8,7 @@
|
|||
(require racket/random file/sha1)
|
||||
(require imperative-syndicate/skeleton)
|
||||
(require imperative-syndicate/term)
|
||||
(require "preserve.rkt")
|
||||
(require preserves)
|
||||
|
||||
(define-logger mcds)
|
||||
|
||||
|
|