Many improvements
parent 2996970cbe
commit f2f57385ce
|
@ -51,7 +51,6 @@ later in this document.
|
||||||
| Boolean
|
| Boolean
|
||||||
| Float
|
| Float
|
||||||
| Double
|
| Double
|
||||||
| MIMEData
|
|
||||||
|
|
||||||
Compound = Record
|
Compound = Record
|
||||||
| Sequence
|
| Sequence
|
||||||
|
@ -86,7 +85,7 @@ follows:[^ordering-by-syntax]
|
||||||
(Compounds) Record < Sequence < Set < Dictionary
|
(Compounds) Record < Sequence < Set < Dictionary
|
||||||
|
|
||||||
(Atoms) SignedInteger < String < ByteString < Symbol
|
(Atoms) SignedInteger < String < ByteString < Symbol
|
||||||
< Boolean < Float < Double < MIMEData
|
< Boolean < Float < Double
|
||||||
|
|
||||||
[^ordering-by-syntax]: The observant reader may note that the
|
[^ordering-by-syntax]: The observant reader may note that the
|
||||||
ordering here is the same as that implied by the tagging scheme
|
ordering here is the same as that implied by the tagging scheme
|
||||||
|
@ -183,21 +182,6 @@ and infinities, using a suffix `f` or `d` to indicate `Float` or
|
||||||
**Non-examples.** 10, -6, and 0, because writing them this way
|
**Non-examples.** 10, -6, and 0, because writing them this way
|
||||||
indicates `SignedInteger`s, not `Float`s or `Double`s.
|
indicates `SignedInteger`s, not `Float`s or `Double`s.
|
||||||
|
|
||||||
### MIME-type tagged binary data.
|
|
||||||
|
|
||||||
A `MIMEData` is a pair of a `Symbol` denoting a
|
|
||||||
[media type](https://tools.ietf.org/html/rfc6838) and a `ByteString`
|
|
||||||
body, intended to be interpreted as an encoding of a document having
|
|
||||||
that media type. While each media type may define its own rules for
|
|
||||||
comparing documents, we define ordering among `MIMEData`
|
|
||||||
*representations* of such media types lexicographically over the
|
|
||||||
(`Symbol`, `ByteString`) pair. We write examples using the same syntax
|
|
||||||
as for byte strings, but with the media type `Symbol` sandwiched
|
|
||||||
between the “`#`” and the first “`"`”.
|
|
||||||
|
|
||||||
**Examples.** `#application/octet-stream""`; `#text/plain"ABC"`;
|
|
||||||
`#application/xml"<xhtml/>"`; `#text/csv"123,234,345"`.
|
|
||||||
|
|
||||||
### Records.
|
### Records.
|
||||||
|
|
||||||
A `Record` is a *labelled* tuple of zero or more `Value`s, called the
|
A `Record` is a *labelled* tuple of zero or more `Value`s, called the
|
||||||
|
@ -255,12 +239,12 @@ containing only the empty set; `#set{4 "hello" (void) 9.0f}`, the set
|
||||||
containing 4, the string `"hello"`, the record with label `void` and
|
containing 4, the string `"hello"`, the record with label `void` and
|
||||||
no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
|
no fields, and the `Float` denoting the number 9.0; `#set{1 1.0f}`,
|
||||||
the set containing a `SignedInteger` and a `Float`, both denoting the
|
the set containing a `SignedInteger` and a `Float`, both denoting the
|
||||||
number 1; `#set{#application/xml"<x/>" #application/xml"<x />"}`, a
|
number 1; `#set{(mime application/xml #"<x/>") (mime
|
||||||
set containing two different `MIMEData`
|
application/xml #"<x />")}`, a set containing two different
|
||||||
values.[^mimedata-xml-difference]
|
type-labelled byte arrays.[^mime-xml-difference]
|
||||||
|
|
||||||
[^mimedata-xml-difference]: The two XML documents `<x/>` and `<x />`
|
[^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
|
||||||
differ by bytewise comparison, and thus yield different `MIMEData`
|
differ by bytewise comparison, and thus yield different record
|
||||||
values, even though under the semantics of XML they denote
|
values, even though under the semantics of XML they denote
|
||||||
identical XML infosets.
|
identical XML infosets.
|
||||||
|
|
||||||
|
@ -343,8 +327,6 @@ The following figure summarises the definitions below:
|
||||||
11 00 0010 Float, 32 bits big-endian binary
|
11 00 0010 Float, 32 bits big-endian binary
|
||||||
11 00 0011 Double, 64 bits big-endian binary
|
11 00 0011 Double, 64 bits big-endian binary
|
||||||
|
|
||||||
11 01 mmmm ... MIME-type-labelled binary data
|
|
||||||
|
|
||||||
If mmmm = 1111, varint(m) is present; otherwise, m is the length
|
If mmmm = 1111, varint(m) is present; otherwise, m is the length
|
||||||
|
|
||||||
#### Type and Length representation
|
#### Type and Length representation
|
||||||
|
@ -367,7 +349,6 @@ follows:[^some-encodings-unused]
|
||||||
leadbyte(1,-,-) represents a Sequence, Set or Dictionary
|
leadbyte(1,-,-) represents a Sequence, Set or Dictionary
|
||||||
leadbyte(2,-,-) represents an Atom with variable-length binary representation
|
leadbyte(2,-,-) represents an Atom with variable-length binary representation
|
||||||
leadbyte(3,0,-) represents an Atom with fixed-length binary representation
|
leadbyte(3,0,-) represents an Atom with fixed-length binary representation
|
||||||
leadbyte(3,1,-) represents certain special variable-length values
|
|
||||||
|
|
||||||
[^some-encodings-unused]: Some encodings are unused. All such
|
[^some-encodings-unused]: Some encodings are unused. All such
|
||||||
encodings are reserved for future versions of this specification.
|
encodings are reserved for future versions of this specification.
|
||||||
|
@ -430,13 +411,26 @@ making
|
||||||
|
|
||||||
[[ [X_1 ... X_m] ]] = header(1,0,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
[[ [X_1 ... X_m] ]] = header(1,0,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
||||||
|
|
||||||
[[ #set{X_1 ... X_m} ]] = header(1,1,m) ++ [[Y_1]] ++ ... ++ [[Y_m]]
|
[[ #set{X_1 ... X_m} ]] = header(1,1,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
||||||
where [Y_1 ... Y_m] = sort([X_1 ... X_m])
|
|
||||||
|
|
||||||
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
||||||
= header(1,2,m) ++ [[K'_1]] ++ [[V'_1]] ++ ... ++ [[K'_m]] ++ [[V'_m]]
|
= header(1,2,m) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]]
|
||||||
where [[K'_1 V'_1] ... [K'_m V'_m]]
|
|
||||||
= sort([[K_1 V_1] ... [K_m V_m]])
|
There is *no* ordering requirement on the `X_i` elements or
|
||||||
|
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
|
||||||
|
order.
|
||||||
|
|
||||||
|
[^no-sorting-rationale]: In the BitTorrent encoding format,
|
||||||
|
[bencoding](http://www.bittorrent.org/beps/bep_0003.html#bencoding),
|
||||||
|
dictionary key/value pairs must be sorted by key. This is a
|
||||||
|
necessary step for ensuring serialization of `Value`s is
|
||||||
|
canonical. We do not require that key/value pairs (or set
|
||||||
|
elements) be in sorted order for serialized `Value`s, because (a)
|
||||||
|
where canonicalization is used for cryptographic signatures, it is
|
||||||
|
more reliable to simply retain the exact binary form of the signed
|
||||||
|
document than to depend on canonical de- and re-serialization, and
|
||||||
|
(b) sorting keys or elements makes no sense in streaming
|
||||||
|
serialization formats.
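
To illustrate the absence of an ordering requirement, here is a small Python sketch. The helper names are hypothetical, elements are passed in as already-encoded bytes, and counts of 15 or more (which would need a `varint`) are left out.

```python
def header(t: int, n: int, m: int) -> bytes:
    """Single lead byte tt nn mmmm; this sketch only handles m < 15."""
    if not 0 <= m < 15:
        raise ValueError("m >= 15 needs a varint, which this sketch omits")
    return bytes([(t << 6) | (n << 4) | m])


def encode_set(encoded_elements: list) -> bytes:
    """header(1,1,m) followed by the elements in whatever order was given."""
    return header(1, 1, len(encoded_elements)) + b"".join(encoded_elements)


def encode_dict(encoded_pairs: list) -> bytes:
    """header(1,2,m) followed by each key and its value, in the given order."""
    body = b"".join(key + value for key, value in encoded_pairs)
    return header(1, 2, len(encoded_pairs)) + body


one, two = bytes([0x81, 0x01]), bytes([0x81, 0x02])  # SignedIntegers 1 and 2
a = encode_set([one, two])  # 52 81 01 81 02
b = encode_set([two, one])  # 52 81 02 81 01
```

Both `a` and `b` are acceptable encodings of `#set{1 2}` under the rule above, so byte-for-byte comparison of encodings is not a reliable test of `Value` equality.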
|
||||||
|
|
||||||
Note that `n=3` is unused and reserved.
|
Note that `n=3` is unused and reserved.
|
||||||
|
|
||||||
|
@ -451,6 +445,11 @@ Note that `n=3` is unused and reserved.
|
||||||
many whole bytes as needed to unambiguously
|
many whole bytes as needed to unambiguously
|
||||||
identify the value
|
identify the value
|
||||||
|
|
||||||
|
The value 0 needs zero bytes to identify the value, so `intbytes(0)`
|
||||||
|
is the empty byte string. Non-zero values need at least one byte; the
|
||||||
|
most-significant bit in the first byte in `intbytes(x)` for `x≠0` is
|
||||||
|
the sign bit.
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
|
||||||
[[ -257 ]] = [0x82, 0xFE, 0xFF]
|
[[ -257 ]] = [0x82, 0xFE, 0xFF]
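
The `intbytes` rule can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the lead byte layout `tt nn mmmm` is read off the summary figure, and lengths of 15 or more (which would need a `varint`) are deliberately left out.

```python
def intbytes(x: int) -> bytes:
    """Minimal big-endian two's-complement bytes for x; empty for zero."""
    if x == 0:
        return b""
    # Bits needed for the magnitude, plus one more for the sign bit.
    magnitude_bits = x.bit_length() if x > 0 else (~x).bit_length()
    nbytes = (magnitude_bits + 1 + 7) // 8
    return x.to_bytes(nbytes, "big", signed=True)


def encode_signed_integer(x: int) -> bytes:
    """Lead byte 10 00 mmmm followed by intbytes(x), short integers only."""
    body = intbytes(x)
    if len(body) >= 15:
        raise ValueError("length needs a varint, which this sketch omits")
    return bytes([(2 << 6) | (0 << 4) | len(body)]) + body
```

Under these assumptions, `encode_signed_integer(-257)` yields `82 FE FF`, and `encode_signed_integer(0)` yields the single byte `80`, since `intbytes(0)` is empty.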
|
||||||
|
@ -505,18 +504,6 @@ For example,
|
||||||
where binary32(F) and binary64(D) are big-endian 4- and 8-byte
|
where binary32(F) and binary64(D) are big-endian 4- and 8-byte
|
||||||
IEEE 754 binary representations
|
IEEE 754 binary representations
|
||||||
|
|
||||||
#### Special variable-length values
|
|
||||||
|
|
||||||
##### MIMEData
|
|
||||||
|
|
||||||
Each `MIMEData` value is comprised of a media type `Symbol` and a raw
|
|
||||||
binary body.
|
|
||||||
|
|
||||||
[[ M ]] when M ∈ MIMEData = header(3,1,m) ++ [[T]] ++ B
|
|
||||||
where m = |B|
|
|
||||||
and T is the Symbol media type of M
|
|
||||||
and B is the ByteString body of M
|
|
||||||
|
|
||||||
## Examples
|
## Examples
|
||||||
|
|
||||||
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
||||||
|
@ -554,16 +541,16 @@ encodes to
|
||||||
|
|
||||||
35 ;; Record, generic, 4+1
|
35 ;; Record, generic, 4+1
|
||||||
45 ;; Sequence, 5
|
45 ;; Sequence, 5
|
||||||
b6 74 69 74 6c 65 64 ;; Symbol, "titled"
|
B6 74 69 74 6C 65 64 ;; Symbol, "titled"
|
||||||
b6 70 65 72 73 6f 6e ;; Symbol, "person"
|
B6 70 65 72 73 6F 6E ;; Symbol, "person"
|
||||||
81 02 ;; SignedInteger, "2"
|
81 02 ;; SignedInteger, "2"
|
||||||
b5 74 68 69 6e 67 ;; Symbol, "thing"
|
B5 74 68 69 6E 67 ;; Symbol, "thing"
|
||||||
81 01 ;; SignedInteger, "1"
|
81 01 ;; SignedInteger, "1"
|
||||||
81 65 ;; SignedInteger, "101"
|
81 65 ;; SignedInteger, "101"
|
||||||
99 42 6c 61 63 6b 77 65 6c 6c ;; String, "Blackwell"
|
99 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
|
||||||
34 ;; Record, generic, 3+1
|
34 ;; Record, generic, 3+1
|
||||||
b4 64 61 74 65 ;; Symbol, "date"
|
B4 64 61 74 65 ;; Symbol, "date"
|
||||||
82 07 1d ;; SignedInteger, "1821"
|
82 07 1D ;; SignedInteger, "1821"
|
||||||
81 02 ;; SignedInteger, "2"
|
81 02 ;; SignedInteger, "2"
|
||||||
81 03 ;; SignedInteger, "3"
|
81 03 ;; SignedInteger, "3"
|
||||||
92 44 72 ;; String, "Dr"
|
92 44 72 ;; String, "Dr"
|
||||||
|
@ -605,6 +592,33 @@ treat them specially.
|
||||||
and one which enforces validity (i.e. side-conditions) when reading,
|
and one which enforces validity (i.e. side-conditions) when reading,
|
||||||
writing, or constructing `Value`s.
|
writing, or constructing `Value`s.
|
||||||
|
|
||||||
|
### MIME-type tagged binary data
|
||||||
|
|
||||||
|
Many internet protocols use
|
||||||
|
[media types](https://tools.ietf.org/html/rfc6838) (a.k.a. MIME types)
|
||||||
|
to indicate the format of some associated binary data. For this
|
||||||
|
purpose, we define `MIMEData` to be a record labelled `mime` with two
|
||||||
|
fields, the first being a `Symbol`, the media type, and the second
|
||||||
|
being a `ByteString`, the binary data.
|
||||||
|
|
||||||
|
While each media type may define its own rules for comparing
|
||||||
|
documents, we define ordering among `MIMEData` *representations* of
|
||||||
|
such media types lexicographically over the (`Symbol`, `ByteString`)
|
||||||
|
pair.
|
||||||
|
|
||||||
|
**Examples.**
|
||||||
|
|
||||||
|
| `(mime application/octet-stream #"abcde")` | 33 B4 6D 69 6D 65 BF 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D A5 61 62 63 64 65 |
|
||||||
|
| `(mime text/plain #"ABC")` | 33 B4 6D 69 6D 65 BA 74 65 78 74 2F 70 6C 61 69 6E A3 41 42 43 |
|
||||||
|
| `(mime application/xml #"<xhtml/>")` | 33 B4 6D 69 6D 65 BF 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C A8 3C 78 68 74 6D 6C 2F 3E |
|
||||||
|
| `(mime text/csv #"123,234,345")` | 33 B4 6D 69 6D 65 B8 74 65 78 74 2F 63 73 76 AB 31 32 33 2C 32 33 34 2C 33 34 35 |
|
||||||
|
|
||||||
|
Applications making heavy use of `mime` records may choose to use a
|
||||||
|
short form label number for the record type. For example, if short
|
||||||
|
form label number 1 were chosen, the second example above, `(mime
|
||||||
|
text/plain #"ABC")`, would be encoded with "12" in place of "33 B4 6D
|
||||||
|
69 6D 65".
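
As a rough check on the `mime` record encoding, the following Python sketch builds both the generic and the short-form variants. All function names are hypothetical. The `varint` format is not defined in this excerpt, so the sketch assumes a base-128 encoding with a continuation bit, which agrees with the single-byte lengths `18` and `0F` in the table. It also assumes `Symbol`s are serialized as UTF-8.

```python
def varint(n: int) -> bytes:
    """Base-128 varint, low-order group first (assumed format)."""
    out = bytearray()
    while True:
        group, n = n & 0x7F, n >> 7
        out.append(group | (0x80 if n else 0))
        if not n:
            return bytes(out)


def header(t: int, n: int, m: int) -> bytes:
    """Lead byte tt nn mmmm; mmmm = 1111 means varint(m) follows."""
    if m < 15:
        return bytes([(t << 6) | (n << 4) | m])
    return bytes([(t << 6) | (n << 4) | 15]) + varint(m)


def symbol(name: str) -> bytes:
    data = name.encode("utf-8")  # assumption: Symbols are UTF-8 on the wire
    return header(2, 3, len(data)) + data


def bytestring(data: bytes) -> bytes:
    return header(2, 2, len(data)) + data


def mime(media_type: str, body: bytes, short_label=None) -> bytes:
    fields = symbol(media_type) + bytestring(body)
    if short_label is None:
        # Generic record: the label counts towards the length (2 + 1).
        return header(0, 3, 3) + symbol("mime") + fields
    # Short form: label number in the nn bits, followed by two fields.
    return header(0, short_label, 2) + fields
```

Under these assumptions, `mime("application/octet-stream", b"abcde")` reproduces the first row of the table, and passing `short_label=1` replaces the leading `33 B4 6D 69 6D 65` with `12`.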
|
||||||
|
|
||||||
### Text
|
### Text
|
||||||
|
|
||||||
#### Normalization forms
|
#### Normalization forms
|
||||||
|
@ -681,7 +695,6 @@ should both be identities.
|
||||||
- `Symbol` ↔ `Symbol.for(...)`
|
- `Symbol` ↔ `Symbol.for(...)`
|
||||||
- `Boolean` ↔ `Boolean`
|
- `Boolean` ↔ `Boolean`
|
||||||
- `Float` and `Double` ↔ numbers,
|
- `Float` and `Double` ↔ numbers,
|
||||||
- `MIMEData` ↔ `{ "type": aString, "data": aUint8Array }`
|
|
||||||
- `Record` ↔ `{ "_label": theLabel, "_fields": [field0, ..., fieldN] }`, plus convenience accessors
|
- `Record` ↔ `{ "_label": theLabel, "_fields": [field0, ..., fieldN] }`, plus convenience accessors
|
||||||
- `(undefined)` ↔ the undefined value
|
- `(undefined)` ↔ the undefined value
|
||||||
- `(rfc3339 F)` ↔ `Date`, if `F` matches the `date-time` RFC 3339 production
|
- `(rfc3339 F)` ↔ `Date`, if `F` matches the `date-time` RFC 3339 production
|
||||||
|
@ -697,7 +710,6 @@ should both be identities.
|
||||||
- `Symbol` ↔ symbols
|
- `Symbol` ↔ symbols
|
||||||
- `Boolean` ↔ booleans
|
- `Boolean` ↔ booleans
|
||||||
- `Float` and `Double` ↔ inexact numbers (Racket: single- and double-precision floats)
|
- `Float` and `Double` ↔ inexact numbers (Racket: single- and double-precision floats)
|
||||||
- `MIMEData` ↔ a structure with a `type` and a `data` field (Racket: `(struct mime (type data))`)
|
|
||||||
- `Record` ↔ structures (Racket: prefab struct)
|
- `Record` ↔ structures (Racket: prefab struct)
|
||||||
- `Sequence` ↔ lists
|
- `Sequence` ↔ lists
|
||||||
- `Set` ↔ Racket: sets
|
- `Set` ↔ Racket: sets
|
||||||
|
@ -711,7 +723,6 @@ should both be identities.
|
||||||
- `Symbol` ↔ a simple data class wrapping a `String`
|
- `Symbol` ↔ a simple data class wrapping a `String`
|
||||||
- `Boolean` ↔ `Boolean`
|
- `Boolean` ↔ `Boolean`
|
||||||
- `Float` and `Double` ↔ `Float` and `Double`
|
- `Float` and `Double` ↔ `Float` and `Double`
|
||||||
- `MIMEData` ↔ an implementation of `javax.activation.DataSource`, maybe?
|
|
||||||
- `Record` ↔ in a simple implementation, a generic `Record` class; else perhaps a bean mapping?
|
- `Record` ↔ in a simple implementation, a generic `Record` class; else perhaps a bean mapping?
|
||||||
- `Sequence` ↔ an implementation of `java.util.List`
|
- `Sequence` ↔ an implementation of `java.util.List`
|
||||||
- `Set` ↔ an implementation of `java.util.Set`
|
- `Set` ↔ an implementation of `java.util.Set`
|
||||||
|
@ -728,7 +739,6 @@ should both be identities.
|
||||||
binary of the utf-8
|
binary of the utf-8
|
||||||
- `Boolean` ↔ `true` and `false`
|
- `Boolean` ↔ `true` and `false`
|
||||||
- `Float` and `Double` ↔ floats (unsure how Erlang deals with single-precision)
|
- `Float` and `Double` ↔ floats (unsure how Erlang deals with single-precision)
|
||||||
- `MIMEData` ↔ tuple of the type as a utf8 binary, and the data as a binary
|
|
||||||
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions
|
- `Record` ↔ a tuple with the label in the first position, and the fields in subsequent positions
|
||||||
- `Sequence` ↔ a list
|
- `Sequence` ↔ a list
|
||||||
- `Set` ↔ a `sets` set (is this unambiguous? Maybe a [map][erlang-map] from elements to `true`?)
|
- `Set` ↔ a `sets` set (is this unambiguous? Maybe a [map][erlang-map] from elements to `true`?)
|
||||||
|
@ -753,7 +763,7 @@ should both be identities.
|
||||||
C2 - Float
|
C2 - Float
|
||||||
C3 - Double
|
C3 - Double
|
||||||
(Cx) RESERVED C4-CF
|
(Cx) RESERVED C4-CF
|
||||||
Dx - MIMEData
|
(Dx) RESERVED
|
||||||
(Ex) RESERVED
|
(Ex) RESERVED
|
||||||
(Fx) RESERVED
|
(Fx) RESERVED
|
||||||
|
|
||||||
|
@ -960,20 +970,13 @@ Q. Are the language mappings reasonable? How about one for Python?
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
OK so. No built-in `MIMEData`, but maybe a conventional `(mime-data
|
|
||||||
Symbol Bytes)`? Applications can put it in a short slot if they like.
|
|
||||||
|
|
||||||
Streaming: needed for variable-sized structures. Tricky to design
|
Streaming: needed for variable-sized structures. Tricky to design
|
||||||
syntax for this that isn't gratuitously warty. End byte value.
|
syntax for this that isn't gratuitously warty. End byte value.
|
||||||
|
|
||||||
|
SIGH. Streaming for text/bytes too I SUPPOSE. Chunks, like CBOR
|
||||||
|
|
||||||
Literal small integers: could be nice? Not absolutely necessary.
|
Literal small integers: could be nice? Not absolutely necessary.
|
||||||
|
|
||||||
Give algorithm for computing size of integers.
|
|
||||||
|
|
||||||
Give up on sorting requirement for representation of sets and
|
|
||||||
dictionaries?? Probably a good idea if there are streaming forms of
|
|
||||||
them because that sounds impossible to do??
|
|
||||||
|
|
||||||
Maybe reorder: fixed-length atoms first, then variable-length atoms,
|
Maybe reorder: fixed-length atoms first, then variable-length atoms,
|
||||||
then fixed-length compounds, then variable-length compounds? Reason
|
then fixed-length compounds, then variable-length compounds? Reason
|
||||||
being that then maybe can put the streaming forms of the
|
being that then maybe can put the streaming forms of the
|
||||||
|
|