Tony Garnock-Jones 2018-09-23 22:35:00 +01:00
parent f2f57385ce
commit 00a69ae012
1 changed files with 265 additions and 151 deletions

@ -298,73 +298,122 @@ connections to other data languages can also be made.
For now, we limit our attention to an easily-parsed, easily-produced For now, we limit our attention to an easily-parsed, easily-produced
machine-readable syntax. machine-readable syntax.
Every `Value` is represented as one or more bytes describing first its A `Repr` is an encoding, or representation, of a specific `Value`.
kind and its length, and then its specific contents. Each `Repr` comprises one or more bytes describing first the kind of
represented `Value` and the length of the representation, and then the
encoded details of the `Value` itself.
For a value `v`, we write `[[v]]` for the encoding of `v`. For a value `v`, we write `[[v]]` for the `Repr` of `v`.
The following figure summarises the definitions below: The following figure summarises the definitions below:
tt nn mmmm varint(m) contents tt nn mmmm varint(m) contents
------------------------------- -------------------------------
00 00 mmmm ... application-specific Record 00 00 0000 False
00 01 mmmm ... application-specific Record 00 00 0001 True
00 10 mmmm ... application-specific Record 00 00 0010 Float, 32 bits big-endian binary
00 11 mmmm ... Record 00 00 0011 Double, 64 bits big-endian binary
00 00 x1xx RESERVED
00 00 1xxx RESERVED
00 01 xxxx RESERVED
00 10 ttnn Start Stream <tt,nn>
When tt = 00 --> error
01 --> each chunk is a <tt,nn> piece
1x --> each chunk is a single encoded Value
00 11 ttnn End Stream <tt,nn> (must match preceding Start Stream)
01 00 mmmm ... Sequence 01 00 mmmm ... SignedInteger, big-endian binary
01 01 mmmm ... Set 01 01 mmmm ... String, UTF-8 binary
01 10 mmmm ... Dictionary 01 10 mmmm ... Bytes
01 11 mmmm ... Symbol, UTF-8 binary
10 00 mmmm ... SignedInteger, big-endian binary 10 00 mmmm ... application-specific Record
10 01 mmmm ... String, UTF-8 binary 10 01 mmmm ... application-specific Record
10 10 mmmm ... Bytes 10 10 mmmm ... application-specific Record
10 11 mmmm ... Symbol, UTF-8 binary 10 11 mmmm ... Record
11 00 0000 False 11 00 mmmm ... Sequence
11 00 0001 True 11 01 mmmm ... Set
11 00 0010 Float, 32 bits big-endian binary 11 10 mmmm ... Dictionary
11 00 0011 Double, 64 bits big-endian binary 11 11 xxxx RESERVED
If mmmm = 1111, varint(m) is present; otherwise, m is the length If mmmm = 1111, varint(m) is present; otherwise, m is the length
#### Type and Length representation #### Type and Length representation
A `Value`'s type and length is represented by use of a function Each `Repr` takes one of three possible forms:
`header(t,n,m)` that yields a sequence of bytes when `t`, `n` and `m`
are appropriate non-negative integers.
header(t,n,m) = leadbyte(t,n,m) when m < 15 - (A) a fixed-length form, used for simple values such as `Boolean`s
or leadbyte(t,n,15) ++ varint(m) otherwise or `Float`s.
The lead byte in a `Value`'s representation is constructed by a function - (B) a variable-length form with length specified up-front, used for
almost all `Record`s as well as for most `Sequence`s and `String`s,
when their sizes are known at the time serialization begins.
- (C) a variable-length streaming form with unknown or unpredictable
length, rarely used for `Record`s, since the number of fields
in a `Record` is usually statically known, but sometimes used for
`Sequence`s, `String`s etc., for example when serialization
begins before the number of elements or bytes in the corresponding
`Value` is known.
Applications may choose between formats (B) and (C) depending on their
needs at serialization time.
Every `Repr`, however, starts with a *lead byte* describing the
remainder of the representation.
##### The lead byte
The lead byte is constructed by a function `leadbyte`:
leadbyte(t,n,m) = [t*64 + n*16 + m] leadbyte(t,n,m) = [t*64 + n*16 + m]
Both `t` and `n` are two-bit unsigned numbers; `m` is a four-bit
unsigned number.
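A minimal Python sketch of the `leadbyte` formula follows. The range check and the inverse helper `split_leadbyte` are assumptions added for illustration; they are not part of the specification.

```python
def leadbyte(t, n, m):
    """Pack t (2 bits), n (2 bits) and m (4 bits) into one lead byte."""
    if not (0 <= t <= 3 and 0 <= n <= 3 and 0 <= m <= 15):
        raise ValueError("t and n must fit in 2 bits, and m in 4 bits")
    return bytes([t * 64 + n * 16 + m])


def split_leadbyte(byte):
    """Hypothetical inverse: recover (t, n, m) from a lead byte value."""
    return byte >> 6, (byte >> 4) & 0b11, byte & 0b1111


print(leadbyte(2, 1, 3).hex())   # 93
print(split_leadbyte(0x93))      # (2, 1, 3)
```

Multiplying by 64 and 16 is the same as shifting left by 6 and 4 bits, which is why the inverse can be written with shifts and masks.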
The lead byte describes the rest of the representation as The lead byte describes the rest of the representation as
follows:[^some-encodings-unused] follows:[^some-encodings-unused]
leadbyte(0,-,-) represents a Record
leadbyte(1,-,-) represents a Sequence, Set or Dictionary
leadbyte(2,-,-) represents an Atom with variable-length binary representation
leadbyte(3,0,-) represents an Atom with fixed-length binary representation
[^some-encodings-unused]: Some encodings are unused. All such [^some-encodings-unused]: Some encodings are unused. All such
encodings are reserved for future versions of this specification. encodings are reserved for future versions of this specification.
Variable-length representations use the value of `m` to encode their - `leadbyte(0,0,-)` (format A) represents an Atom with fixed-length binary representation.
lengths: - `leadbyte(0,1,-)` (format A) is RESERVED.
- `leadbyte(0,2,-)` (format C) is a Stream Start byte.
- `leadbyte(0,3,-)` (format C) is a Stream End byte.
- `leadbyte(1,-,-)` (format B) represents an Atom with variable-length binary representation.
- `leadbyte(2,-,-)` (format B) represents a Record.
- `leadbyte(3,-,-)` (format B) represents a Sequence, Set or Dictionary.
- Lengths between 0 and 14 are represented using `leadbyte` with `m` ##### Encoding data of fixed length (format A)
values 0 through 14.
- Lengths of 15 or greater are represented by `m` value 15, and
additional "length bytes" describing the length then follow the
lead byte.
These additional length bytes are formatted as Each specific type of data defines its own rules for this format.
[base 128 varints][varint]. Quoting the
[Google Protocol Buffers][varint] definition, ##### Encoding data of known length (format B)
A `Repr` where the length of the `Value` to be encoded is variable but
known uses the value of `m` in `leadbyte` to encode its length. The
length counts *bytes* for atomic `Value`s, but counts *contained
values* for compound `Value`s.
- A length `l` between 0 and 14 is represented using `leadbyte` with
`m=l`.
- A length of 15 or greater is represented by `m=15` and additional
bytes describing the length following the lead byte.
The function `header(t,n,m)` yields an appropriate sequence of bytes
describing a `Repr`'s type and length when `t`, `n` and `m` are
appropriate non-negative integers:
header(t,n,m) = leadbyte(t,n,m) when m < 15
or leadbyte(t,n,15) ++ varint(m) otherwise
The additional length bytes are formatted as
[base 128 varints][varint]. We write `varint(m)` for the
varint-encoding of `m`. Quoting the [Google Protocol Buffers][varint]
definition,
> Each byte in a varint, except the last byte, has the most > Each byte in a varint, except the last byte, has the most
> significant bit (msb) set – this indicates that there are further > significant bit (msb) set – this indicates that there are further
@ -378,43 +427,93 @@ These additional length bytes are formatted as
- 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2. - 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2.
- 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3. - 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3.
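The varint rule and the `header` function can be sketched in Python as follows. This is an illustrative reading of the definitions, not a reference implementation; the function names mirror the specification, and the loop structure is one choice among several.

```python
def varint(m):
    """Base 128 varint: 7 bits per byte, least significant group first."""
    if m < 0:
        raise ValueError("varint is defined for non-negative integers only")
    out = bytearray()
    while True:
        low7 = m & 0x7F
        m >>= 7
        if m:
            out.append(low7 | 0x80)  # more bytes follow, so set the msb
        else:
            out.append(low7)         # last byte, msb clear
            return bytes(out)


def header(t, n, m):
    """Lead byte, followed by varint(m) when m does not fit in 4 bits."""
    if m < 15:
        return bytes([t * 64 + n * 16 + m])
    return bytes([t * 64 + n * 16 + 15]) + varint(m)


print(list(varint(300)))           # [172, 2]
print(list(varint(1000000000)))    # [128, 148, 235, 220, 3]
print(header(1, 3, 24).hex())      # 7f18
```

Note that a length of exactly 15 already needs the long form, because `m=15` in the lead byte is the escape value.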
We write `varint(m)` for the varint-encoding of `m`. ##### Streaming data of unknown length (format C)
A `Repr` where the length of the `Value` to be encoded is variable and
not known at the time serialization of the `Value` starts is encoded
by a single Stream Start byte, followed by zero or more *chunks*,
followed by a matching Stream End byte:
startbyte(t,n) = leadbyte(0,2, t*4 + n)
endbyte(t,n) = leadbyte(0,3, t*4 + n)
For a `Repr` of a `Value` containing binary data, each chunk is to be
a format B `Repr` of the same type as the overall `Repr`.
For a `Repr` of a `Value` containing other `Value`s, each chunk is to
be a single `Repr`.
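The stream framing can be sketched in Python like this. The helper `stream` and the rejection of `t = 0` are assumptions of the sketch; the latter follows the rule that fixed-length atoms must not be streamed.

```python
def leadbyte(t, n, m):
    return bytes([t * 64 + n * 16 + m])


def startbyte(t, n):
    if t == 0:
        raise ValueError("format A values (t = 0) cannot be streamed")
    return leadbyte(0, 2, t * 4 + n)


def endbyte(t, n):
    if t == 0:
        raise ValueError("format A values (t = 0) cannot be streamed")
    return leadbyte(0, 3, t * 4 + n)


def stream(t, n, chunks):
    """Hypothetical helper: frame already-encoded chunks as a format C Repr."""
    return startbyte(t, n) + b"".join(chunks) + endbyte(t, n)


# The sequence [1 2 3 4], streamed; each chunk is one encoded SignedInteger.
ints = [bytes.fromhex(h) for h in ("4101", "4102", "4103", "4104")]
print(stream(3, 0, ints).hex())  # 2c41014102410341043c
```

Because `t*4 + n` reuses the low four bits of the lead byte, the Stream Start and Stream End bytes carry the full type of the streamed `Value`, which lets a decoder check that the end byte matches the start byte.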
#### Records #### Records
[[ (L F_1 ... F_m) ]] = header(0,3,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]] Format B (known length):
[[ (L F_1 ... F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]]
For `m` fields, `m+1` is supplied to `header`, to account for the For `m` fields, `m+1` is supplied to `header`, to account for the
encoding of the record label. encoding of the record label.
Format C (streaming):
[[ (L F_1 ... F_m) ]]
= startbyte(2,3) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]] ++ endbyte(2,3)
Applications *SHOULD* prefer the known-length format for encoding
`Record`s.
##### Application-specific short form for labels ##### Application-specific short form for labels
Any given protocol using Preserves may additionally define an Any given protocol using Preserves may additionally define an
interpretation for `n ∈ {0,1,2}`, mapping each *short form label interpretation for `n ∈ {0,1,2}`, mapping each *short form label
number* `n` to a specific record label. When encoding `m` fields with number* `n` to a specific record label. When encoding `m` fields with
short form label number `n`, the header is `header(0,n,m)` (rather short form label number `n`, format B becomes
than `m+1`) since the label is implicit.
header(2,n,m) ++ [[F_1]] ++ ... ++ [[F_m]]
and format C becomes
startbyte(2,n) ++ [[F_1]] ++ ... ++ [[F_m]] ++ endbyte(2,n)
**Examples.** For example, a protocol may choose to map records **Examples.** For example, a protocol may choose to map records
labelled `void` to `n=0`, making labelled `void` to `n=0`, making
[[(void)]] = header(0,0,0) = [0x00] [[(void)]] = header(2,0,0) = [0x80]
or it may map records labelled `person` to short form label number 1, or it may map records labelled `person` to short form label number 1,
making making
[[(person "Dr" "Elizabeth" "Blackwell")]] [[(person "Dr" "Elizabeth" "Blackwell")]]
= header(0,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]` = header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
= [0x13] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]` = [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
for format B, or
= startbyte(2,1) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ endbyte(2,1)
= [0x29] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ [0x39]
for format C.
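As a rough Python sketch of format B `Record` encoding, with and without short form labels: the helper names `record`, `short_record` and `symbol` are illustrative, and the label numbering (0 for `discard`, 1 for `capture`, 2 for `observe`) is the example mapping used in this document, not a fixed part of the format.

```python
def varint(m):
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def header(t, n, m):
    if m < 15:
        return bytes([t * 64 + n * 16 + m])
    return bytes([t * 64 + n * 16 + 15]) + varint(m)


def symbol(name):
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data


def record(label_repr, field_reprs):
    """Generic Record: the length counts the label plus the fields."""
    return header(2, 3, len(field_reprs) + 1) + label_repr + b"".join(field_reprs)


def short_record(n, field_reprs):
    """Short form Record: the label is implied by n, so only fields are counted."""
    if n not in (0, 1, 2):
        raise ValueError("short form label numbers are 0, 1 and 2")
    return header(2, n, len(field_reprs)) + b"".join(field_reprs)


DISCARD, CAPTURE, OBSERVE = 0, 1, 2  # example protocol-specific mapping
discard = short_record(DISCARD, [])
capture = short_record(CAPTURE, [discard])
speak = record(symbol("speak"), [discard, capture])
print(capture.hex())                         # 9180
print(short_record(OBSERVE, [speak]).hex())  # a1b3757370 65616b809180 without the space
```

The second printed value is the 11-byte sequence `A1 B3 75 73 70 65 61 6B 80 91 80`; Python prints it as one unbroken hexadecimal string.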
#### Sequences, Sets and Dictionaries #### Sequences, Sets and Dictionaries
[[ [X_1 ... X_m] ]] = header(1,0,m) ++ [[X_1]] ++ ... ++ [[X_m]] Format B (known length):
[[ #set{X_1 ... X_m} ]] = header(1,1,m) ++ [[X_1]] ++ ... ++ [[X_m]] [[ [X_1 ... X_m] ]] = header(3,0,m) ++ [[X_1]] ++ ... ++ [[X_m]]
[[ #set{X_1 ... X_m} ]] = header(3,1,m) ++ [[X_1]] ++ ... ++ [[X_m]]
[[ #dict{K_1:V_1 ... K_m:V_m} ]] [[ #dict{K_1:V_1 ... K_m:V_m} ]]
= header(1,2,m) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]] = header(3,2,m) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]]
Format C (streaming):
[[ [X_1 ... X_m] ]] = startbyte(3,0) ++ [[X_1]] ++ ... ++ [[X_m]] ++ endbyte(3,0)
[[ #set{X_1 ... X_m} ]] = startbyte(3,1) ++ [[X_1]] ++ ... ++ [[X_m]] ++ endbyte(3,1)
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
= startbyte(3,2) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]] ++ endbyte(3,2)
Applications may use whichever format suits their needs on a
case-by-case basis.
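The compound encodings differ only in the `n` value and in what is counted. A hedged Python sketch of the format B forms, taking already-encoded elements as input (the function names `sequence`, `set_` and `dictionary` are illustrative):

```python
def varint(m):
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def header(t, n, m):
    if m < 15:
        return bytes([t * 64 + n * 16 + m])
    return bytes([t * 64 + n * 16 + 15]) + varint(m)


def sequence(item_reprs):
    return header(3, 0, len(item_reprs)) + b"".join(item_reprs)


def set_(item_reprs):
    return header(3, 1, len(item_reprs)) + b"".join(item_reprs)


def dictionary(pair_reprs):
    """pair_reprs is a list of (key_repr, value_repr); m counts pairs, per header(3,2,m)."""
    body = b"".join(k + v for k, v in pair_reprs)
    return header(3, 2, len(pair_reprs)) + body


one, two, three, four = (bytes.fromhex(h) for h in ("4101", "4102", "4103", "4104"))
print(sequence([one, two, three, four]).hex())  # c44101410241034104
print(dictionary([(one, two)]).hex())           # e141014102
```

Since no ordering is required, an encoder along these lines can emit elements in whatever order its source collection yields them.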
There is *no* ordering requirement on the `X_i` elements or There is *no* ordering requirement on the `X_i` elements or
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any `K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
@ -432,19 +531,23 @@ order.
(b) sorting keys or elements makes no sense in streaming (b) sorting keys or elements makes no sense in streaming
serialization formats. serialization formats.
Note that `n=3` is unused and reserved. Note that `header(3,3,m)` and `startbyte(3,3)`/`endbyte(3,3)` are unused and reserved.
#### Variable-length Atoms #### Variable-length Atoms
##### SignedInteger ##### SignedInteger
[[ x ]] when x ∈ SignedInteger = header(2,0,m) ++ intbytes(x) Format B (known length):
[[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)
where m = |intbytes(x)| where m = |intbytes(x)|
and intbytes(x) = a big-endian two's-complement representation and intbytes(x) = a big-endian two's-complement representation
of the signed integer x, taking exactly as of the signed integer x, taking exactly as
many whole bytes as needed to unambiguously many whole bytes as needed to unambiguously
identify the value identify the value
Format C *MUST NOT* be used for `SignedInteger`s.
The value 0 needs zero bytes to identify the value, so `intbytes(0)` The value 0 needs zero bytes to identify the value, so `intbytes(0)`
is the empty byte string. Non-zero values need at least one byte; the is the empty byte string. Non-zero values need at least one byte; the
most-significant bit in the first byte in `intbytes(x)` for `x≠0` is most-significant bit in the first byte in `intbytes(x)` for `x≠0` is
@ -452,55 +555,78 @@ the sign bit.
For example, For example,
[[ -257 ]] = [0x82, 0xFE, 0xFF] [[ -257 ]] = [0x42, 0xFE, 0xFF]
[[ -256 ]] = [0x82, 0xFF, 0x00] [[ -256 ]] = [0x42, 0xFF, 0x00]
[[ -255 ]] = [0x82, 0xFF, 0x01] [[ -255 ]] = [0x42, 0xFF, 0x01]
[[ -254 ]] = [0x82, 0xFF, 0x02] [[ -254 ]] = [0x42, 0xFF, 0x02]
[[ -129 ]] = [0x82, 0xFF, 0x7F] [[ -129 ]] = [0x42, 0xFF, 0x7F]
[[ -128 ]] = [0x81, 0x80] [[ -128 ]] = [0x41, 0x80]
[[ -127 ]] = [0x81, 0x81] [[ -127 ]] = [0x41, 0x81]
[[ -2 ]] = [0x81, 0xFE] [[ -2 ]] = [0x41, 0xFE]
[[ -1 ]] = [0x81, 0xFF] [[ -1 ]] = [0x41, 0xFF]
[[ 0 ]] = [0x80] [[ 0 ]] = [0x40]
[[ 1 ]] = [0x81, 0x01] [[ 1 ]] = [0x41, 0x01]
[[ 127 ]] = [0x81, 0x7F] [[ 127 ]] = [0x41, 0x7F]
[[ 128 ]] = [0x82, 0x00, 0x80] [[ 128 ]] = [0x42, 0x00, 0x80]
[[ 255 ]] = [0x82, 0x00, 0xFF] [[ 255 ]] = [0x42, 0x00, 0xFF]
[[ 256 ]] = [0x82, 0x01, 0x00] [[ 256 ]] = [0x42, 0x01, 0x00]
[[ 32767 ]] = [0x82, 0x7F, 0xFF] [[ 32767 ]] = [0x42, 0x7F, 0xFF]
[[ 32768 ]] = [0x83, 0x00, 0x80, 0x00] [[ 32768 ]] = [0x43, 0x00, 0x80, 0x00]
[[ 65535 ]] = [0x83, 0x00, 0xFF, 0xFF] [[ 65535 ]] = [0x43, 0x00, 0xFF, 0xFF]
[[ 65536 ]] = [0x83, 0x01, 0x00, 0x00] [[ 65536 ]] = [0x43, 0x01, 0x00, 0x00]
[[ 131072 ]] = [0x83, 0x02, 0x00, 0x00] [[ 131072 ]] = [0x43, 0x02, 0x00, 0x00]
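One way to compute `intbytes` in Python is sketched below, using the new `header(1,0,m)` numbering. The bit-counting approach is an assumption of this sketch; any method that yields the shortest two's-complement form would do.

```python
def varint(m):
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def header(t, n, m):
    if m < 15:
        return bytes([t * 64 + n * 16 + m])
    return bytes([t * 64 + n * 16 + 15]) + varint(m)


def intbytes(x):
    """Shortest big-endian two's-complement form; zero is the empty string."""
    if x == 0:
        return b""
    magnitude = x if x > 0 else ~x
    nbits = magnitude.bit_length() + 1  # one extra bit for the sign
    return x.to_bytes((nbits + 7) // 8, "big", signed=True)


def encode_signed_integer(x):
    body = intbytes(x)
    return header(1, 0, len(body)) + body


for x in (-257, -128, -1, 0, 127, 128, 32768):
    print(x, encode_signed_integer(x).hex())
```

Using `~x` for negative values makes -128 fit in one byte while -129 needs two, matching the examples above.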
##### String ##### String
[[ S ]] when S ∈ String = header(2,1,m) ++ utf8(S) Format B (known length):
[[ S ]] when S ∈ String = header(1,1,m) ++ utf8(S)
where m = |utf8(S)| where m = |utf8(S)|
and utf8(S) = the UTF-8 encoding of S and utf8(S) = the UTF-8 encoding of S
To stream a `String`, emit `startbyte(1,1)` and then a sequence of
zero or more format B `String` chunks, followed by `endbyte(1,1)`.
While the overall content of a streamed `String` must be valid UTF-8,
individual chunks do not have to conform to UTF-8.
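A short Python sketch of both `String` forms follows. The helper names are illustrative, and the chunk boundaries are chosen by the caller; the last example deliberately splits a two-byte UTF-8 character across chunks to show that only the concatenation needs to be valid UTF-8.

```python
def varint(m):
    out = bytearray()
    while m >= 0x80:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def header(t, n, m):
    if m < 15:
        return bytes([t * 64 + n * 16 + m])
    return bytes([t * 64 + n * 16 + 15]) + varint(m)


def string(s):
    """Format B: the length counts UTF-8 bytes, not characters."""
    data = s.encode("utf-8")
    return header(1, 1, len(data)) + data


def stream_string(byte_chunks):
    """Format C: startbyte(1,1), format B String chunks, then endbyte(1,1)."""
    start = bytes([0 * 64 + 2 * 16 + (1 * 4 + 1)])  # 0x25
    end = bytes([0 * 64 + 3 * 16 + (1 * 4 + 1)])    # 0x35
    body = b"".join(header(1, 1, len(c)) + c for c in byte_chunks)
    return start + body + end


print(string("hello").hex())                     # 5568656c6c6f
print(stream_string([b"he", b"llo"]).hex())      # 255268 65536c6c6f35 without the space
print(stream_string([b"\xc3", b"\xa9"]).hex())   # 2551c351a935
```

The second printed value is the 9-byte sequence `25 52 68 65 53 6C 6C 6F 35`; Python prints it as one unbroken hexadecimal string.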
##### ByteString ##### ByteString
[[ B ]] when B ∈ ByteString = header(2,2,m) ++ B Format B (known length):
[[ B ]] when B ∈ ByteString = header(1,2,m) ++ B
where m = |B| where m = |B|
To stream a `ByteString`, emit `startbyte(1,2)` and then a sequence of
zero or more format B `ByteString` chunks, followed by `endbyte(1,2)`.
##### Symbol ##### Symbol
[[ S ]] when S ∈ Symbol = header(2,2,m) ++ utf8(S) Format B (known length):
[[ S ]] when S ∈ Symbol = header(1,3,m) ++ utf8(S)
where m = |utf8(S)| where m = |utf8(S)|
and utf8(S) = the UTF-8 encoding of S and utf8(S) = the UTF-8 encoding of S
To stream a `Symbol`, emit `startbyte(1,3)` and then a sequence of
zero or more format B `Symbol` chunks, followed by `endbyte(1,3)`.
#### Fixed-length Atoms #### Fixed-length Atoms
Fixed-length atoms all use format A, and do not have a length
representation. They repurpose the bits that format B `Repr`s use to
specify lengths. Applications *MUST NOT* use format C with
`startbyte(0,n)` or `endbyte(0,n)` for any `n`.
##### Booleans ##### Booleans
[[ #f ]] = header(3,0,0) = [0xC0] [[ #f ]] = header(0,0,0) = [0x00]
[[ #t ]] = header(3,0,1) = [0xC1] [[ #t ]] = header(0,0,1) = [0x01]
##### Floats and Doubles ##### Floats and Doubles
[[ F ]] when F ∈ Float = header(3,0,2) ++ binary32(F) [[ F ]] when F ∈ Float = header(0,0,2) ++ binary32(F)
[[ D ]] when D ∈ Double = header(3,0,3) ++ binary64(D) [[ D ]] when D ∈ Double = header(0,0,3) ++ binary64(D)
where binary32(F) and binary64(D) are big-endian 4- and 8-byte where binary32(F) and binary64(D) are big-endian 4- and 8-byte
IEEE 754 binary representations IEEE 754 binary representations
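The fixed-length atoms map directly onto Python's `struct` module. A minimal sketch, using the new lead byte values `0x00` through `0x03` (the function names are illustrative):

```python
import struct


def encode_boolean(b):
    return bytes([0x01 if b else 0x00])


def encode_float(f):
    """Lead byte 0x02, then a big-endian IEEE 754 binary32."""
    return bytes([0x02]) + struct.pack(">f", f)


def encode_double(d):
    """Lead byte 0x03, then a big-endian IEEE 754 binary64."""
    return bytes([0x03]) + struct.pack(">d", d)


print(encode_boolean(False).hex())  # 00
print(encode_float(1.0).hex())      # 023f800000
print(encode_double(1.0).hex())     # 033ff0000000000000
```

No length is encoded anywhere: a decoder knows from the lead byte alone that 0, 4 or 8 bytes follow.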
@ -515,21 +641,25 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
| Value | Encoded hexadecimal byte sequence | | Value | Encoded hexadecimal byte sequence |
|--------------------------------------------------------------------|----------------------------------------------------| |--------------------------------------------------------------------|----------------------------------------------------|
| `(capture (discard))` | 11 00 | | `(capture (discard))` | 91 80 |
| `(observe (speak (discard) (capture (discard))))` | 21 33 B5 73 70 65 61 6B 00 11 00 | | `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
| `[1 2 3 4]` | 44 81 01 81 02 81 03 81 04 | | `[1 2 3 4]` (format B) | C4 41 01 41 02 41 03 41 04 |
| `[-2 -1 0 1]` | 54 81 FE 81 FF 80 81 01 | | `[1 2 3 4]` (format C) | 2C 41 01 41 02 41 03 41 04 3C |
| `["hello" there #"world" [] #set{} #t #f]` | 47 95 68 65 6C 6C 6F A5 74 68 65 72 65 40 50 C1 C0 | | `[-2 -1 0 1]` | C4 41 FE 41 FF 40 41 01 |
| `-257` | 82 FE FF | | `"hello"` (format B) | 55 68 65 6C 6C 6F |
| `-1` | 81 FF | | `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
| `0` | 80 | | `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
| `1` | 81 01 | | `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 C0 D0 01 00 |
| `255` | 82 00 FF | | `-257` | 42 FE FF |
| `1f` | C2 3F 80 00 00 | | `-1` | 41 FF |
| `1d` | C3 3F F0 00 00 00 00 00 00 | | `0` | 40 |
| `-1.202e300d` | C3 FE 3C B7 B7 59 BF 04 26 | | `1` | 41 01 |
| `255` | 42 00 FF |
| `1f` | 02 3F 80 00 00 |
| `1d` | 03 3F F0 00 00 00 00 00 00 |
| `-1.202e300d` | 03 FE 3C B7 B7 59 BF 04 26 |
Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Value` Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record`
([titled person 2 thing 1] ([titled person 2 thing 1]
101 101
@ -539,21 +669,21 @@ Finally, a larger example, using a non-`Symbol` label for a record.[^extensibili
encodes to encodes to
35 ;; Record, generic, 4+1 B5 ;; Record, generic, 4+1
45 ;; Sequence, 5 C5 ;; Sequence, 5
B6 74 69 74 6C 65 64 ;; Symbol, "titled" 76 74 69 74 6C 65 64 ;; Symbol, "titled"
B6 70 65 72 73 6F 6E ;; Symbol, "person" 76 70 65 72 73 6F 6E ;; Symbol, "person"
81 02 ;; SignedInteger, "2" 41 02 ;; SignedInteger, "2"
B5 74 68 69 6E 67 ;; Symbol, "thing" 75 74 68 69 6E 67 ;; Symbol, "thing"
81 01 ;; SignedInteger, "1" 41 01 ;; SignedInteger, "1"
81 65 ;; SignedInteger, "101" 41 65 ;; SignedInteger, "101"
99 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell" 59 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
34 ;; Record, generic, 3+1 B4 ;; Record, generic, 3+1
B4 64 61 74 65 ;; Symbol, "date" 74 64 61 74 65 ;; Symbol, "date"
82 07 1D ;; SignedInteger, "1821" 42 07 1D ;; SignedInteger, "1821"
81 02 ;; SignedInteger, "2" 41 02 ;; SignedInteger, "2"
81 03 ;; SignedInteger, "3" 41 03 ;; SignedInteger, "3"
92 44 72 ;; String, "Dr" 52 44 72 ;; String, "Dr"
[^extensibility2]: It happens to line up with Racket's [^extensibility2]: It happens to line up with Racket's
representation of a record label for an inheritance hierarchy representation of a record label for an inheritance hierarchy
@ -608,15 +738,15 @@ pair.
**Examples.** **Examples.**
| `(mime application/octet-stream #"abcde")` | 33 B4 6D 69 6D 65 BF 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D A5 61 62 63 64 65 | | `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
| `(mime text/plain "ABC")` | 33 B4 6D 69 6D 65 BA 74 65 78 74 2F 70 6C 61 69 6E 93 41 42 43 | | `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
| `(mime application/xml "<xhtml/>")` | 33 B4 6D 69 6D 65 BF 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 98 3C 78 68 74 6D 6C 2F 3E | | `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
| `(mime text/csv "123,234,345")` | 33 B4 6D 69 6D 65 B8 74 65 78 74 2F 63 73 76 9B 31 32 33 2C 32 33 34 2C 33 34 35 | | `(mime text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
Applications making heavy use of `mime` records may choose to use a Applications making heavy use of `mime` records may choose to use a
short form label number for the record type. For example, if short short form label number for the record type. For example, if short
form label number 1 were chosen, the second example above, `(mime form label number 1 were chosen, the second example above, `(mime
text/plain "ABC")`, would be encoded with "12" in place of "33 B4 6D text/plain #"ABC")`, would be encoded with "92" in place of "B3 74 6D
69 6D 65". 69 6D 65".
### Text ### Text
@ -746,26 +876,29 @@ should both be identities.
## Appendix. Table of lead byte values ## Appendix. Table of lead byte values
0x - short form Record label index 0 00 - False
1x - short form Record label index 1 01 - True
2x - short form Record label index 2 02 - Float
3x - Record 03 - Double
4x - Sequence (0x) RESERVED 04-0F
5x - Set (1x) RESERVED 10-1F
6x - Dictionary 2x - Start Stream
(7x) RESERVED 3x - End Stream
8x - SignedInteger
9x - String 4x - SignedInteger
Ax - Bytes 5x - String
Bx - Symbol 6x - Bytes
C0 - False 7x - Symbol
C1 - True
C2 - Float 8x - short form Record label index 0
C3 - Double 9x - short form Record label index 1
(Cx) RESERVED C4-CF Ax - short form Record label index 2
(Dx) RESERVED Bx - Record
(Ex) RESERVED
(Fx) RESERVED Cx - Sequence
Dx - Set
Ex - Dictionary
(Fx) RESERVED F0-FF
## Appendix. Why not Just Use JSON? ## Appendix. Why not Just Use JSON?
@ -942,15 +1075,6 @@ Q. Should I map to SPKI SEXP or is that nonsense / for later?[^why-not-spki-sexp
other kind of structure, and the "hint" itself can only be a other kind of structure, and the "hint" itself can only be a
binary blob. binary blob.
Q. Should `MIMEData` be a special syntax for `Record`s with a single
`ByteString` field?
A. Not even. It should probably just be moved to the "conventions"
section. Compare:
D5 BA text/plain hello -- using special MIMEData encoding
32 BA text/plain A5 hello -- using bog standard type-labelled Record
Q. Should `Symbol` be a special syntax for a `Record` with a `Symbol` Q. Should `Symbol` be a special syntax for a `Record` with a `Symbol`
label (recursive!?) and a single `String` field? label (recursive!?) and a single `String` field?
@ -970,16 +1094,6 @@ Q. Are the language mappings reasonable? How about one for Python?
--- ---
Streaming: needed for variable-sized structures. Tricky to design
syntax for this that isn't gratuitously warty. End byte value.
SIGH. Streaming for text/bytes too I SUPPOSE. Chunks, like CBOR
Literal small integers: could be nice? Not absolutely necessary. Literal small integers: could be nice? Not absolutely necessary.
Maybe reorder: fixed-length atoms first, then variable-length atoms,
then fixed-length compounds, then variable-length compounds? Reason
being that then maybe can put the streaming forms of the
variable-length ones very last.
--- ---