Progress
parent
f2f57385ce
commit
00a69ae012
|
@ -298,73 +298,122 @@ connections to other data languages can also be made.
|
||||||
For now, we limit our attention to an easily-parsed, easily-produced
|
For now, we limit our attention to an easily-parsed, easily-produced
|
||||||
machine-readable syntax.
|
machine-readable syntax.
|
||||||
|
|
||||||
Every `Value` is represented as one or more bytes describing first its
|
A `Repr` is an encoding, or representation, of a specific `Value`.
|
||||||
kind and its length, and then its specific contents.
|
Each `Repr` comprises one or more bytes describing first the kind of
|
||||||
|
represented `Value` and the length of the representation, and then the
|
||||||
|
encoded details of the `Value` itself.
|
||||||
|
|
||||||
For a value `v`, we write `[[v]]` for the encoding of v.
|
For a value `v`, we write `[[v]]` for the `Repr` of `v`.
|
||||||
|
|
||||||
The following figure summarises the definitions below:
|
The following figure summarises the definitions below:
|
||||||
|
|
||||||
tt nn mmmm varint(m) contents
|
tt nn mmmm varint(m) contents
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
00 00 mmmm ... application-specific Record
|
00 00 0000 False
|
||||||
00 01 mmmm ... application-specific Record
|
00 00 0001 True
|
||||||
00 10 mmmm ... application-specific Record
|
00 00 0010 Float, 32 bits big-endian binary
|
||||||
00 11 mmmm ... Record
|
00 00 0011 Double, 64 bits big-endian binary
|
||||||
|
00 00 x1xx RESERVED
|
||||||
|
00 00 1xxx RESERVED
|
||||||
|
00 01 xxxx RESERVED
|
||||||
|
00 10 ttnn Start Stream <tt,nn>
|
||||||
|
When tt = 00 --> error
|
||||||
|
01 --> each chunk is a <tt,nn> piece
|
||||||
|
1x --> each chunk is a single encoded Value
|
||||||
|
00 11 ttnn End Stream <tt,nn> (must match preceding Start Stream)
|
||||||
|
|
||||||
01 00 mmmm ... Sequence
|
01 00 mmmm ... SignedInteger, big-endian binary
|
||||||
01 01 mmmm ... Set
|
01 01 mmmm ... String, UTF-8 binary
|
||||||
01 10 mmmm ... Dictionary
|
01 10 mmmm ... Bytes
|
||||||
|
01 11 mmmm ... Symbol, UTF-8 binary
|
||||||
|
|
||||||
10 00 mmmm ... SignedInteger, big-endian binary
|
10 00 mmmm ... application-specific Record
|
||||||
10 01 mmmm ... String, UTF-8 binary
|
10 01 mmmm ... application-specific Record
|
||||||
10 10 mmmm ... Bytes
|
10 10 mmmm ... application-specific Record
|
||||||
10 11 mmmm ... Symbol, UTF-8 binary
|
10 11 mmmm ... Record
|
||||||
|
|
||||||
11 00 0000 False
|
11 00 mmmm ... Sequence
|
||||||
11 00 0001 True
|
11 01 mmmm ... Set
|
||||||
11 00 0010 Float, 32 bits big-endian binary
|
11 10 mmmm ... Dictionary
|
||||||
11 00 0011 Double, 64 bits big-endian binary
|
11 11 xxxx RESERVED
|
||||||
|
|
||||||
If mmmm = 1111, varint(m) is present; otherwise, m is the length
|
If mmmm = 1111, varint(m) is present; otherwise, m is the length
|
||||||
|
|
||||||
#### Type and Length representation
|
#### Type and Length representation
|
||||||
|
|
||||||
A `Value`'s type and length is represented by use of a function
|
Each `Repr` takes one of three possible forms:
|
||||||
`header(t,n,m)` that yields a sequence of bytes when `t`, `n` and `m`
|
|
||||||
are appropriate non-negative integers.
|
|
||||||
|
|
||||||
header(t,n,m) = leadbyte(t,n,m) when m < 15
|
- (A) a fixed-length form, used for simple values such as `Boolean`s
|
||||||
or leadbyte(t,n,15) ++ varint(m) otherwise
|
or `Float`s.
|
||||||
|
|
||||||
The lead byte in a `Value`'s representation is constructed by a function
|
- (B) a variable-length form with length specified up-front, used for
|
||||||
|
almost all `Record`s as well as for most `Sequence`s and `String`s,
|
||||||
|
when their sizes are known at the time serialization begins.
|
||||||
|
|
||||||
|
- (C) a variable-length streaming form with unknown or unpredictable
|
||||||
|
length, rarely used for `Record`s, since the number of fields
|
||||||
|
in a `Record` is usually statically known, but sometimes used for
|
||||||
|
`Sequence`s, `String`s etc., such as in cases when serialization
|
||||||
|
begins before the number of elements or bytes in the corresponding
|
||||||
|
`Value` is known.
|
||||||
|
|
||||||
|
Applications may choose between formats (B) and (C) depending on their
|
||||||
|
needs at serialization time.
|
||||||
|
|
||||||
|
Every `Repr`, however, starts with a *lead byte* describing the
|
||||||
|
remainder of the representation.
|
||||||
|
|
||||||
|
##### The lead byte
|
||||||
|
|
||||||
|
The lead byte is constructed by a function `leadbyte`:
|
||||||
|
|
||||||
leadbyte(t,n,m) = [t*64 + n*16 + m]
|
leadbyte(t,n,m) = [t*64 + n*16 + m]
|
||||||
|
|
||||||
|
Both `t` and `n` are two-bit unsigned numbers; `m` is a four-bit
|
||||||
|
unsigned number.
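As a minimal sketch of the `leadbyte` definition above, the following Python packs `t`, `n` and `m` into a single byte. The range checks and the choice of `bytes` as the return type are assumptions of this sketch, not requirements of the specification.

```python
def leadbyte(t: int, n: int, m: int) -> bytes:
    """Pack two 2-bit fields (t, n) and one 4-bit field (m) into one byte."""
    # Assumption: out-of-range arguments are rejected rather than truncated.
    if not (0 <= t <= 3 and 0 <= n <= 3 and 0 <= m <= 15):
        raise ValueError("t and n must fit in 2 bits, m in 4 bits")
    return bytes([t * 64 + n * 16 + m])
```

For instance, `leadbyte(2, 1, 3)` yields the single byte `0x93`.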
|
||||||
|
|
||||||
The lead byte describes the rest of the representation as
|
The lead byte describes the rest of the representation as
|
||||||
follows:[^some-encodings-unused]
|
follows:[^some-encodings-unused]
|
||||||
|
|
||||||
leadbyte(0,-,-) represents a Record
|
|
||||||
leadbyte(1,-,-) represents a Sequence, Set or Dictionary
|
|
||||||
leadbyte(2,-,-) represents an Atom with variable-length binary representation
|
|
||||||
leadbyte(3,0,-) represents an Atom with fixed-length binary representation
|
|
||||||
|
|
||||||
[^some-encodings-unused]: Some encodings are unused. All such
|
[^some-encodings-unused]: Some encodings are unused. All such
|
||||||
encodings are reserved for future versions of this specification.
|
encodings are reserved for future versions of this specification.
|
||||||
|
|
||||||
Variable-length representations use the value of `m` to encode their
|
- `leadbyte(0,0,-)` (format A) represents an Atom with fixed-length binary representation.
|
||||||
lengths:
|
- `leadbyte(0,1,-)` (format A) is RESERVED.
|
||||||
|
- `leadbyte(0,2,-)` (format C) is a Stream Start byte.
|
||||||
|
- `leadbyte(0,3,-)` (format C) is a Stream End byte.
|
||||||
|
- `leadbyte(1,-,-)` (format B) represents an Atom with variable-length binary representation.
|
||||||
|
- `leadbyte(2,-,-)` (format B) represents a Record.
|
||||||
|
- `leadbyte(3,-,-)` (format B) represents a Sequence, Set or Dictionary.
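Read in the other direction, the list above suggests how a decoder might dispatch on a lead byte. The sketch below is illustrative only: the function name `classify` and the returned strings are hypothetical, and it does not single out the finer-grained reserved cases within a category.

```python
def classify(lead: int) -> str:
    """Describe a lead byte according to its top four bits (t and n)."""
    t = lead >> 6           # top two bits
    n = (lead >> 4) & 0b11  # next two bits
    if t == 0:
        # For t = 0, the n field selects between format A and stream markers.
        return ["fixed-length atom", "reserved", "stream start", "stream end"][n]
    return {
        1: "variable-length atom",
        2: "record",
        3: "sequence, set or dictionary",
    }[t]
```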
|
||||||
|
|
||||||
- Lengths between 0 and 14 are represented using `leadbyte` with `m`
|
##### Encoding data of fixed length (format A)
|
||||||
values 0 through 14.
|
|
||||||
- Lengths of 15 or greater are represented by `m` value 15, and
|
|
||||||
additional "length bytes" describing the length then follow the
|
|
||||||
lead byte.
|
|
||||||
|
|
||||||
These additional length bytes are formatted as
|
Each specific type of data defines its own rules for this format.
|
||||||
[base 128 varints][varint]. Quoting the
|
|
||||||
[Google Protocol Buffers][varint] definition,
|
##### Encoding data of known length (format B)
|
||||||
|
|
||||||
|
A `Repr` where the length of the `Value` to be encoded is variable but
|
||||||
|
known uses the value of `m` in `leadbyte` to encode its length. The
|
||||||
|
length counts *bytes* for atomic `Value`s, but counts *contained
|
||||||
|
values* for compound `Value`s.
|
||||||
|
|
||||||
|
- A length `l` between 0 and 14 is represented using `leadbyte` with
|
||||||
|
`m=l`.
|
||||||
|
- A length of 15 or greater is represented by `m=15` and additional
|
||||||
|
bytes describing the length following the lead byte.
|
||||||
|
|
||||||
|
The function `header(t,n,m)` yields an appropriate sequence of bytes
|
||||||
|
describing a `Repr`'s type and length when `t`, `n` and `m` are
|
||||||
|
appropriate non-negative integers:
|
||||||
|
|
||||||
|
header(t,n,m) = leadbyte(t,n,m) when m < 15
|
||||||
|
or leadbyte(t,n,15) ++ varint(m) otherwise
|
||||||
|
|
||||||
|
The additional length bytes are formatted as
|
||||||
|
[base 128 varints][varint]. We write `varint(m)` for the
|
||||||
|
varint-encoding of `m`. Quoting the [Google Protocol Buffers][varint]
|
||||||
|
definition,
|
||||||
|
|
||||||
> Each byte in a varint, except the last byte, has the most
|
> Each byte in a varint, except the last byte, has the most
|
||||||
> significant bit (msb) set – this indicates that there are further
|
> significant bit (msb) set – this indicates that there are further
|
||||||
|
@ -378,43 +427,93 @@ These additional length bytes are formatted as
|
||||||
- 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2.
|
- 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2.
|
||||||
- 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3.
|
- 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3.
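The `varint` and `header` definitions above can be sketched in Python as follows. This is one possible rendering under the stated definitions, not a reference implementation; rejecting negative lengths is an assumption of the sketch.

```python
def varint(m: int) -> bytes:
    """Base 128 varint: 7 bits per byte, least significant group first."""
    if m < 0:
        raise ValueError("lengths are non-negative")
    out = bytearray()
    while True:
        low7 = m & 0x7F
        m >>= 7
        if m:
            out.append(low7 | 0x80)  # msb set: more bytes follow
        else:
            out.append(low7)         # msb clear: last byte
            return bytes(out)


def header(t: int, n: int, m: int) -> bytes:
    """Lead byte alone when m < 15; otherwise lead byte with m=15, then varint(m)."""
    if m < 15:
        return bytes([t * 64 + n * 16 + m])
    return bytes([t * 64 + n * 16 + 15]) + varint(m)
```

With these definitions, `varint(300)` produces the bytes 172 and 2, matching the first example above.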
|
||||||
|
|
||||||
We write `varint(m)` for the varint-encoding of `m`.
|
##### Streaming data of unknown length (format C)
|
||||||
|
|
||||||
|
A `Repr` where the length of the `Value` to be encoded is variable and
|
||||||
|
not known at the time serialization of the `Value` starts is encoded
|
||||||
|
by a single Stream Start byte, followed by zero or more *chunks*,
|
||||||
|
followed by a matching Stream End byte:
|
||||||
|
|
||||||
|
startbyte(t,n) = leadbyte(0,2, t*4 + n)
|
||||||
|
endbyte(t,n) = leadbyte(0,3, t*4 + n)
|
||||||
|
|
||||||
|
For a `Repr` of a `Value` containing binary data, each chunk is to be
|
||||||
|
a format B `Repr` of the same type as the overall `Repr`.
|
||||||
|
|
||||||
|
For a `Repr` of a `Value` containing other `Value`s, each chunk is to
|
||||||
|
be a single `Repr`.
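To make the chunked form concrete, here is a small Python sketch of `startbyte`, `endbyte` and a streamed `String` built from format B chunks. The helper name `stream_string` is hypothetical, and the sketch assumes every chunk is shorter than 15 bytes so that no varint length is needed.

```python
from typing import Iterable


def startbyte(t: int, n: int) -> bytes:
    return bytes([0 * 64 + 2 * 16 + (t * 4 + n)])  # leadbyte(0,2, t*4 + n)


def endbyte(t: int, n: int) -> bytes:
    return bytes([0 * 64 + 3 * 16 + (t * 4 + n)])  # leadbyte(0,3, t*4 + n)


def stream_string(chunks: Iterable[bytes]) -> bytes:
    """Format C String: start byte, format B String chunks, matching end byte."""
    out = bytearray(startbyte(1, 1))
    for chunk in chunks:
        if len(chunk) >= 15:
            raise ValueError("this sketch only handles chunks under 15 bytes")
        out += bytes([1 * 64 + 1 * 16 + len(chunk)]) + chunk  # header(1,1,len)
    out += endbyte(1, 1)
    return bytes(out)
```

Calling `stream_string([b"he", b"llo"])` gives the byte sequence `25 52 68 65 53 6C 6C 6F 35`.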
|
||||||
|
|
||||||
#### Records
|
#### Records
|
||||||
|
|
||||||
[[ (L F_1 ... F_m) ]] = header(0,3,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]]
|
Format B (known length):
|
||||||
|
|
||||||
|
[[ (L F_1 ... F_m) ]] = header(2,3,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]]
|
||||||
|
|
||||||
For `m` fields, `m+1` is supplied to `header`, to account for the
|
For `m` fields, `m+1` is supplied to `header`, to account for the
|
||||||
encoding of the record label.
|
encoding of the record label.
|
||||||
|
|
||||||
|
Format C (streaming):
|
||||||
|
|
||||||
|
[[ (L F_1 ... F_m) ]]
|
||||||
|
= startbyte(2,3) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]] ++ endbyte(2,3)
|
||||||
|
|
||||||
|
Applications *SHOULD* prefer the known-length format for encoding
|
||||||
|
`Record`s.
|
||||||
|
|
||||||
##### Application-specific short form for labels
|
##### Application-specific short form for labels
|
||||||
|
|
||||||
Any given protocol using Preserves may additionally define an
|
Any given protocol using Preserves may additionally define an
|
||||||
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
|
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
|
||||||
number* `n` to a specific record label. When encoding `m` fields with
|
number* `n` to a specific record label. When encoding `m` fields with
|
||||||
short form label number `n`, the header is `header(0,n,m)` (rather
|
short form label number `n`, format B becomes
|
||||||
than `m+1`) since the label is implicit.
|
|
||||||
|
header(2,n,m) ++ [[F_1]] ++ ... ++ [[F_m]]
|
||||||
|
|
||||||
|
and format C becomes
|
||||||
|
|
||||||
|
startbyte(2,n) ++ [[F_1]] ++ ... ++ [[F_m]] ++ endbyte(2,n)
|
||||||
|
|
||||||
**Examples.** For example, a protocol may choose to map records
|
**Examples.** For example, a protocol may choose to map records
|
||||||
labelled `void` to `n=0`, making
|
labelled `void` to `n=0`, making
|
||||||
|
|
||||||
[[(void)]] = header(0,0,0) = [0x00]
|
[[(void)]] = header(2,0,0) = [0x80]
|
||||||
|
|
||||||
or it may map records labelled `person` to short form label number 1,
|
or it may map records labelled `person` to short form label number 1,
|
||||||
making
|
making
|
||||||
|
|
||||||
[[(person "Dr" "Elizabeth" "Blackwell")]]
|
[[(person "Dr" "Elizabeth" "Blackwell")]]
|
||||||
= header(0,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]`
|
= header(2,1,3) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
||||||
= [0x13] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]`
|
= [0x93] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
||||||
|
|
||||||
|
for format B, or
|
||||||
|
|
||||||
|
= startbyte(2,1) ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ endbyte(2,1)
|
||||||
|
= [0x29] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] ++ [0x39]
|
||||||
|
|
||||||
|
for format C.
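The `person` example can be reproduced with a short Python sketch. The helper names are hypothetical, fields are passed in already encoded, and both helpers assume lengths and field counts below 15 so that the lead byte alone carries the length.

```python
def encode_string(s: str) -> bytes:
    """Format B String, short lengths only: header(1,1,m) ++ utf8(S)."""
    data = s.encode("utf-8")
    if len(data) >= 15:
        raise ValueError("this sketch omits the varint length form")
    return bytes([1 * 64 + 1 * 16 + len(data)]) + data


def short_record_b(n: int, fields: list) -> bytes:
    """Format B with short form label n: header(2,n,m) ++ fields."""
    if n not in (0, 1, 2) or len(fields) >= 15:
        raise ValueError("unsupported label number or field count")
    return bytes([2 * 64 + n * 16 + len(fields)]) + b"".join(fields)


def short_record_c(n: int, fields: list) -> bytes:
    """Format C with short form label n: startbyte(2,n) ++ fields ++ endbyte(2,n)."""
    if n not in (0, 1, 2):
        raise ValueError("unsupported label number")
    start = bytes([2 * 16 + (2 * 4 + n)])
    end = bytes([3 * 16 + (2 * 4 + n)])
    return start + b"".join(fields) + end


fields = [encode_string(s) for s in ("Dr", "Elizabeth", "Blackwell")]
person_b = short_record_b(1, fields)  # begins with 0x93
person_c = short_record_c(1, fields)  # begins with 0x29, ends with 0x39
```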
|
||||||
|
|
||||||
#### Sequences, Sets and Dictionaries
|
#### Sequences, Sets and Dictionaries
|
||||||
|
|
||||||
[[ [X_1 ... X_m] ]] = header(1,0,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
Format B (known length):
|
||||||
|
|
||||||
[[ #set{X_1 ... X_m} ]] = header(1,1,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
[[ [X_1 ... X_m] ]] = header(3,0,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
||||||
|
|
||||||
|
[[ #set{X_1 ... X_m} ]] = header(3,1,m) ++ [[X_1]] ++ ... ++ [[X_m]]
|
||||||
|
|
||||||
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
||||||
= header(1,2,m) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]]
|
= header(3,2,m) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]]
|
||||||
|
|
||||||
|
Format C (streaming):
|
||||||
|
|
||||||
|
[[ [X_1 ... X_m] ]] = startbyte(3,0) ++ [[X_1]] ++ ... ++ [[X_m]] ++ endbyte(3,0)
|
||||||
|
|
||||||
|
[[ #set{X_1 ... X_m} ]] = startbyte(3,1) ++ [[X_1]] ++ ... ++ [[X_m]] ++ endbyte(3,1)
|
||||||
|
|
||||||
|
[[ #dict{K_1:V_1 ... K_m:V_m} ]]
|
||||||
|
= startbyte(3,2) ++ [[K_1]] ++ [[V_1]] ++ ... ++ [[K_m]] ++ [[V_m]] ++ endbyte(3,2)
|
||||||
|
|
||||||
|
Applications may use whichever format suits their needs on a
|
||||||
|
case-by-case basis.
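As an illustration of the two `Sequence` formats, the following Python sketch encodes a list of small integers both ways. The helper names are hypothetical, and the sketch assumes fewer than 15 elements and integers that fit in at most one byte.

```python
def encode_small_int(x: int) -> bytes:
    """SignedInteger for -128..127 only: header(1,0,m) ++ intbytes(x)."""
    if x == 0:
        return b"\x40"  # zero needs no content bytes
    return b"\x41" + x.to_bytes(1, "big", signed=True)


def sequence_b(items: list) -> bytes:
    """Format B: header(3,0,m) followed by the m encoded elements."""
    if len(items) >= 15:
        raise ValueError("this sketch omits the varint length form")
    return bytes([3 * 64 + 0 * 16 + len(items)]) + b"".join(items)


def sequence_c(items: list) -> bytes:
    """Format C: startbyte(3,0), the encoded elements, then endbyte(3,0)."""
    start = bytes([2 * 16 + (3 * 4 + 0)])
    end = bytes([3 * 16 + (3 * 4 + 0)])
    return start + b"".join(items) + end


items = [encode_small_int(i) for i in (1, 2, 3, 4)]
```

Here `sequence_b(items)` gives `C4 41 01 41 02 41 03 41 04`, while `sequence_c(items)` wraps the same element bytes between `2C` and `3C`.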
|
||||||
|
|
||||||
There is *no* ordering requirement on the `X_i` elements or
|
There is *no* ordering requirement on the `X_i` elements or
|
||||||
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
|
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
|
||||||
|
@ -432,19 +531,23 @@ order.
|
||||||
(b) sorting keys or elements makes no sense in streaming
|
(b) sorting keys or elements makes no sense in streaming
|
||||||
serialization formats.
|
serialization formats.
|
||||||
|
|
||||||
Note that `n=3` is unused and reserved.
|
Note that `header(3,3,m)` and `startbyte(3,3)`/`endbyte(3,3)` are unused and reserved.
|
||||||
|
|
||||||
#### Variable-length Atoms
|
#### Variable-length Atoms
|
||||||
|
|
||||||
##### SignedInteger
|
##### SignedInteger
|
||||||
|
|
||||||
[[ x ]] when x ∈ SignedInteger = header(2,0,m) ++ intbytes(x)
|
Format B (known length):
|
||||||
|
|
||||||
|
[[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)
|
||||||
where m = |intbytes(x)|
|
where m = |intbytes(x)|
|
||||||
and intbytes(x) = a big-endian two's-complement representation
|
and intbytes(x) = a big-endian two's-complement representation
|
||||||
of the signed integer x, taking exactly as
|
of the signed integer x, taking exactly as
|
||||||
many whole bytes as needed to unambiguously
|
many whole bytes as needed to unambiguously
|
||||||
identify the value
|
identify the value
|
||||||
|
|
||||||
|
Format C *MUST NOT* be used for `SignedInteger`s.
|
||||||
|
|
||||||
The value 0 needs zero bytes to identify the value, so `intbytes(0)`
|
The value 0 needs zero bytes to identify the value, so `intbytes(0)`
|
||||||
is the empty byte string. Non-zero values need at least one byte; the
|
is the empty byte string. Non-zero values need at least one byte; the
|
||||||
most-significant bit in the first byte in `intbytes(x)` for `x≠0` is
|
most-significant bit in the first byte in `intbytes(x)` for `x≠0` is
|
||||||
|
@ -452,55 +555,78 @@ the sign bit.
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
|
||||||
[[ -257 ]] = [0x82, 0xFE, 0xFF]
|
[[ -257 ]] = [0x42, 0xFE, 0xFF]
|
||||||
[[ -256 ]] = [0x82, 0xFF, 0x00]
|
[[ -256 ]] = [0x42, 0xFF, 0x00]
|
||||||
[[ -255 ]] = [0x82, 0xFF, 0x01]
|
[[ -255 ]] = [0x42, 0xFF, 0x01]
|
||||||
[[ -254 ]] = [0x82, 0xFF, 0x02]
|
[[ -254 ]] = [0x42, 0xFF, 0x02]
|
||||||
[[ -129 ]] = [0x82, 0xFF, 0x7F]
|
[[ -129 ]] = [0x42, 0xFF, 0x7F]
|
||||||
[[ -128 ]] = [0x81, 0x80]
|
[[ -128 ]] = [0x41, 0x80]
|
||||||
[[ -127 ]] = [0x81, 0x81]
|
[[ -127 ]] = [0x41, 0x81]
|
||||||
[[ -2 ]] = [0x81, 0xFE]
|
[[ -2 ]] = [0x41, 0xFE]
|
||||||
[[ -1 ]] = [0x81, 0xFF]
|
[[ -1 ]] = [0x41, 0xFF]
|
||||||
[[ 0 ]] = [0x80]
|
[[ 0 ]] = [0x40]
|
||||||
[[ 1 ]] = [0x81, 0x01]
|
[[ 1 ]] = [0x41, 0x01]
|
||||||
[[ 127 ]] = [0x81, 0x7F]
|
[[ 127 ]] = [0x41, 0x7F]
|
||||||
[[ 128 ]] = [0x82, 0x00, 0x80]
|
[[ 128 ]] = [0x42, 0x00, 0x80]
|
||||||
[[ 255 ]] = [0x82, 0x00, 0xFF]
|
[[ 255 ]] = [0x42, 0x00, 0xFF]
|
||||||
[[ 256 ]] = [0x82, 0x01, 0x00]
|
[[ 256 ]] = [0x42, 0x01, 0x00]
|
||||||
[[ 32767 ]] = [0x82, 0x7F, 0xFF]
|
[[ 32767 ]] = [0x42, 0x7F, 0xFF]
|
||||||
[[ 32768 ]] = [0x83, 0x00, 0x80, 0x00]
|
[[ 32768 ]] = [0x43, 0x00, 0x80, 0x00]
|
||||||
[[ 65535 ]] = [0x83, 0x00, 0xFF, 0xFF]
|
[[ 65535 ]] = [0x43, 0x00, 0xFF, 0xFF]
|
||||||
[[ 65536 ]] = [0x83, 0x01, 0x00, 0x00]
|
[[ 65536 ]] = [0x43, 0x01, 0x00, 0x00]
|
||||||
[[ 131072 ]] = [0x83, 0x02, 0x00, 0x00]
|
[[ 131072 ]] = [0x43, 0x02, 0x00, 0x00]
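The examples above can be checked against a short Python sketch of `intbytes`. The function `encode_signed_integer` is a hypothetical helper name, and the sketch assumes the result needs fewer than 15 content bytes.

```python
def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement form; empty for zero."""
    if x == 0:
        return b""
    # For negative x, ~x has the same number of significant bits as the
    # two's-complement form minus its sign bit.
    magnitude = x if x > 0 else ~x
    length = magnitude.bit_length() // 8 + 1  # leaves room for the sign bit
    return x.to_bytes(length, "big", signed=True)


def encode_signed_integer(x: int) -> bytes:
    """Format B SignedInteger: header(1,0,m) ++ intbytes(x), for m < 15."""
    body = intbytes(x)
    if len(body) >= 15:
        raise ValueError("this sketch omits the varint length form")
    return bytes([1 * 64 + 0 * 16 + len(body)]) + body
```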
|
||||||
|
|
||||||
##### String
|
##### String
|
||||||
|
|
||||||
[[ S ]] when S ∈ String = header(2,1,m) ++ utf8(S)
|
Format B (known length):
|
||||||
|
|
||||||
|
[[ S ]] when S ∈ String = header(1,1,m) ++ utf8(S)
|
||||||
where m = |utf8(S)|
|
where m = |utf8(S)|
|
||||||
and utf8(S) = the UTF-8 encoding of S
|
and utf8(S) = the UTF-8 encoding of S
|
||||||
|
|
||||||
|
To stream a `String`, emit `startbyte(1,1)` and then a sequence of
|
||||||
|
zero or more format B `String` chunks, followed by `endbyte(1,1)`.
|
||||||
|
|
||||||
|
While the overall content of a streamed `String` must be valid UTF-8,
|
||||||
|
individual chunks do not have to conform to UTF-8.
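One consequence is that a decoder should concatenate chunk contents before decoding them as UTF-8. The Python sketch below illustrates this; the function name is hypothetical, and it assumes short chunks (length below 15) and ignores anything after the end byte.

```python
def decode_streamed_string(data: bytes) -> str:
    """Decode a format C String whose chunks are short format B Strings."""
    if not data or data[0] != 0x25:           # startbyte(1,1)
        raise ValueError("expected a String stream start byte")
    payload = bytearray()
    i = 1
    while i < len(data) and data[i] != 0x35:  # endbyte(1,1)
        lead = data[i]
        length = lead & 0x0F
        if lead >> 4 != 0x5 or length == 15:
            raise ValueError("expected a short format B String chunk")
        chunk = data[i + 1 : i + 1 + length]
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        payload += chunk
        i += 1 + length
    if i >= len(data):
        raise ValueError("missing stream end byte")
    # Decode only once everything is joined: a multi-byte character
    # may be split across two chunks.
    return payload.decode("utf-8")
```

For example, the two-byte UTF-8 encoding of "é" (`C3 A9`) may arrive as two one-byte chunks, `25 51 C3 51 A9 35`, neither of which is valid UTF-8 by itself.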
|
||||||
|
|
||||||
##### ByteString
|
##### ByteString
|
||||||
|
|
||||||
[[ B ]] when B ∈ ByteString = header(2,2,m) ++ B
|
Format B (known length):
|
||||||
|
|
||||||
|
[[ B ]] when B ∈ ByteString = header(1,2,m) ++ B
|
||||||
where m = |B|
|
where m = |B|
|
||||||
|
|
||||||
|
To stream a `ByteString`, emit `startbyte(1,2)` and then a sequence of
|
||||||
|
zero or more format B `ByteString` chunks, followed by `endbyte(1,2)`.
|
||||||
|
|
||||||
##### Symbol
|
##### Symbol
|
||||||
|
|
||||||
[[ S ]] when S ∈ Symbol = header(2,2,m) ++ utf8(S)
|
Format B (known length):
|
||||||
|
|
||||||
|
[[ S ]] when S ∈ Symbol = header(1,3,m) ++ utf8(S)
|
||||||
where m = |utf8(S)|
|
where m = |utf8(S)|
|
||||||
and utf8(S) = the UTF-8 encoding of S
|
and utf8(S) = the UTF-8 encoding of S
|
||||||
|
|
||||||
|
To stream a `Symbol`, emit `startbyte(1,3)` and then a sequence of
|
||||||
|
zero or more format B `Symbol` chunks, followed by `endbyte(1,3)`.
|
||||||
|
|
||||||
#### Fixed-length Atoms
|
#### Fixed-length Atoms
|
||||||
|
|
||||||
|
Fixed-length atoms all use format A, and do not have a length
|
||||||
|
representation. They repurpose the bits that format B `Repr`s use to
|
||||||
|
specify lengths. Applications *MUST NOT* use format C with
|
||||||
|
`startbyte(0,n)` or `endbyte(0,n)` for any `n`.
|
||||||
|
|
||||||
##### Booleans
|
##### Booleans
|
||||||
|
|
||||||
[[ #f ]] = header(3,0,0) = [0xC0]
|
[[ #f ]] = header(0,0,0) = [0x00]
|
||||||
[[ #t ]] = header(3,0,1) = [0xC1]
|
[[ #t ]] = header(0,0,1) = [0x01]
|
||||||
|
|
||||||
##### Floats and Doubles
|
##### Floats and Doubles
|
||||||
|
|
||||||
[[ F ]] when F ∈ Float = header(3,0,2) ++ binary32(F)
|
[[ F ]] when F ∈ Float = header(0,0,2) ++ binary32(F)
|
||||||
[[ D ]] when D ∈ Double = header(3,0,3) ++ binary64(D)
|
[[ D ]] when D ∈ Double = header(0,0,3) ++ binary64(D)
|
||||||
where binary32(F) and binary64(D) are big-endian 4- and 8-byte
|
where binary32(F) and binary64(D) are big-endian 4- and 8-byte
|
||||||
IEEE 754 binary representations
|
IEEE 754 binary representations
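A brief Python sketch of the format A atoms, using the standard `struct` module for the big-endian IEEE 754 encodings. The function names are hypothetical.

```python
import struct


def encode_boolean(b: bool) -> bytes:
    return b"\x01" if b else b"\x00"       # header(0,0,1) or header(0,0,0)


def encode_float(f: float) -> bytes:
    return b"\x02" + struct.pack(">f", f)  # header(0,0,2) ++ binary32(F)


def encode_double(d: float) -> bytes:
    return b"\x03" + struct.pack(">d", d)  # header(0,0,3) ++ binary64(D)
```

For instance, `encode_float(1.0)` gives `02 3F 80 00 00` and `encode_double(1.0)` gives `03 3F F0 00 00 00 00 00 00`.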
|
||||||
|
|
||||||
|
@ -515,21 +641,25 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
|
||||||
|
|
||||||
| Value | Encoded hexadecimal byte sequence |
|
| Value | Encoded hexadecimal byte sequence |
|
||||||
|--------------------------------------------------------------------|----------------------------------------------------|
|
|--------------------------------------------------------------------|----------------------------------------------------|
|
||||||
| `(capture (discard))` | 11 00 |
|
| `(capture (discard))` | 91 80 |
|
||||||
| `(observe (speak (discard) (capture (discard))))` | 21 33 B5 73 70 65 61 6B 00 11 00 |
|
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
|
||||||
| `[1 2 3 4]` | 44 81 01 81 02 81 03 81 04 |
|
| `[1 2 3 4]` (format B) | C4 41 01 41 02 41 03 41 04 |
|
||||||
| `[-2 -1 0 1]` | 54 81 FE 81 FF 80 81 01 |
|
| `[1 2 3 4]` (format C) | 2C 41 01 41 02 41 03 41 04 3C |
|
||||||
| `["hello" there #"world" [] #set{} #t #f]` | 47 95 68 65 6C 6C 6F A5 74 68 65 72 65 40 50 C1 C0 |
|
| `[-2 -1 0 1]` | C4 41 FE 41 FF 40 41 01 |
|
||||||
| `-257` | 82 FE FF |
|
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
|
||||||
| `-1` | 81 FF |
|
| `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
|
||||||
| `0` | 80 |
|
| `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
|
||||||
| `1` | 81 01 |
|
| `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 65 77 6F 72 6C 64 C0 D0 01 00 |
|
||||||
| `255` | 82 00 FF |
|
| `-257` | 42 FE FF |
|
||||||
| `1f` | C2 3F 80 00 00 |
|
| `-1` | 41 FF |
|
||||||
| `1d` | C3 3F F0 00 00 00 00 00 00 |
|
| `0` | 40 |
|
||||||
| `-1.202e300d` | C3 FE 3C B7 B7 59 BF 04 26 |
|
| `1` | 41 01 |
|
||||||
|
| `255` | 42 00 FF |
|
||||||
|
| `1f` | 02 3F 80 00 00 |
|
||||||
|
| `1d` | 03 3F F0 00 00 00 00 00 00 |
|
||||||
|
| `-1.202e300d` | 03 FE 3C B7 B7 59 BF 04 26 |
|
||||||
|
|
||||||
Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Value`
|
Finally, a larger example, using a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
||||||
|
|
||||||
([titled person 2 thing 1]
|
([titled person 2 thing 1]
|
||||||
101
|
101
|
||||||
|
@ -539,21 +669,21 @@ Finally, a larger example, using a non-`Symbol` label for a record.[^extensibili
|
||||||
|
|
||||||
encodes to
|
encodes to
|
||||||
|
|
||||||
35 ;; Record, generic, 4+1
|
B5 ;; Record, generic, 4+1
|
||||||
45 ;; Sequence, 5
|
C5 ;; Sequence, 5
|
||||||
B6 74 69 74 6C 65 64 ;; Symbol, "titled"
|
76 74 69 74 6C 65 64 ;; Symbol, "titled"
|
||||||
B6 70 65 72 73 6F 6E ;; Symbol, "person"
|
76 70 65 72 73 6F 6E ;; Symbol, "person"
|
||||||
81 02 ;; SignedInteger, "2"
|
41 02 ;; SignedInteger, "2"
|
||||||
B5 74 68 69 6E 67 ;; Symbol, "thing"
|
75 74 68 69 6E 67 ;; Symbol, "thing"
|
||||||
81 01 ;; SignedInteger, "1"
|
41 01 ;; SignedInteger, "1"
|
||||||
81 65 ;; SignedInteger, "101"
|
41 65 ;; SignedInteger, "101"
|
||||||
99 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
|
59 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
|
||||||
34 ;; Record, generic, 3+1
|
B4 ;; Record, generic, 3+1
|
||||||
B4 64 61 74 65 ;; Symbol, "date"
|
74 64 61 74 65 ;; Symbol, "date"
|
||||||
82 07 1D ;; SignedInteger, "1821"
|
42 07 1D ;; SignedInteger, "1821"
|
||||||
81 02 ;; SignedInteger, "2"
|
41 02 ;; SignedInteger, "2"
|
||||||
81 03 ;; SignedInteger, "3"
|
41 03 ;; SignedInteger, "3"
|
||||||
92 44 72 ;; String, "Dr"
|
52 44 72 ;; String, "Dr"
|
||||||
|
|
||||||
[^extensibility2]: It happens to line up with Racket's
|
[^extensibility2]: It happens to line up with Racket's
|
||||||
representation of a record label for an inheritance hierarchy
|
representation of a record label for an inheritance hierarchy
|
||||||
|
@ -608,15 +738,15 @@ pair.
|
||||||
|
|
||||||
**Examples.**
|
**Examples.**
|
||||||
|
|
||||||
| `(mime application/octet-stream #"abcde")` | 33 B4 6D 69 6D 65 BF 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D A5 61 62 63 64 65 |
|
| `(mime application/octet-stream #"abcde")` | B3 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
||||||
| `(mime text/plain "ABC")` | 33 B4 6D 69 6D 65 BA 74 65 78 74 2F 70 6C 61 69 6E 93 41 42 43 |
|
| `(mime text/plain #"ABC")` | B3 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
||||||
| `(mime application/xml "<xhtml/>")` | 33 B4 6D 69 6D 65 BF 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 98 3C 78 68 74 6D 6C 2F 3E |
|
| `(mime application/xml #"<xhtml/>")` | B3 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
||||||
| `(mime text/csv "123,234,345")` | 33 B4 6D 69 6D 65 B8 74 65 78 74 2F 63 73 76 9B 31 32 33 2C 32 33 34 2C 33 34 35 |
|
| `(mime text/csv #"123,234,345")` | B3 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
|
||||||
|
|
||||||
Applications making heavy use of `mime` records may choose to use a
|
Applications making heavy use of `mime` records may choose to use a
|
||||||
short form label number for the record type. For example, if short
|
short form label number for the record type. For example, if short
|
||||||
form label number 1 were chosen, the second example above, `(mime
|
form label number 1 were chosen, the second example above, `(mime
|
||||||
text/plain "ABC")`, would be encoded with "12" in place of "33 B4 6D
|
text/plain #"ABC")`, would be encoded with "92" in place of "B3 74 6D
|
||||||
69 6D 65".
|
69 6D 65".
|
||||||
|
|
||||||
### Text
|
### Text
|
||||||
|
@ -746,26 +876,29 @@ should both be identities.
|
||||||
|
|
||||||
## Appendix. Table of lead byte values
|
## Appendix. Table of lead byte values
|
||||||
|
|
||||||
0x - short form Record label index 0
|
00 - False
|
||||||
1x - short form Record label index 1
|
01 - True
|
||||||
2x - short form Record label index 2
|
02 - Float
|
||||||
3x - Record
|
03 - Double
|
||||||
4x - Sequence
|
(0x) RESERVED 04-0F
|
||||||
5x - Set
|
(1x) RESERVED 10-1F
|
||||||
6x - Dictionary
|
2x - Start Stream
|
||||||
(7x) RESERVED
|
3x - End Stream
|
||||||
8x - SignedInteger
|
|
||||||
9x - String
|
4x - SignedInteger
|
||||||
Ax - Bytes
|
5x - String
|
||||||
Bx - Symbol
|
6x - Bytes
|
||||||
C0 - False
|
7x - Symbol
|
||||||
C1 - True
|
|
||||||
C2 - Float
|
8x - short form Record label index 0
|
||||||
C3 - Double
|
9x - short form Record label index 1
|
||||||
(Cx) RESERVED C4-CF
|
Ax - short form Record label index 2
|
||||||
(Dx) RESERVED
|
Bx - Record
|
||||||
(Ex) RESERVED
|
|
||||||
(Fx) RESERVED
|
Cx - Sequence
|
||||||
|
Dx - Set
|
||||||
|
Ex - Dictionary
|
||||||
|
(Fx) RESERVED F0-FF
|
||||||
|
|
||||||
## Appendix. Why not Just Use JSON?
|
## Appendix. Why not Just Use JSON?
|
||||||
|
|
||||||
|
@ -942,15 +1075,6 @@ Q. Should I map to SPKI SEXP or is that nonsense / for later?[^why-not-spki-sexp
|
||||||
other kind of structure, and the "hint" itself can only be a
|
other kind of structure, and the "hint" itself can only be a
|
||||||
binary blob.
|
binary blob.
|
||||||
|
|
||||||
Q. Should `MIMEData` be a special syntax for `Record`s with a single
|
|
||||||
`ByteString` field?
|
|
||||||
|
|
||||||
A. Not even. It should probably just be moved to the "conventions"
|
|
||||||
section. Compare:
|
|
||||||
|
|
||||||
D5 BA text/plain hello -- using special MIMEData encoding
|
|
||||||
32 BA text/plain A5 hello -- using bog standard type-labelled Record
|
|
||||||
|
|
||||||
Q. Should `Symbol` be a special syntax for a `Record` with a `Symbol`
|
Q. Should `Symbol` be a special syntax for a `Record` with a `Symbol`
|
||||||
label (recursive!?) and a single `String` field?
|
label (recursive!?) and a single `String` field?
|
||||||
|
|
||||||
|
@ -970,16 +1094,6 @@ Q. Are the language mappings reasonable? How about one for Python?
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
Streaming: needed for variable-sized structures. Tricky to design
|
|
||||||
syntax for this that isn't gratuitously warty. End byte value.
|
|
||||||
|
|
||||||
SIGH. Streaming for text/bytes too I SUPPOSE. Chunks, like CBOR
|
|
||||||
|
|
||||||
Literal small integers: could be nice? Not absolutely necessary.
|
Literal small integers: could be nice? Not absolutely necessary.
|
||||||
|
|
||||||
Maybe reorder: fixed-length atoms first, then variable-length atoms,
|
|
||||||
then fixed-length compounds, then variable-length compounds? Reason
|
|
||||||
being that then maybe can put the streaming forms of the
|
|
||||||
variable-length ones very last.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|