Update tagging scheme
parent 5562add7ba
commit 775741f944
|
@@ -66,76 +66,36 @@ Either way, the tag on the special `Ref` is the type of the encoded value.
|
||||||
|
|
||||||
### Tags and Refs.
|
### Tags and Refs.
|
||||||
|
|
||||||
................................................................
|
The following table maps bit values in the low (leftmost) byte of a `Ref`
|
||||||
Version 1
|
to their interpretation. In interpretations including a three-bit `nnn`
|
||||||
|
value, the `nnn` bits specify the length of the used portion of the
|
||||||
|
remaining 56 bits of the `Ref`, counted in bytes, starting from the
|
||||||
|
following byte, with value `000` disallowed.
|
||||||
|
|
||||||
00000000 IMM bool
|
Bit number Meaning
|
||||||
...00100 IMM RESERVED
|
7654 3210
|
||||||
nnn01000 IMM float nnn = length of payload in bytes. 000 disallowed
|
--------- --- -------------------------------------------------------------
|
||||||
nnn10000 IMM str
|
0000 0000 IMM Boolean; next byte = 0 means false; 1 means true.
|
||||||
nnn10100 IMM bytes
|
...1 0000 IMM reserved
|
||||||
nnn11000 IMM sym
|
nnn0 0001 IMM Float: nnn must be 100, meaning a 32-bit IEEE754 value.
|
||||||
|
nnn1 0001 IMM ByteString
|
||||||
|
nnn0 0010 IMM String
|
||||||
|
nnn1 0010 IMM Symbol
|
||||||
|
|
||||||
....1100 IMM int
|
.... 0011 IMM SignedInteger between -2^59 and (2^59)-1, inclusive
|
||||||
|
|
||||||
.....010 RESERVED
|
.... 0100 PTR SignedInteger outside the immediate range
|
||||||
....0110 PTR embedded
|
.... 0101 PTR String
|
||||||
....1110 PTR float
|
.... 0110 PTR ByteString
|
||||||
|
.... 0111 PTR Symbol
|
||||||
....0001 PTR str
|
.... 1000 PTR Record
|
||||||
....0101 PTR bytes
|
.... 1001 PTR Sequence
|
||||||
....1001 PTR sym
|
.... 1010 PTR Set
|
||||||
....1101 PTR int
|
.... 1011 PTR Dictionary
|
||||||
|
.... 1100 PTR Embedded
|
||||||
....0011 PTR rec
|
.... 1101 PTR Double: length of pointed-to Buf must be 8
|
||||||
....0111 PTR seq
|
.... 1110 reserved
|
||||||
....1011 PTR set
|
.... 1111 reserved
|
||||||
....1111 PTR map
|
|
||||||
|
|
||||||
................................................................
|
|
||||||
Version 2
|
|
||||||
|
|
||||||
0000 0000 IMM bool
|
|
||||||
...1 0000 IMM RESERVED
|
|
||||||
nnn0 0001 IMM float nnn = length of payload in bytes. 000 disallowed
|
|
||||||
nnn1 0001 IMM bytes
|
|
||||||
nnn0 0010 IMM str
|
|
||||||
nnn1 0010 IMM sym
|
|
||||||
|
|
||||||
.... 0011 IMM int
|
|
||||||
|
|
||||||
.... 0100 PTR int
|
|
||||||
.... 0101 PTR str
|
|
||||||
.... 0110 PTR bytes
|
|
||||||
.... 0111 PTR sym
|
|
||||||
.... 1000 PTR rec
|
|
||||||
.... 1001 PTR seq
|
|
||||||
.... 1010 PTR set
|
|
||||||
.... 1011 PTR map
|
|
||||||
.... 1100 PTR embedded
|
|
||||||
.... 1101 PTR float
|
|
||||||
.... 1110 RESERVED
|
|
||||||
.... 1111 RESERVED
|
|
||||||
|
|
||||||
|
|
||||||
Tag Type Interpretation of 60-bit payload
|
|
||||||
--- ------------- --------------------------------
|
|
||||||
0 Boolean 0 = False, 1 = True
|
|
||||||
1 IEEE 754 Offset to Buf holding little-endian 32/64-bit float
|
|
||||||
2 SignedInteger Signed 60-bit integer
|
|
||||||
3 SignedInteger Offset to Buf holding little-endian signed integer
|
|
||||||
4 String 0-7 bytes of UTF-8; length in lower 4 bits
|
|
||||||
5 String Offset to Buf holding UTF-8 data
|
|
||||||
6 ByteString 0-7 bytes of raw binary; length in lower 4 bits
|
|
||||||
7 ByteString Offset to Buf holding raw binary data
|
|
||||||
8 Symbol 0-7 bytes of UTF-8; length in lower 4 bits
|
|
||||||
9 Symbol Offset to Buf holding UTF-8 data
|
|
||||||
A Record Offset to Buf holding Refs (label, fields)
|
|
||||||
B Sequence Offset to Buf holding Refs (sequence values)
|
|
||||||
C Set Offset to Buf holding Refs (elements in arbitrary order)
|
|
||||||
D Dictionary Offset to Buf holding Refs (key/value pairs)
|
|
||||||
E Embedded Offset to Buf holding a single Ref
|
|
||||||
F - (reserved)
|
|
||||||
|
|
||||||
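As a reading aid for the new low-byte table (the `Bit number` / `Meaning` side of this hunk), here is a minimal Python sketch of how a decoder might classify the low byte of a `Ref`. The function and dictionary names are hypothetical and not part of the spec. The table does not list non-zero bytes whose low nibble is `0000` and whose bit 4 is clear; treating them as reserved is an assumption.

```python
# Illustrative sketch only: all names here are hypothetical, not from the spec.
POINTER_KINDS = {
    0x4: "SignedInteger", 0x5: "String", 0x6: "ByteString", 0x7: "Symbol",
    0x8: "Record", 0x9: "Sequence", 0xA: "Set", 0xB: "Dictionary",
    0xC: "Embedded", 0xD: "Double",
}
# Keyed by (low nibble, bit 4) for the immediate forms that carry an nnn length.
IMMEDIATE_KINDS = {
    (0x1, 0): "Float", (0x1, 1): "ByteString",
    (0x2, 0): "String", (0x2, 1): "Symbol",
}

def classify_low_byte(b):
    """Return (form, kind, nnn) for the low byte of a Ref."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte")
    low4, bit4, nnn = b & 0x0F, (b >> 4) & 1, b >> 5
    if low4 == 0x0:
        # Assumption: every non-zero byte with low nibble 0000 is reserved.
        return ("IMM", "Boolean" if b == 0 else "reserved", None)
    if low4 in (0x1, 0x2):
        kind = IMMEDIATE_KINDS[(low4, bit4)]
        if nnn == 0:
            raise ValueError("nnn = 000 is disallowed")
        if kind == "Float" and nnn != 4:
            raise ValueError("Float requires nnn = 100")
        return ("IMM", kind, nnn)
    if low4 == 0x3:
        return ("IMM", "SignedInteger", None)
    if low4 in POINTER_KINDS:
        return ("PTR", POINTER_KINDS[low4], None)
    return (None, "reserved", None)  # low nibbles 1110 and 1111
```

For instance, `classify_low_byte(0xA2)` yields `("IMM", "String", 5)`, which matches the low byte of the `"Hello"` example elsewhere in this diff.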
### Records, Sequences, Sets and Dictionaries.
|
### Records, Sequences, Sets and Dictionaries.
|
||||||
|
|
||||||
|
@@ -147,50 +107,55 @@ Either way, the tag on the special `Ref` is the type of the encoded value.
|
||||||
n*8 8 Ref n-1
|
n*8 8 Ref n-1
|
||||||
(n+1)*8 8 Padding, only if n is even
|
(n+1)*8 8 Padding, only if n is even
|
||||||
|
|
||||||
Each compound datum is represented as a sequence of `Ref`s representing the
|
Each compound datum is represented as a `Buf` containing a sequence of
|
||||||
contained `Value`s. Each `Record`'s sequence represents the label, followed
|
`Ref`s representing the contained `Value`s. Each `Record`'s sequence
|
||||||
by the fields in order. Each `Sequence`'s representation is just its
|
represents the label, followed by the fields in order. Each `Sequence`'s
|
||||||
contained values in order. `Set`s are ordered arbitrarily into a sequence.
|
representation is just its contained values in order. `Set`s are ordered
|
||||||
The key-value pairs in a `Dictionary` are ordered arbitrarily, alternating
|
arbitrarily into a sequence. The key-value pairs in a `Dictionary` are
|
||||||
between keys and their matching values.
|
ordered arbitrarily, alternating between keys and their matching values.
|
||||||
|
|
||||||
There is *no* ordering requirement on the elements of `Set`s or the
|
There is *no* ordering requirement on the elements of `Set`s or the
|
||||||
key-value pairs in a `Dictionary`. They may appear in any order. However,
|
key-value pairs in a `Dictionary`. They may appear in any order. However,
|
||||||
the elements and keys *MUST* be pairwise distinct according to the
|
the elements and keys *MUST* be pairwise distinct according to the
|
||||||
[Preserves equivalence relation](preserves.html#equivalence).
|
[Preserves equivalence relation](preserves.html#equivalence).
|
||||||
|
|
||||||
|
Empty structures are represented using a `Ref` with a zero offset and the
|
||||||
|
appropriate tag.
|
||||||
|
|
||||||
### SignedIntegers.
|
### SignedIntegers.
|
||||||
|
|
||||||
Integers between -2<sup>59</sup> and 2<sup>59</sup>-1, inclusive, are
|
Integers between -2<sup>59</sup> and 2<sup>59</sup>-1, inclusive, are
|
||||||
represented as immediate values in a `Ref` with tag 2. Integers outside
|
represented as immediate values in a `Ref` with tag 3. Integers outside
|
||||||
this range are represented with a `Ref` with tag 3 pointing to a `Buf`
|
this range are represented with a `Ref` with tag 4 pointing to a `Buf`
|
||||||
containing exactly as many 64-bit words as needed to unambiguously identify
|
containing exactly as many 64-bit words as needed to unambiguously identify
|
||||||
the value and its sign, in little-endian byte and word ordering. Every
|
the value and its sign, in little-endian byte and word ordering. Every
|
||||||
`SignedInteger` *MUST* be represented with its shortest possible encoding.
|
`SignedInteger` *MUST* be represented with its shortest possible encoding.
|
||||||
|
Zero is represented using tag 3; use of tag 4 with a zero offset is
|
||||||
|
forbidden.
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
|
||||||
Number (decimal) Ref (64-bit) Buf (hex bytes)
|
Number (decimal) Ref (64-bit) Buf (hex bytes)
|
||||||
----------------------------------------- ---------------- ----------------
|
----------------------------------------- ---------------- ----------------
|
||||||
-576460752303423488 8000000000000002 -
|
-576460752303423488 8000000000000003 -
|
||||||
-257 FFFFFFFFFFFFEFF2 -
|
-257 FFFFFFFFFFFFEFF3 -
|
||||||
-1 FFFFFFFFFFFFFFF2 -
|
-1 FFFFFFFFFFFFFFF3 -
|
||||||
0 0000000000000002 -
|
0 0000000000000003 -
|
||||||
1 0000000000000012 -
|
1 0000000000000013 -
|
||||||
257 0000000000001012 -
|
257 0000000000001013 -
|
||||||
576460752303423487 7FFFFFFFFFFFFFF2 -
|
576460752303423487 7FFFFFFFFFFFFFF3 -
|
||||||
|
|
||||||
1000000000000000000000000000000 ...............3 1000000000000000
|
1000000000000000000000000000000 ...............4 1000000000000000
|
||||||
00000040EAED7446
|
00000040EAED7446
|
||||||
D09C2C9F0C000000
|
D09C2C9F0C000000
|
||||||
0000000000000000
|
0000000000000000
|
||||||
|
|
||||||
-1000000000000000000000000000000 ...............3 1000000000000000
|
-1000000000000000000000000000000 ...............4 1000000000000000
|
||||||
000000C015128BB9
|
000000C015128BB9
|
||||||
2F63D360F3FFFFFF
|
2F63D360F3FFFFFF
|
||||||
0000000000000000
|
0000000000000000
|
||||||
|
|
||||||
87112285931760246646623899502532662132736 ...............3 1800000000000000
|
87112285931760246646623899502532662132736 ...............4 1800000000000000
|
||||||
0000000000000000
|
0000000000000000
|
||||||
0000000000000000
|
0000000000000000
|
||||||
0001000000000000
|
0001000000000000
|
||||||
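The new-side rows of the integer table can be reproduced with a short Python sketch. It assumes, based on the examples, that an immediate `SignedInteger` is the two's-complement value shifted left by four bits with `0011` in the low nibble. Function names are hypothetical. The second function returns only the little-endian data words; the leading length word and any trailing padding of the `Buf` are deliberately left out, since their layout is defined outside this hunk.

```python
# Illustrative sketch only: names are hypothetical, not from the spec.
MASK64 = (1 << 64) - 1
IMM_MIN, IMM_MAX = -(1 << 59), (1 << 59) - 1

def immediate_int_ref(value):
    """Format an in-range integer as the 16-hex-digit Ref shown in the table."""
    if not IMM_MIN <= value <= IMM_MAX:
        raise ValueError("outside the immediate range; a pointer Ref is needed")
    return format(((value << 4) | 0x3) & MASK64, "016X")

def int_payload_words(value):
    """Little-endian two's-complement bytes in the fewest 64-bit words."""
    if IMM_MIN <= value <= IMM_MAX:
        raise ValueError("in the immediate range; shortest encoding is tag 3")
    n = 1
    # Grow the word count until the value fits as a signed 64*n-bit integer.
    while not -(1 << (64 * n - 1)) <= value < (1 << (64 * n - 1)):
        n += 1
    return value.to_bytes(8 * n, "little", signed=True)
```

Here `immediate_int_ref(-257)` gives `FFFFFFFFFFFFEFF3`, and `int_payload_words(10**30).hex()` gives the two data words listed for that value.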
|
@@ -202,27 +167,28 @@ Syntax for these three types varies only in the tag used. For `String` and
|
||||||
points, while for `ByteString` it is the raw data contained within the
|
points, while for `ByteString` it is the raw data contained within the
|
||||||
`Value` unmodified.
|
`Value` unmodified.
|
||||||
|
|
||||||
Encoded data of length 7 bytes or shorter is represented as an immediate
|
Encoded data of length between 1 and 7 bytes is represented as an immediate
|
||||||
`Ref` with tag 4 (`String`), 6 (`ByteString`) or 8 (`Symbol`). The lower 4
|
`Ref` where the low *five* bits are `00010` (`String`), `10001`
|
||||||
bits of the 60-bit payload are the length of the encoded data; the upper 56
|
(`ByteString`), or `10010` (`Symbol`). The upper three bits of the low byte
|
||||||
bits are 7 bytes of data, with the first data byte in the lowest byte, so
|
of the `Ref` give the length in bytes. The remaining bytes in the `Ref` are
|
||||||
that the order of data bytes in memory in an immediate encoding matches the
|
the data, in memory order.
|
||||||
order in a `Buf` encoding.
|
|
||||||
|
|
||||||
Data longer than 7 bytes is represented with a `Ref` with tag 5, 7 or 9
|
`Ref` tags 5, 6, and 7 are pointers to `String`, `ByteString` and `Symbol`
|
||||||
pointing to a `Buf` containing the bytes of encoded data. Empty values
|
`Buf`s, respectively. Offset zero signifies zero-length data; otherwise,
|
||||||
(length 0) *MUST* be encoded using pointer `Ref` form with special offset
|
the pointed-to `Buf` contains the bytes of encoded data.
|
||||||
zero.
|
|
||||||
|
Empty values (length 0) *MUST* be encoded using pointer `Ref` form with
|
||||||
|
special offset zero.
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
|
||||||
Value Ref (64-bit) Buf (hex bytes)
|
Value Ref (64-bit) Buf (hex bytes)
|
||||||
----------------------------------------- ---------------- ----------------
|
----------------------------------------- ---------------- ----------------
|
||||||
"" 0000000000000005 -
|
"" 0000000000000002 -
|
||||||
#"" 0000000000000007 -
|
#"" 0000000000000011 -
|
||||||
|| 0000000000000009 -
|
|| 0000000000000012 -
|
||||||
"Hello" 48656C6C6F000054 -
|
"Hello" 48656C6C6F0000A2 -
|
||||||
"a\0a" 6100610000000034 -
|
#"a\0a" 6100610000000071 -
|
||||||
|
|
||||||
"Hello, world!" ...............5 0D00000000000000
|
"Hello, world!" ...............5 0D00000000000000
|
||||||
48656C6C6F2C2077
|
48656C6C6F2C2077
|
||||||
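The new immediate rows for `"Hello"` and `#"a\0a"` can be reproduced with the following Python sketch. It only mirrors the 16-digit notation used in the example table (data bytes first, zero-filled to seven bytes, low byte last). How that notation maps onto bytes in memory is not addressed here, zero-filling the unused bytes is an assumption, and the names are hypothetical.

```python
# Illustrative sketch only: names are hypothetical, not from the spec.
# Low five bits of the low byte for each immediate kind, per the new text.
KIND_BITS = {"String": 0b00010, "ByteString": 0b10001, "Symbol": 0b10010}

def immediate_text_ref(kind, data):
    """Render 1 to 7 bytes of encoded data in the table's 16-hex-digit form."""
    if not 1 <= len(data) <= 7:
        raise ValueError("immediate form holds between 1 and 7 bytes")
    # Upper three bits carry the length (nnn); lower five select the kind.
    low_byte = (len(data) << 5) | KIND_BITS[kind]
    # Assumption: unused data bytes are zero-filled.
    return (data.ljust(7, b"\x00") + bytes([low_byte])).hex().upper()
```

For example, `immediate_text_ref("String", "Hello".encode("utf-8"))` returns `48656C6C6F0000A2`.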
|
@@ -234,23 +200,27 @@ For example,
|
||||||
Value Ref (64-bit) Buf (hex bytes)
|
Value Ref (64-bit) Buf (hex bytes)
|
||||||
----------------------------------------- ---------------- ----------------
|
----------------------------------------- ---------------- ----------------
|
||||||
#f 0000000000000000 -
|
#f 0000000000000000 -
|
||||||
#t 0000000000000010 -
|
#t 0000000000000100 -
|
||||||
|
|
||||||
### Floats and Doubles.
|
### Floats and Doubles.
|
||||||
|
|
||||||
Each IEEE 754 4- and 8-byte binary representation is encoded into a `Buf`,
|
4-byte (32-bit) IEEE 754 `Float`s are encoded within immediate `Ref`s with
|
||||||
pointed to with a `Ref` with tag 1. The length of the `Buf` disambiguates
|
low byte equal to 0x81. The next four lowest bytes are the 4-byte,
|
||||||
between 32-bit floats and 64-bit doubles.
|
little-endian binary representation of the floating-point value, and the
|
||||||
|
upper three bytes of the `Ref` are unused.
|
||||||
|
|
||||||
((This is a very sparse encoding! Each float/double takes up 24 bytes split
|
8-byte (64-bit) IEEE 754 `Double`s are encoded into a `Buf`, pointed to by
|
||||||
across the `Buf` and `Ref`.))
|
a `Ref` with tag 13. The length of the `Buf` must be 8 bytes.
|
||||||
|
|
||||||
|
((This is a very sparse encoding for `Double`s! Each `Double` takes up 24
|
||||||
|
bytes split across the `Buf` and `Ref`.))
|
||||||
|
|
||||||
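The new `Float` and `Double` rules can be illustrated with a small Python sketch using the standard `struct` module. It treats the `Ref` as a 64-bit integer whose low byte is `0x81` and whose next four bytes hold the 32-bit pattern. Leaving the unused upper three bytes as zero is an assumption, and the function names are hypothetical. For `Double`s only the 8-byte payload is produced, not the surrounding `Buf` or the pointer `Ref`.

```python
import struct

# Illustrative sketch only: names are hypothetical, not from the spec.

def float_ref(x):
    """Pack a 32-bit float into an immediate Ref, returned as an integer."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Low byte 0x81 (nnn = 100, Float); unused upper bytes assumed zero.
    return (bits << 8) | 0x81

def float_from_ref(ref):
    """Recover the 32-bit float from an immediate Float Ref."""
    if ref & 0xFF != 0x81:
        raise ValueError("not an immediate Float Ref")
    (value,) = struct.unpack("<f", struct.pack("<I", (ref >> 8) & 0xFFFFFFFF))
    return value

def double_payload(x):
    """The 8 little-endian bytes that a Double's Buf must hold."""
    return struct.pack("<d", x)
```

With these definitions, `float_ref(1.0)` equals `0x0000003F80000081`, and `double_payload(1.0)` is exactly 8 bytes long.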
### Embeddeds.
|
### Embeddeds.
|
||||||
|
|
||||||
To encode an `Embedded`, first choose a `Value` to represent the denoted
|
To encode an `Embedded`, first choose a `Value` to represent the denoted
|
||||||
object, and encode that, producing a `Ref`. Place that ref in a `Buf` all
|
object, and encode that, producing a `Ref`. Place that ref in a `Buf` all
|
||||||
of its own (with length 8). Finally, point to the `Buf` with a `Ref` with
|
of its own (with length 8). Finally, point to the `Buf` with a `Ref` with
|
||||||
tag 15.
|
tag 12.
|
||||||
|
|
||||||
### Annotations.
|
### Annotations.
|
||||||
|
|
||||||
|
|