diff --git a/preserves-zerocopy.md b/preserves-zerocopy.md
index 5f0415c..c0716ee 100644
--- a/preserves-zerocopy.md
+++ b/preserves-zerocopy.md
@@ -66,76 +66,36 @@ Either way, the tag on the special `Ref` is the type of the encoded value.
 
 ### Tags and Refs.
 
-    ................................................................
-    Version 1
+The following table maps bit values in the low (leftmost) byte of a `Ref`
+to their interpretation. In interpretations including a three-bit `nnn`
+value, the `nnn` bits specify the length of the used portion of the
+remaining 56 bits of the `Ref`, counted in bytes, starting from the
+following byte, with value `000` disallowed.
 
-    00000000 IMM bool
-    ...00100 IMM RESERVED
-    nnn01000 IMM float      nnn = length of payload in bytes. 000 disallowed
-    nnn10000 IMM str
-    nnn10100 IMM bytes
-    nnn11000 IMM sym
+    Bit number    Meaning
+    7654 3210
+    --------- --- -------------------------------------------------------------
+    0000 0000 IMM Boolean; next byte = 0 means false; 1 means true.
+    ...1 0000 IMM reserved
+    nnn0 0001 IMM Float: nnn must be 100, meaning a 32-bit IEEE754 value.
+    nnn1 0001 IMM ByteString
+    nnn0 0010 IMM String
+    nnn1 0010 IMM Symbol
 
-    ....1100 IMM int
+    .... 0011 IMM SignedInteger between -2^59 and (2^59)-1, inclusive
 
-    .....010 RESERVED
-    ....0110 PTR embedded
-    ....1110 PTR float
-
-    ....0001 PTR str
-    ....0101 PTR bytes
-    ....1001 PTR sym
-    ....1101 PTR int
-
-    ....0011 PTR rec
-    ....0111 PTR seq
-    ....1011 PTR set
-    ....1111 PTR map
-
-    ................................................................
-    Version 2
-
-    0000 0000 IMM bool
-    ...1 0000 IMM RESERVED
-    nnn0 0001 IMM float     nnn = length of payload in bytes. 000 disallowed
-    nnn1 0001 IMM bytes
-    nnn0 0010 IMM str
-    nnn1 0010 IMM sym
-
-    .... 0011 IMM int
-
-    .... 0100 PTR int
-    .... 0101 PTR str
-    .... 0110 PTR bytes
-    .... 0111 PTR sym
-    .... 1000 PTR rec
-    .... 1001 PTR seq
-    .... 1010 PTR set
-    .... 1011 PTR map
-    .... 1100 PTR embedded
-    ....
1101 PTR float
-    .... 1110 RESERVED
-    .... 1111 RESERVED
-
-
-    Tag Type          Interpretation of 60-bit payload
-    --- ------------- --------------------------------
-    0   Boolean       0 = False, 1 = True
-    1   IEEE 754      Offset to Buf holding little-endian 32/64-bit float
-    2   SignedInteger Signed 60-bit integer
-    3   SignedInteger Offset to Buf holding little-endian signed integer
-    4   String        0-7 bytes of UTF-8; length in lower 4 bits
-    5   String        Offset to Buf holding UTF-8 data
-    6   ByteString    0-7 bytes of raw binary; length in lower 4 bits
-    7   ByteString    Offset to Buf holding raw binary data
-    8   Symbol        0-7 bytes of UTF-8; length in lower 4 bits
-    9   Symbol        Offset to Buf holding UTF-8 data
-    A   Record        Offset to Buf holding Refs (label, fields)
-    B   Sequence      Offset to Buf holding Refs (sequence values)
-    C   Set           Offset to Buf holding Refs (elements in arbitrary order)
-    D   Dictionary    Offset to Buf holding Refs (key/value pairs)
-    E   Embedded      Offset to Buf holding a single Ref
-    F   -             (reserved)
+    .... 0100 PTR SignedInteger outside the immediate range
+    .... 0101 PTR String
+    .... 0110 PTR ByteString
+    .... 0111 PTR Symbol
+    .... 1000 PTR Record
+    .... 1001 PTR Sequence
+    .... 1010 PTR Set
+    .... 1011 PTR Dictionary
+    .... 1100 PTR Embedded
+    .... 1101 PTR Double: length of pointed-to Buf must be 8
+    .... 1110     reserved
+    .... 1111     reserved
 
 ### Records, Sequences, Sets and Dictionaries.
 
@@ -147,50 +107,55 @@ Either way, the tag on the special `Ref` is the type of the encoded value.
     n*8       8       Ref n-1
     (n+1)*8   8       Padding, only if n is even
 
-Each compound datum is represented as a sequence of `Ref`s representing the
-contained `Value`s. Each `Record`'s sequence represents the label, followed
-by the fields in order. Each `Sequence`'s representation is just its
-contained values in order. `Set`s are ordered arbitrarily into a sequence.
-The key-value pairs in a `Dictionary` are ordered arbitrarily, alternating
-between keys and their matching values.
+Each compound datum is represented as a `Buf` containing a sequence of
+`Ref`s representing the contained `Value`s. Each `Record`'s sequence
+represents the label, followed by the fields in order. Each `Sequence`'s
+representation is just its contained values in order. `Set`s are ordered
+arbitrarily into a sequence. The key-value pairs in a `Dictionary` are
+ordered arbitrarily, alternating between keys and their matching values.
 
 There is *no* ordering requirement on the elements of `Set`s or the
 key-value pairs in a `Dictionary`. They may appear in any order. However,
 the elements and keys *MUST* be pairwise distinct according to the
 [Preserves equivalence relation](preserves.html#equivalence).
 
+Empty structures are represented using a `Ref` with a zero offset and the
+appropriate tag.
+
 ### SignedIntegers.
 
 Integers between -2^59 and (2^59)-1, inclusive, are
-represented as immediate values in a `Ref` with tag 2. Integers outside
-this range are represented with a `Ref` with tag 3 pointing to a `Buf`
+represented as immediate values in a `Ref` with tag 3. Integers outside
+this range are represented with a `Ref` with tag 4 pointing to a `Buf`
 containing exactly as many 64-bit words as needed to unambiguously identify
 the value and its sign, in little-endian byte and word ordering.
 
 Every `SignedInteger` *MUST* be represented with its shortest possible
 encoding.
+Zero is represented using tag 3; use of tag 4 with a zero offset is
+forbidden.
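Reviewer's note, not part of the patch: the tag 3 and tag 4 rules in this hunk can be sketched in Python as a reading aid. The function names `encode_immediate_int` and `encode_pointer_int_payload` are hypothetical. The sketch assumes the tag sits in the low four bits of the `Ref` with the 60-bit two's-complement payload above it, and it produces only the integer bytes held in the pointed-to `Buf`, leaving out the length word, any padding, and the pointer `Ref` itself.

```python
MASK64 = (1 << 64) - 1
IMM_INT_TAG = 0x3            # low nibble 0011: immediate SignedInteger
IMM_MIN = -(1 << 59)         # smallest value that fits the 60-bit payload
IMM_MAX = (1 << 59) - 1      # largest value that fits the 60-bit payload


def encode_immediate_int(n: int) -> int:
    """Return the 64-bit Ref for an integer in the immediate range."""
    if not IMM_MIN <= n <= IMM_MAX:
        raise ValueError("out of immediate range; pointer form (tag 4) required")
    # Shift the payload above the tag nibble, then truncate to 64 bits so
    # that negative values become their two's-complement bit pattern.
    return ((n << 4) | IMM_INT_TAG) & MASK64


def encode_pointer_int_payload(n: int) -> bytes:
    """Return the little-endian integer bytes for the pointed-to Buf.

    Uses the fewest 64-bit words that hold n together with its sign bit.
    """
    if IMM_MIN <= n <= IMM_MAX:
        # The shortest-encoding rule forbids pointer form for these values.
        raise ValueError("value fits the immediate form (tag 3)")
    words = 1
    while not -(1 << (64 * words - 1)) <= n < (1 << (64 * words - 1)):
        words += 1
    return n.to_bytes(8 * words, "little", signed=True)


if __name__ == "__main__":
    print(f"{encode_immediate_int(-257):016X}")
    print(encode_pointer_int_payload(10**30).hex().upper())
```

Rejecting in-range values in the pointer helper is one way to enforce the shortest-encoding requirement at the point of encoding; a decoder would need the mirror-image check.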
 For example,
 
     Number (decimal)                          Ref (64-bit)     Buf (hex bytes)
     ----------------------------------------- ---------------- ----------------
-    -576460752303423488                       8000000000000002 -
-    -257                                      FFFFFFFFFFFFEFF2 -
-    -1                                        FFFFFFFFFFFFFFF2 -
-    0                                         0000000000000002 -
-    1                                         0000000000000012 -
-    257                                       0000000000001012 -
-    576460752303423487                        7FFFFFFFFFFFFFF2 -
+    -576460752303423488                       8000000000000003 -
+    -257                                      FFFFFFFFFFFFEFF3 -
+    -1                                        FFFFFFFFFFFFFFF3 -
+    0                                         0000000000000003 -
+    1                                         0000000000000013 -
+    257                                       0000000000001013 -
+    576460752303423487                        7FFFFFFFFFFFFFF3 -
 
-    1000000000000000000000000000000           ...............3 1000000000000000
+    1000000000000000000000000000000           ...............4 1000000000000000
                                                                00000040EAED7446
                                                                D09C2C9F0C000000
                                                                0000000000000000
 
-    -1000000000000000000000000000000          ...............3 1000000000000000
+    -1000000000000000000000000000000          ...............4 1000000000000000
                                                                000000C015128BB9
                                                                2F63D360F3FFFFFF
                                                                0000000000000000
 
-    87112285931760246646623899502532662132736 ...............3 1800000000000000
+    87112285931760246646623899502532662132736 ...............4 1800000000000000
                                                                0000000000000000
                                                                0000000000000000
                                                                0001000000000000
@@ -202,27 +167,28 @@ Syntax for these three types varies only in the tag used. For `String` and
 points, while for `ByteString` it is the raw data contained within the
 `Value` unmodified.
 
-Encoded data of length 7 bytes or shorter is represented as an immediate
-`Ref` with tag 4 (`String`), 6 (`ByteString`) or 8 (`Symbol`). The lower 4
-bits of the 60-bit payload are the length of the encoded data; the upper 56
-bits are 7 bytes of data, with the first data byte in the lowest byte, so
-that the order of data bytes in memory in an immediate encoding matches the
-order in a `Buf` encoding.
+Encoded data of length between 1 and 7 bytes is represented as an immediate
+`Ref` where the low *five* bits are `00010` (`String`), `10001`
+(`ByteString`), or `10010` (`Symbol`). The upper three bits of the low byte
+of the `Ref` give the length in bytes.
The remaining bytes in the `Ref` are
+the data, in memory order.
 
-Data longer than 7 bytes is represented with a `Ref` with tag 5, 7 or 9
-pointing to a `Buf` containing the bytes of encoded data. Empty values
-(length 0) *MUST* be encoded using pointer `Ref` form with special offset
-zero.
+`Ref` tags 5, 6, and 7 are pointers to `String`, `ByteString` and `Symbol`
+`Buf`s, respectively. Offset zero signifies zero-length data; otherwise,
+the pointed-to `Buf` contains the bytes of encoded data.
+
+Empty values (length 0) *MUST* be encoded using pointer `Ref` form with
+special offset zero.
 
 For example,
 
     Value                                     Ref (64-bit)     Buf (hex bytes)
     ----------------------------------------- ---------------- ----------------
-    ""                                        0000000000000005 -
-    #""                                       0000000000000007 -
-    ||                                        0000000000000009 -
-    "Hello"                                   48656C6C6F000054 -
-    "a\0a"                                    6100610000000034 -
+    ""                                        0000000000000002 -
+    #""                                       0000000000000011 -
+    ||                                        0000000000000012 -
+    "Hello"                                   48656C6C6F0000A2 -
+    #"a\0a"                                   6100610000000071 -
 
     "Hello, world!"                           ...............5 0D00000000000000
                                                                48656C6C6F2C2077
@@ -234,23 +200,27 @@ For example,
     Value                                     Ref (64-bit)     Buf (hex bytes)
     ----------------------------------------- ---------------- ----------------
     #f                                        0000000000000000 -
-    #t                                        0000000000000010 -
+    #t                                        0000000000000100 -
 
 ### Floats and Doubles.
 
-Each IEEE 754 4- and 8-byte binary representation is encoded into a `Buf`,
-pointed to with a `Ref` with tag 1. The length of the `Buf` disambiguates
-between 32-bit floats and 64-bit doubles.
+4-byte (32-bit) IEEE 754 `Float`s are encoded within immediate `Ref`s with
+low byte equal to 0x81. The next four lowest bytes are the 4-byte,
+little-endian binary representation of the floating-point value, and the
+upper three bytes of the `Ref` are unused.
 
-((This is a very sparse encoding! Each float/double takes up 24 bytes split
-across the `Buf` and `Ref`.))
+8-byte (64-bit) IEEE 754 `Double`s are encoded into a `Buf`, pointed to by
+a `Ref` with tag 13. The length of the `Buf` must be 8 bytes.
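Reviewer's note, not part of the patch: the new `Float` and `Double` layouts can be sketched in Python as follows. The helper names are hypothetical. Two assumptions go beyond what the added text states: the three unused upper bytes of a `Float` `Ref` are written as zero, and the 8 bytes of a `Double` are stored little-endian, as the removed tag table specified for the old encoding. The tag 13 pointer `Ref` and the `Buf` length word are not modelled.

```python
import struct

FLOAT_IMM_LOW_BYTE = 0x81  # nnn = 100 (4 payload bytes), low five bits 00001


def encode_float_ref(value: float) -> bytes:
    """Return the 8 bytes of an immediate Float Ref, in memory order."""
    # "<"  : little-endian, no alignment padding
    # "B"  : the low byte of the Ref (0x81)
    # "f"  : the 32-bit IEEE 754 value
    # "3x" : three unused bytes, written as zero (an assumption)
    return struct.pack("<Bf3x", FLOAT_IMM_LOW_BYTE, value)


def decode_float_ref(ref: bytes) -> float:
    """Inverse of encode_float_ref; rejects Refs with a different low byte."""
    low, value = struct.unpack("<Bf3x", ref)
    if low != FLOAT_IMM_LOW_BYTE:
        raise ValueError("not an immediate Float Ref")
    return value


def encode_double_payload(value: float) -> bytes:
    """Return the 8 data bytes for the Buf that a tag 13 Ref points to.

    Little-endian byte order is an assumption carried over from the
    removed table; the added text only fixes the length at 8 bytes.
    """
    return struct.pack("<d", value)


if __name__ == "__main__":
    print(encode_float_ref(1.0).hex().upper())
    print(encode_double_payload(1.0).hex().upper())
```

Working on the in-memory byte sequence rather than on a 64-bit integer keeps the sketch independent of how the `Ref` is later printed or loaded.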
+
+((This is a very sparse encoding for `Double`s! Each `Double` takes up 24
+bytes split across the `Buf` and `Ref`.))
 
 ### Embeddeds.
 
 To encode an `Embedded`, first choose a `Value` to represent the denoted
 object, and encode that, producing a `Ref`. Place that ref in a `Buf` all
 of its own (with length 8). Finally, point to the `Buf` with a `Ref` with
-tag 15.
+tag 12.
 
 ### Annotations.