diff --git a/preserves-zerocopy.md b/preserves-zerocopy.md
index 5f0415c..c0716ee 100644
--- a/preserves-zerocopy.md
+++ b/preserves-zerocopy.md
@@ -66,76 +66,36 @@ Either way, the tag on the special `Ref` is the type of the encoded value.
### Tags and Refs.
- ................................................................
- Version 1
+The following table maps bit values in the low (leftmost) byte of a `Ref`
+to their interpretation. In interpretations including a three-bit `nnn`
+value, the `nnn` bits specify the length of the used portion of the
+remaining 56 bits of the `Ref`, counted in bytes, starting from the
+following byte, with value `000` disallowed.
- 00000000 IMM bool
- ...00100 IMM RESERVED
- nnn01000 IMM float nnn = length of payload in bytes. 000 disallowed
- nnn10000 IMM str
- nnn10100 IMM bytes
- nnn11000 IMM sym
+ Bit number Meaning
+ 7654 3210
+ --------- --- -------------------------------------------------------------
+ 0000 0000 IMM Boolean; next byte = 0 means false; 1 means true.
+ ...1 0000 IMM reserved
+ nnn0 0001 IMM Float: nnn must be 100, meaning a 32-bit IEEE754 value.
+ nnn1 0001 IMM ByteString
+ nnn0 0010 IMM String
+ nnn1 0010 IMM Symbol
- ....1100 IMM int
+ .... 0011 IMM SignedInteger between -2^59 and (2^59)-1, inclusive
- .....010 RESERVED
- ....0110 PTR embedded
- ....1110 PTR float
-
- ....0001 PTR str
- ....0101 PTR bytes
- ....1001 PTR sym
- ....1101 PTR int
-
- ....0011 PTR rec
- ....0111 PTR seq
- ....1011 PTR set
- ....1111 PTR map
-
- ................................................................
- Version 2
-
- 0000 0000 IMM bool
- ...1 0000 IMM RESERVED
- nnn0 0001 IMM float nnn = length of payload in bytes. 000 disallowed
- nnn1 0001 IMM bytes
- nnn0 0010 IMM str
- nnn1 0010 IMM sym
-
- .... 0011 IMM int
-
- .... 0100 PTR int
- .... 0101 PTR str
- .... 0110 PTR bytes
- .... 0111 PTR sym
- .... 1000 PTR rec
- .... 1001 PTR seq
- .... 1010 PTR set
- .... 1011 PTR map
- .... 1100 PTR embedded
- .... 1101 PTR float
- .... 1110 RESERVED
- .... 1111 RESERVED
-
-
- Tag Type Interpretation of 60-bit payload
- --- ------------- --------------------------------
- 0 Boolean 0 = False, 1 = True
- 1 IEEE 754 Offset to Buf holding little-endian 32/64-bit float
- 2 SignedInteger Signed 60-bit integer
- 3 SignedInteger Offset to Buf holding little-endian signed integer
- 4 String 0-7 bytes of UTF-8; length in lower 4 bits
- 5 String Offset to Buf holding UTF-8 data
- 6 ByteString 0-7 bytes of raw binary; length in lower 4 bits
- 7 ByteString Offset to Buf holding raw binary data
- 8 Symbol 0-7 bytes of UTF-8; length in lower 4 bits
- 9 Symbol Offset to Buf holding UTF-8 data
- A Record Offset to Buf holding Refs (label, fields)
- B Sequence Offset to Buf holding Refs (sequence values)
- C Set Offset to Buf holding Refs (elements in arbitrary order)
- D Dictionary Offset to Buf holding Refs (key/value pairs)
- E Embedded Offset to Buf holding a single Ref
- F - (reserved)
+ .... 0100 PTR SignedInteger outside the immediate range
+ .... 0101 PTR String
+ .... 0110 PTR ByteString
+ .... 0111 PTR Symbol
+ .... 1000 PTR Record
+ .... 1001 PTR Sequence
+ .... 1010 PTR Set
+ .... 1011 PTR Dictionary
+ .... 1100 PTR Embedded
+ .... 1101 PTR Double: length of pointed-to Buf must be 8
+ .... 1110 reserved
+ .... 1111 reserved
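
As a rough illustration of how the table above might be read, the following Python sketch classifies the low byte of a `Ref`. The function and dictionary names are hypothetical and not part of the specification, and treating bit patterns the table does not list (such as `nnn0 0000` with nonzero `nnn`) as invalid is an assumption.

```python
# Illustrative only: these names are not defined by the specification.
POINTER_KINDS = {
    0x4: "SignedInteger", 0x5: "String", 0x6: "ByteString", 0x7: "Symbol",
    0x8: "Record", 0x9: "Sequence", 0xA: "Set", 0xB: "Dictionary",
    0xC: "Embedded", 0xD: "Double",
}
IMMEDIATE_KINDS = {0x11: "ByteString", 0x02: "String", 0x12: "Symbol"}


def classify_low_byte(b):
    """Return (form, kind) for the low byte of a Ref, following the table."""
    low4, low5, nnn = b & 0x0F, b & 0x1F, b >> 5
    if low4 in POINTER_KINDS:
        return ("PTR", POINTER_KINDS[low4])
    if low4 >= 0xE:
        return ("reserved", None)
    if low4 == 0x3:
        return ("IMM", "SignedInteger")
    if b == 0x00:
        return ("IMM", "Boolean")
    if low5 == 0x10:
        return ("reserved", None)
    if low5 == 0x01:
        # Float requires nnn == 100, i.e. a 4-byte payload.
        return ("IMM", "Float") if nnn == 0b100 else ("invalid", None)
    if low5 in IMMEDIATE_KINDS:
        # nnn == 000 is disallowed for length-prefixed immediates.
        return ("IMM", IMMEDIATE_KINDS[low5]) if nnn != 0 else ("invalid", None)
    # Assumption: patterns the table does not list are treated as invalid.
    return ("invalid", None)
```

For instance, `classify_low_byte(0xA2)` yields `("IMM", "String")`, matching the low byte of the `"Hello"` example later in this document.
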
### Records, Sequences, Sets and Dictionaries.
@@ -147,50 +107,55 @@ Either way, the tag on the special `Ref` is the type of the encoded value.
n*8 8 Ref n-1
(n+1)*8 8 Padding, only if n is even
-Each compound datum is represented as a sequence of `Ref`s representing the
-contained `Value`s. Each `Record`'s sequence represents the label, followed
-by the fields in order. Each `Sequence`'s representation is just its
-contained values in order. `Set`s are ordered arbitrarily into a sequence.
-The key-value pairs in a `Dictionary` are ordered arbitrarily, alternating
-between keys and their matching values.
+Each compound datum is represented as a `Buf` containing a sequence of
+`Ref`s representing the contained `Value`s. Each `Record`'s sequence
+represents the label, followed by the fields in order. Each `Sequence`'s
+representation is just its contained values in order. `Set`s are ordered
+arbitrarily into a sequence. The key-value pairs in a `Dictionary` are
+ordered arbitrarily, alternating between keys and their matching values.
There is *no* ordering requirement on the elements of `Set`s or the
key-value pairs in a `Dictionary`. They may appear in any order. However,
the elements and keys *MUST* be pairwise distinct according to the
[Preserves equivalence relation](preserves.html#equivalence).
+Empty structures are represented using a `Ref` with a zero offset and the
+appropriate tag.
+
### SignedIntegers.
Integers between -2^59 and (2^59)-1, inclusive, are
-represented as immediate values in a `Ref` with tag 2. Integers outside
-this range are represented with a `Ref` with tag 3 pointing to a `Buf`
+represented as immediate values in a `Ref` with tag 3. Integers outside
+this range are represented with a `Ref` with tag 4 pointing to a `Buf`
containing exactly as many 64-bit words as needed to unambiguously identify
the value and its sign, in little-endian byte and word ordering. Every
`SignedInteger` *MUST* be represented with its shortest possible encoding.
+Zero is represented using tag 3; use of tag 4 with a zero offset is
+forbidden.
For example,
Number (decimal) Ref (64-bit) Buf (hex bytes)
----------------------------------------- ---------------- ----------------
- -576460752303423488 8000000000000002 -
- -257 FFFFFFFFFFFFEFF2 -
- -1 FFFFFFFFFFFFFFF2 -
- 0 0000000000000002 -
- 1 0000000000000012 -
- 257 0000000000001012 -
- 576460752303423487 7FFFFFFFFFFFFFF2 -
+ -576460752303423488 8000000000000003 -
+ -257 FFFFFFFFFFFFEFF3 -
+ -1 FFFFFFFFFFFFFFF3 -
+ 0 0000000000000003 -
+ 1 0000000000000013 -
+ 257 0000000000001013 -
+ 576460752303423487 7FFFFFFFFFFFFFF3 -
- 1000000000000000000000000000000 ...............3 1000000000000000
+ 1000000000000000000000000000000 ...............4 1000000000000000
00000040EAED7446
D09C2C9F0C000000
0000000000000000
- -1000000000000000000000000000000 ...............3 1000000000000000
+ -1000000000000000000000000000000 ...............4 1000000000000000
000000C015128BB9
2F63D360F3FFFFFF
0000000000000000
- 87112285931760246646623899502532662132736 ...............3 1800000000000000
+ 87112285931760246646623899502532662132736 ...............4 1800000000000000
0000000000000000
0000000000000000
0001000000000000
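
The immediate rows of the table above can be reproduced with a short Python sketch. It assumes the 60-bit two's-complement payload sits directly above the 4-bit tag, which is what the example values suggest; the function names are hypothetical.

```python
from typing import Optional

IMM_INT_TAG = 0x3          # low four bits of an immediate SignedInteger Ref
MASK_64 = (1 << 64) - 1


def encode_immediate_int(n: int) -> Optional[int]:
    """Return the 64-bit Ref for n, or None if n needs the pointer form."""
    if not -(1 << 59) <= n < (1 << 59):
        return None  # outside the immediate range: use tag 4 and a Buf
    # Shift the payload above the tag; masking yields two's complement.
    return ((n << 4) | IMM_INT_TAG) & MASK_64


def decode_immediate_int(ref: int) -> int:
    """Recover the integer from an immediate SignedInteger Ref."""
    assert ref & 0xF == IMM_INT_TAG
    value = ref >> 4
    # Sign-extend the 60-bit payload.
    return value - (1 << 60) if value >= (1 << 59) else value
```

Values for which `encode_immediate_int` returns `None` would be written to a `Buf` instead, which this sketch does not cover.
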
@@ -202,27 +167,28 @@ Syntax for these three types varies only in the tag used. For `String` and
points, while for `ByteString` it is the raw data contained within the
`Value` unmodified.
-Encoded data of length 7 bytes or shorter is represented as an immediate
-`Ref` with tag 4 (`String`), 6 (`ByteString`) or 8 (`Symbol`). The lower 4
-bits of the 60-bit payload are the length of the encoded data; the upper 56
-bits are 7 bytes of data, with the first data byte in the lowest byte, so
-that the order of data bytes in memory in an immediate encoding matches the
-order in a `Buf` encoding.
+Encoded data of length between 1 and 7 bytes is represented as an immediate
+`Ref` where the low *five* bits are `00010` (`String`), `10001`
+(`ByteString`), or `10010` (`Symbol`). The upper three bits of the low byte
+of the `Ref` give the length in bytes. The remaining bytes in the `Ref` are
+the data, in memory order.
-Data longer than 7 bytes is represented with a `Ref` with tag 5, 7 or 9
-pointing to a `Buf` containing the bytes of encoded data. Empty values
-(length 0) *MUST* be encoded using pointer `Ref` form with special offset
-zero.
+`Ref` tags 5, 6, and 7 are pointers to `String`, `ByteString` and `Symbol`
+`Buf`s, respectively. Offset zero signifies zero-length data; otherwise,
+the pointed-to `Buf` contains the bytes of encoded data.
+
+Empty values (length 0) *MUST* be encoded using pointer `Ref` form with
+special offset zero.
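
A minimal Python sketch of the immediate form described above follows. It assumes the `Ref` is laid out in memory with its low byte first and that unused trailing bytes are zero; the names are hypothetical. Lengths of 0 or more than 7 bytes return `None`, since those cases require the pointer form.

```python
# Low five bits of the Ref's low byte for each immediate kind.
IMMEDIATE_LOW5 = {"String": 0b00010, "ByteString": 0b10001, "Symbol": 0b10010}


def encode_immediate(kind, data):
    """Return the 8 bytes of an immediate Ref in memory order, or None.

    `data` is the already-encoded payload (UTF-8 for String and Symbol).
    """
    n = len(data)
    if not 1 <= n <= 7:
        return None  # empty or too long: the pointer form is required
    low_byte = (n << 5) | IMMEDIATE_LOW5[kind]
    # Assumption: bytes after the payload are zero-filled.
    return bytes([low_byte]) + data + bytes(7 - n)
```

With these assumptions, `encode_immediate("String", b"Hello")` begins with the byte `0xA2`: length 5 in the upper three bits, `00010` in the lower five.
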
For example,
Value Ref (64-bit) Buf (hex bytes)
----------------------------------------- ---------------- ----------------
- "" 0000000000000005 -
- #"" 0000000000000007 -
- || 0000000000000009 -
- "Hello" 48656C6C6F000054 -
- "a\0a" 6100610000000034 -
+ "" 0000000000000005 -
+ #"" 0000000000000006 -
+ || 0000000000000007 -
+ "Hello" 48656C6C6F0000A2 -
+ #"a\0a" 6100610000000071 -
"Hello, world!" ...............5 0D00000000000000
48656C6C6F2C2077
@@ -234,23 +200,27 @@ For example,
Value Ref (64-bit) Buf (hex bytes)
----------------------------------------- ---------------- ----------------
#f 0000000000000000 -
- #t 0000000000000010 -
+ #t 0000000000000100 -
### Floats and Doubles.
-Each IEEE 754 4- and 8-byte binary representation is encoded into a `Buf`,
-pointed to with a `Ref` with tag 1. The length of the `Buf` disambiguates
-between 32-bit floats and 64-bit doubles.
+4-byte (32-bit) IEEE 754 `Float`s are encoded within immediate `Ref`s with
+low byte equal to 0x81. The next four lowest bytes are the 4-byte,
+little-endian binary representation of the floating-point value, and the
+upper three bytes of the `Ref` are unused.
-((This is a very sparse encoding! Each float/double takes up 24 bytes split
-across the `Buf` and `Ref`.))
+8-byte (64-bit) IEEE 754 `Double`s are encoded into a `Buf`, pointed to by
+a `Ref` with tag 13. The length of the `Buf` must be 8 bytes.
+
+((This is a very sparse encoding for `Double`s! Each `Double` takes up 24
+bytes split across the `Buf` and `Ref`.))
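
The immediate `Float` layout can be sketched in Python with the standard `struct` module. The sketch assumes the `Ref` is stored low byte first and that the three unused bytes are zero; the function names are hypothetical.

```python
import struct

FLOAT_LOW_BYTE = 0x81  # nnn = 100 (4 payload bytes), low five bits 00001


def encode_float_ref(value):
    """Return the 8 bytes of an immediate Float Ref in memory order."""
    # "<" = little-endian, no padding; B = low byte; f = 32-bit float;
    # 3x = three zero bytes for the unused upper part of the Ref.
    return struct.pack("<Bf3x", FLOAT_LOW_BYTE, value)


def decode_float_ref(ref_bytes):
    """Recover the 32-bit float from an immediate Float Ref."""
    low_byte, value = struct.unpack("<Bf3x", ref_bytes)
    if low_byte != FLOAT_LOW_BYTE:
        raise ValueError("not an immediate Float Ref")
    return value
```

A `Double` would instead need an 8-byte `Buf` and a pointer `Ref`, which this sketch does not attempt.
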
### Embeddeds.
To encode an `Embedded`, first choose a `Value` to represent the denoted
object, and encode that, producing a `Ref`. Place that `Ref` in a `Buf` all
of its own (with length 8). Finally, point to the `Buf` with a `Ref` with
-tag 15.
+tag 12.
### Annotations.