From a22ded2f16271f35e963dc1a2b0f79071f216f42 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Mon, 24 Sep 2018 14:09:26 +0100
Subject: [PATCH] Literal small integers

---
 syndicate/mc/preserve.md | 67 ++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/syndicate/mc/preserve.md b/syndicate/mc/preserve.md
index 7142b8d..d491541 100644
--- a/syndicate/mc/preserve.md
+++ b/syndicate/mc/preserve.md
@@ -105,8 +105,9 @@ follows:[^ordering-by-syntax]
                               < String < ByteString < Symbol
 
   [^ordering-by-syntax]: The observant reader may note that the
-    ordering here is the same as that implied by the tagging scheme
-    used in the concrete binary syntax for `Value`s.
+    ordering here is (almost) the same as that implied by the tagging
+    scheme used in the concrete binary syntax for `Value`s. (The
+    exception is the syntax for small integers near zero.)
 
 **Equivalence.** Two `Value`s are equal if neither is less than the
 other according to the total order.
@@ -497,21 +498,27 @@ Note that `header(3,3,m)` and `open(3,3)`/`close(3,3)` is unused and reserved.
 
 #### SignedInteger
 
-Format B (known length):
+Format B/A (known length/fixed-size):
 
-    [[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)
+    [[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)  if x<-3 ∨ 13≤x
+                                     header(0,1,x+16)              if -3≤x<0
+                                     header(0,1,x)                 if 0≤x<13
+
+Integers in the range [-3,12] are compactly represented using format A
+because they are so frequently used. Other integers are represented
+using format B.
 
 Format C *MUST NOT* be used for `SignedInteger`s.
 
 The function `intbytes(x)` gives the big-endian two's-complement
 binary representation of `x`, taking exactly as many whole bytes as
 needed to unambiguously identify the value and its sign, and `m =
-|intbytes(x)|`.
+|intbytes(x)|`. The most-significant bit in the first byte in
+`intbytes(x)` is the sign bit.[^zero-intbytes]
 
-The value 0 needs zero bytes to identify the value, so `intbytes(0)`
-is the empty byte string. Non-zero values need at least one byte; the
-most-significant bit in the first byte in `intbytes(x)` for `x`≠0 is
-the sign bit.
+  [^zero-intbytes]: The value 0 needs zero bytes to identify the
+    value, so `intbytes(0)` is the empty byte string. Non-zero values
+    need at least one byte.
 
 For example,
 
@@ -522,10 +529,14 @@ For example,
     [[ -129 ]] = [0x42, 0xFF, 0x7F]
     [[ -128 ]] = [0x41, 0x80]
     [[ -127 ]] = [0x41, 0x81]
-    [[ -2 ]] = [0x41, 0xFE]
-    [[ -1 ]] = [0x41, 0xFF]
-    [[ 0 ]] = [0x40]
-    [[ 1 ]] = [0x41, 0x01]
+    [[ -4 ]] = [0x41, 0xFC]
+    [[ -3 ]] = [0x1D]
+    [[ -2 ]] = [0x1E]
+    [[ -1 ]] = [0x1F]
+    [[ 0 ]] = [0x10]
+    [[ 1 ]] = [0x11]
+    [[ 12 ]] = [0x1C]
+    [[ 13 ]] = [0x41, 0x0D]
     [[ 127 ]] = [0x41, 0x7F]
     [[ 128 ]] = [0x42, 0x00, 0x80]
     [[ 255 ]] = [0x42, 0x00, 0xFF]
@@ -593,17 +604,17 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
 |--------------------------------------------------------------------|----------------------------------------------------|
 | `(capture (discard))` | 91 80 |
 | `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
-| `[1 2 3 4]` (format B) | C4 41 01 41 02 41 03 41 04 |
-| `[1 2 3 4]` (format C) | 2C 41 01 41 02 41 03 41 04 3C |
-| `[-2 -1 0 1]` | C4 41 FE 41 FF 40 41 01 |
+| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
+| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
+| `[-2 -1 0 1]` | C4 1E 1F 40 11 |
 | `"hello"` (format B) | 55 68 65 6C 6C 6F |
 | `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
 | `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
 | `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 C0 D0 01 00 |
 | `-257` | 42 FE FF |
-| `-1` | 41 FF |
-| `0` | 40 |
-| `1` | 41 01 |
+| `-1` | 1F |
+| `0` | 10 |
+| `1` | 11 |
 | `255` | 42 00 FF |
 | `1f` | 02 3F 80 00 00 |
 | `1d` | 03 3F F0 00 00 00 00 00 00 |
@@ -623,16 +634,16 @@ encodes to
       C5  ;; Sequence, 5
         76 74 69 74 6C 65 64  ;; Symbol, "titled"
         76 70 65 72 73 6F 6E  ;; Symbol, "person"
-        41 02  ;; SignedInteger, "2"
+        12  ;; SignedInteger, "2"
         75 74 68 69 6E 67  ;; Symbol, "thing"
-        41 01  ;; SignedInteger, "1"
+        11  ;; SignedInteger, "1"
       41 65  ;; SignedInteger, "101"
       59 42 6C 61 63 6B 77 65 6C 6C  ;; String, "Blackwell"
       B4  ;; Record, generic, 3+1
         74 64 61 74 65  ;; Symbol, "date"
         42 07 1D  ;; SignedInteger, "1821"
-        41 02  ;; SignedInteger, "2"
-        41 03  ;; SignedInteger, "3"
+        12  ;; SignedInteger, "2"
+        13  ;; SignedInteger, "3"
       52 44 72  ;; String, "Dr"
 
 [^extensibility2]: It happens to line up with Racket's
@@ -800,7 +811,7 @@ the same `Value` may yield different binary `Repr`s.
     02 - Float
     03 - Double
     (0x) RESERVED 04-0F
-    (1x) RESERVED 10-1F
+    1x - Small integers 0..12,-3..-1
 
     2x - Start Stream
     3x - End Stream
@@ -829,6 +840,8 @@ the same `Value` may yield different binary `Repr`s.
     00 00 0010  Float, 32 bits big-endian binary
     00 00 0011  Double, 64 bits big-endian binary
 
+    00 01 xxxx  Small integers 0..12,-3..-1
+
     00 10 ttnn  Start Stream
                   When tt = 00 --> error
                             01 --> each chunk is a piece
@@ -852,8 +865,6 @@ the same `Value` may yield different binary `Repr`s.
       If mmmm = 1111, a varint(m) follows, giving the length, before
       the body; otherwise, m is the length of the body to follow.
 
-
-
 ## Appendix. Representing Values in Programming Languages
 
 We have given a definition of `Value` and its semantics, and proposed
@@ -1082,6 +1093,8 @@ what? Some domain-specific base URI?
 
 Q. Are the language mappings reasonable? How about one for Python?
 
-Q. Literal small integers: could be nice? Not absolutely necessary.
+Q. Literal small integers: are they pulling their weight? They're not
+absolutely necessary. They mess up the connection between
+value-ordering and repr-ordering!
 
 ## Notes
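
One rough way to check the new `SignedInteger` rules in this patch is to sketch them in Python. This is an illustration under stated assumptions, not code from the repository. The names `header`, `intbytes`, and `encode_int` are hypothetical. The header byte layout (two lead bits, two minor bits, four argument bits) is inferred from the bit table and from examples such as `[[ 13 ]] = [0x41, 0x0D]`. Bodies of 15 bytes or more, which would need a `varint` length, are deliberately left out.

```python
def header(lead: int, minor: int, arg: int) -> int:
    """Pack a header byte as 2 lead bits, 2 minor bits, 4 argument bits.

    Assumption: this layout is inferred from the patch's examples.
    """
    if not (0 <= lead < 4 and 0 <= minor < 4 and 0 <= arg < 16):
        raise ValueError("header field out of range")
    return (lead << 6) | (minor << 4) | arg


def intbytes(x: int) -> bytes:
    """Minimal big-endian two's-complement bytes; zero is the empty string."""
    if x == 0:
        return b""
    # One extra bit is needed for the sign, hence the +8 before dividing.
    magnitude = x if x >= 0 else ~x
    length = (magnitude.bit_length() + 8) // 8
    return x.to_bytes(length, "big", signed=True)


def encode_int(x: int) -> bytes:
    """Encode a SignedInteger following the three cases in the patch."""
    if -3 <= x < 0:
        return bytes([header(0, 1, x + 16)])  # format A, negative literals
    if 0 <= x < 13:
        return bytes([header(0, 1, x)])       # format A, non-negative literals
    body = intbytes(x)                        # format B, known length
    if len(body) >= 15:
        # Hypothetical limit of this sketch: long bodies need a varint length.
        raise NotImplementedError("varint lengths are not sketched here")
    return bytes([header(1, 0, len(body))]) + body
```

With these definitions, `encode_int(-3).hex()` gives `1d` and `encode_int(13).hex()` gives `410d`, which matches the examples the patch adds.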