Literal small integers

2018-09-24 14:09:26 +01:00 · a22ded2f16
parent b4d4092b90
commit a22ded2f16
1 changed file with 40 additions and 27 deletions
--- a/syndicate/mc/preserve.md
+++ b/syndicate/mc/preserve.md
@@ -105,8 +105,9 @@ follows:[^ordering-by-syntax]
                              < String < ByteString < Symbol

  [^ordering-by-syntax]: The observant reader may note that the
-    ordering here is the same as that implied by the tagging scheme
-    used in the concrete binary syntax for `Value`s.
+    ordering here is (almost) the same as that implied by the tagging
+    scheme used in the concrete binary syntax for `Value`s. (The
+    exception is the syntax for small integers near zero.)

 **Equivalence.**<a name="equivalence"></a> Two `Value`s are equal if
 neither is less than the other according to the total order.
@@ -497,21 +498,27 @@ Note that `header(3,3,m)` and `open(3,3)`/`close(3,3)` is unused and reserved.

 #### SignedInteger

-Format B (known length):
+Format B/A (known length/fixed-size):

-    [[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)
+    [[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x)  if x<-3 ∨ 13≤x
+                                     header(0,1,x+16)              if -3≤x<0
+                                     header(0,1,x)                 if 0≤x<13
+
+Integers in the range [-3,12] are compactly represented using format A
+because they are so frequently used. Other integers are represented
+using format B.

 Format C *MUST NOT* be used for `SignedInteger`s.

 The function `intbytes(x)` gives the big-endian two's-complement
 binary representation of `x`, taking exactly as many whole bytes as
 needed to unambiguously identify the value and its sign, and `m =
-|intbytes(x)|`.
+|intbytes(x)|`. The most-significant bit in the first byte in
+`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes]

-The value 0 needs zero bytes to identify the value, so `intbytes(0)`
-is the empty byte string. Non-zero values need at least one byte; the
-most-significant bit in the first byte in `intbytes(x)` for `x`≠0 is
-the sign bit.
+  [^zero-intbytes]: The value 0 needs zero bytes to identify the
+    value, so `intbytes(0)` is the empty byte string. Non-zero values
+    need at least one byte.

 For example,

@@ -522,10 +529,14 @@ For example,
    [[   -129 ]] = [0x42, 0xFF, 0x7F]
    [[   -128 ]] = [0x41, 0x80]
    [[   -127 ]] = [0x41, 0x81]
-    [[     -2 ]] = [0x41, 0xFE]
-    [[     -1 ]] = [0x41, 0xFF]
-    [[      0 ]] = [0x40]
-    [[      1 ]] = [0x41, 0x01]
+    [[     -4 ]] = [0x41, 0xFC]
+    [[     -3 ]] = [0x1D]
+    [[     -2 ]] = [0x1E]
+    [[     -1 ]] = [0x1F]
+    [[      0 ]] = [0x10]
+    [[      1 ]] = [0x11]
+    [[     12 ]] = [0x1C]
+    [[     13 ]] = [0x41, 0x0D]
    [[    127 ]] = [0x41, 0x7F]
    [[    128 ]] = [0x42, 0x00, 0x80]
    [[    255 ]] = [0x42, 0x00, 0xFF]
@@ -593,17 +604,17 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
 |--------------------------------------------------------------------|----------------------------------------------------|
 | `(capture (discard))`                                              | 91 80                                              |
 | `(observe (speak (discard) (capture (discard))))`                  | A1 B3 75 73 70 65 61 6B 80 91 80                   |
-| `[1 2 3 4]` (format B)                                             | C4 41 01 41 02 41 03 41 04                         |
-| `[1 2 3 4]` (format C)                                             | 2C 41 01 41 02 41 03 41 04 3C                      |
-| `[-2 -1 0 1]`                                                      | C4 41 FE 41 FF 40 41 01                            |
+| `[1 2 3 4]` (format B)                                             | C4 11 12 13 14                                     |
+| `[1 2 3 4]` (format C)                                             | 2C 11 12 13 14 3C                                  |
+| `[-2 -1 0 1]`                                                      | C4 1E 1F 10 11                                     |
 | `"hello"` (format B)                                               | 55 68 65 6C 6C 6F                                  |
 | `"hello"` (format C, 2 chunks)                                     | 25 52 68 65 53 6C 6C 6F 35                         |
 | `"hello"` (format C, 5 chunks)                                     | 25 52 68 65 52 6C 6C 50 50 51 6F 35                |
 | `["hello" there #"world" [] #set{} #t #f]`                         | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 C0 D0 01 00 |
 | `-257`                                                             | 42 FE FF                                           |
-| `-1`                                                               | 41 FF                                              |
-| `0`                                                                | 40                                                 |
-| `1`                                                                | 41 01                                              |
+| `-1`                                                               | 1F                                                 |
+| `0`                                                                | 10                                                 |
+| `1`                                                                | 11                                                 |
 | `255`                                                              | 42 00 FF                                           |
 | `1f`                                                               | 02 3F 80 00 00                                     |
 | `1d`                                                               | 03 3F F0 00 00 00 00 00 00                         |
@@ -623,16 +634,16 @@ encodes to
      C5                              ;; Sequence, 5
        76 74 69 74 6C 65 64            ;; Symbol, "titled"
        76 70 65 72 73 6F 6E            ;; Symbol, "person"
-        41 02                           ;; SignedInteger, "2"
+        12                              ;; SignedInteger, "2"
        75 74 68 69 6E 67               ;; Symbol, "thing"
-        41 01                           ;; SignedInteger, "1"
+        11                              ;; SignedInteger, "1"
      41 65                           ;; SignedInteger, "101"
      59 42 6C 61 63 6B 77 65 6C 6C   ;; String, "Blackwell"
      B4                              ;; Record, generic, 3+1
        74 64 61 74 65                  ;; Symbol, "date"
        42 07 1D                        ;; SignedInteger, "1821"
-        41 02                           ;; SignedInteger, "2"
-        41 03                           ;; SignedInteger, "3"
+        12                              ;; SignedInteger, "2"
+        13                              ;; SignedInteger, "3"
      52 44 72                        ;; String, "Dr"

  [^extensibility2]: It happens to line up with Racket's
@@ -800,7 +811,7 @@ the same `Value` may yield different binary `Repr`s.
     02 - Float
     03 - Double
    (0x)  RESERVED 04-0F
-    (1x)  RESERVED 10-1F
+     1x - Small integers 0..12,-3..-1
     2x - Start Stream
     3x - End Stream

@@ -829,6 +840,8 @@ the same `Value` may yield different binary `Repr`s.
     00 00 0010  Float, 32 bits big-endian binary
     00 00 0011  Double, 64 bits big-endian binary

+     00 01 xxxx  Small integers 0..12,-3..-1
+
     00 10 ttnn  Start Stream <tt,nn>
                   When tt = 00 --> error
                             01 --> each chunk is a <tt,nn> piece
@@ -852,8 +865,6 @@ the same `Value` may yield different binary `Repr`s.
     If mmmm = 1111, a varint(m) follows, giving the length, before
     the body; otherwise, m is the length of the body to follow.

-
-
 ## Appendix. Representing Values in Programming Languages

 We have given a definition of `Value` and its semantics, and proposed
@@ -1082,6 +1093,8 @@ what? Some domain-specific base URI?

 Q. Are the language mappings reasonable? How about one for Python?

-Q. Literal small integers: could be nice? Not absolutely necessary.
+Q. Literal small integers: are they pulling their weight? They're not
+absolutely necessary. They mess up the connection between
+value-ordering and repr-ordering!

 ## Notes
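
The `SignedInteger` rules changed by this commit can be sketched in a few lines of Python. This is an illustration only, not code from the repository: the function names `header`, `intbytes`, `encode_signed_integer`, and `decode_signed_integer` are hypothetical, the header layout (2 bits, 2 bits, 4 bits) is read off the bit table in the diff, and the varint long-length form (`mmmm = 1111`) is deliberately left out.

```python
def header(a: int, b: int, m: int) -> bytes:
    """Pack the lead byte as aa bb mmmm (layout assumed from the bit table)."""
    if not (0 <= a < 4 and 0 <= b < 4 and 0 <= m < 16):
        raise ValueError("header field out of range")
    return bytes([(a << 6) | (b << 4) | m])


def intbytes(x: int) -> bytes:
    """Big-endian two's-complement bytes of x; the empty string for 0."""
    if x == 0:
        return b""
    n = 1
    # Grow until x fits in n signed bytes: -2^(8n-1) <= x < 2^(8n-1).
    while not (-(1 << (8 * n - 1)) <= x < (1 << (8 * n - 1))):
        n += 1
    return x.to_bytes(n, "big", signed=True)


def encode_signed_integer(x: int) -> bytes:
    """Format A for -3 <= x <= 12, format B otherwise."""
    if -3 <= x < 0:
        return header(0, 1, x + 16)
    if 0 <= x < 13:
        return header(0, 1, x)
    body = intbytes(x)
    if len(body) >= 15:
        # A length of 15 or more needs the varint form, omitted in this sketch.
        raise ValueError("integer too large for this sketch")
    return header(1, 0, len(body)) + body


def decode_signed_integer(data: bytes) -> int:
    """Inverse of the encoder above, for the two formats it produces."""
    lead = data[0]
    a, b, m = lead >> 6, (lead >> 4) & 0x3, lead & 0xF
    if (a, b) == (0, 1):
        # 0..12 are themselves; 13..15 stand for -3..-1.
        return m - 16 if m > 12 else m
    if (a, b) == (1, 0) and m < 15:
        return int.from_bytes(data[1:1 + m], "big", signed=True)
    raise ValueError("not a SignedInteger handled by this sketch")


if __name__ == "__main__":
    for x in (-129, -4, -3, -1, 0, 1, 12, 13, 128):
        print(x, encode_signed_integer(x).hex(" ").upper())
```

Under these assumptions the small negative values receive lead bytes `1D` to `1F`, which sort above the lead bytes `10` to `1C` used for 0 to 12. That is one concrete way to see the mismatch between value ordering and representation ordering raised in the open question above.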