Literal small integers

This commit is contained in:
Tony Garnock-Jones 2018-09-24 14:09:26 +01:00
parent b4d4092b90
commit a22ded2f16
1 changed files with 40 additions and 27 deletions

View File

@ -105,8 +105,9 @@ follows:[^ordering-by-syntax]
< String < ByteString < Symbol < String < ByteString < Symbol
[^ordering-by-syntax]: The observant reader may note that the [^ordering-by-syntax]: The observant reader may note that the
ordering here is the same as that implied by the tagging scheme ordering here is (almost) the same as that implied by the tagging
used in the concrete binary syntax for `Value`s. scheme used in the concrete binary syntax for `Value`s. (The
exception is the syntax for small integers near zero.)
**Equivalence.**<a name="equivalence"></a> Two `Value`s are equal if **Equivalence.**<a name="equivalence"></a> Two `Value`s are equal if
neither is less than the other according to the total order. neither is less than the other according to the total order.
@ -497,21 +498,27 @@ Note that `header(3,3,m)` and `open(3,3)`/`close(3,3)` is unused and reserved.
#### SignedInteger #### SignedInteger
Format B (known length): Format B/A (known length/fixed-size):
[[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x) [[ x ]] when x ∈ SignedInteger = header(1,0,m) ++ intbytes(x) if x<-3 13x
header(0,1,x+16) if -3≤x<0
header(0,1,x) if 0≤x<13
Integers in the range [-3,12] are compactly represented using format A
because they are so frequently used. Other integers are represented
using format B.
Format C *MUST NOT* be used for `SignedInteger`s. Format C *MUST NOT* be used for `SignedInteger`s.
The function `intbytes(x)` gives the big-endian two's-complement The function `intbytes(x)` gives the big-endian two's-complement
binary representation of `x`, taking exactly as many whole bytes as binary representation of `x`, taking exactly as many whole bytes as
needed to unambiguously identify the value and its sign, and `m = needed to unambiguously identify the value and its sign, and `m =
|intbytes(x)|`. |intbytes(x)|`. The most-significant bit in the first byte in
`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes]
The value 0 needs zero bytes to identify the value, so `intbytes(0)` [^zero-intbytes]: The value 0 needs zero bytes to identify the
is the empty byte string. Non-zero values need at least one byte; the value, so `intbytes(0)` is the empty byte string. Non-zero values
most-significant bit in the first byte in `intbytes(x)` for `x`≠0 is need at least one byte.
the sign bit.
For example, For example,
@ -522,10 +529,14 @@ For example,
[[ -129 ]] = [0x42, 0xFF, 0x7F] [[ -129 ]] = [0x42, 0xFF, 0x7F]
[[ -128 ]] = [0x41, 0x80] [[ -128 ]] = [0x41, 0x80]
[[ -127 ]] = [0x41, 0x81] [[ -127 ]] = [0x41, 0x81]
[[ -2 ]] = [0x41, 0xFE] [[ -4 ]] = [0x41, 0xFC]
[[ -1 ]] = [0x41, 0xFF] [[ -3 ]] = [0x1D]
[[ 0 ]] = [0x40] [[ -2 ]] = [0x1E]
[[ 1 ]] = [0x41, 0x01] [[ -1 ]] = [0x1F]
[[ 0 ]] = [0x10]
[[ 1 ]] = [0x11]
[[ 12 ]] = [0x1C]
[[ 13 ]] = [0x41, 0x0D]
[[ 127 ]] = [0x41, 0x7F] [[ 127 ]] = [0x41, 0x7F]
[[ 128 ]] = [0x42, 0x00, 0x80] [[ 128 ]] = [0x42, 0x00, 0x80]
[[ 255 ]] = [0x42, 0x00, 0xFF] [[ 255 ]] = [0x42, 0x00, 0xFF]
@ -593,17 +604,17 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
|--------------------------------------------------------------------|----------------------------------------------------| |--------------------------------------------------------------------|----------------------------------------------------|
| `(capture (discard))` | 91 80 | | `(capture (discard))` | 91 80 |
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 | | `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
| `[1 2 3 4]` (format B) | C4 41 01 41 02 41 03 41 04 | | `[1 2 3 4]` (format B) | C4 11 12 13 14 |
| `[1 2 3 4]` (format C) | 2C 41 01 41 02 41 03 41 04 3C | | `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
| `[-2 -1 0 1]` | C4 41 FE 41 FF 40 41 01 | | `[-2 -1 0 1]` | C4 1E 1F 40 11 |
| `"hello"` (format B) | 55 68 65 6C 6C 6F | | `"hello"` (format B) | 55 68 65 6C 6C 6F |
| `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 | | `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
| `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 | | `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
| `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 C0 D0 01 00 | | `["hello" there #"world" [] #set{} #t #f]` | C7 55 68 65 6C 6C 6F 75 74 68 65 72 65 C0 D0 01 00 |
| `-257` | 42 FE FF | | `-257` | 42 FE FF |
| `-1` | 41 FF | | `-1` | 1F |
| `0` | 40 | | `0` | 10 |
| `1` | 41 01 | | `1` | 11 |
| `255` | 42 00 FF | | `255` | 42 00 FF |
| `1f` | 02 3F 80 00 00 | | `1f` | 02 3F 80 00 00 |
| `1d` | 03 3F F0 00 00 00 00 00 00 | | `1d` | 03 3F F0 00 00 00 00 00 00 |
@ -623,16 +634,16 @@ encodes to
C5 ;; Sequence, 5 C5 ;; Sequence, 5
76 74 69 74 6C 65 64 ;; Symbol, "titled" 76 74 69 74 6C 65 64 ;; Symbol, "titled"
76 70 65 72 73 6F 6E ;; Symbol, "person" 76 70 65 72 73 6F 6E ;; Symbol, "person"
41 02 ;; SignedInteger, "2" 12 ;; SignedInteger, "2"
75 74 68 69 6E 67 ;; Symbol, "thing" 75 74 68 69 6E 67 ;; Symbol, "thing"
41 01 ;; SignedInteger, "1" 11 ;; SignedInteger, "1"
41 65 ;; SignedInteger, "101" 41 65 ;; SignedInteger, "101"
59 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell" 59 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
B4 ;; Record, generic, 3+1 B4 ;; Record, generic, 3+1
74 64 61 74 65 ;; Symbol, "date" 74 64 61 74 65 ;; Symbol, "date"
42 07 1D ;; SignedInteger, "1821" 42 07 1D ;; SignedInteger, "1821"
41 02 ;; SignedInteger, "2" 12 ;; SignedInteger, "2"
41 03 ;; SignedInteger, "3" 13 ;; SignedInteger, "3"
52 44 72 ;; String, "Dr" 52 44 72 ;; String, "Dr"
[^extensibility2]: It happens to line up with Racket's [^extensibility2]: It happens to line up with Racket's
@ -800,7 +811,7 @@ the same `Value` may yield different binary `Repr`s.
02 - Float 02 - Float
03 - Double 03 - Double
(0x) RESERVED 04-0F (0x) RESERVED 04-0F
(1x) RESERVED 10-1F 1x - Small integers 0..12,-3..-1
2x - Start Stream 2x - Start Stream
3x - End Stream 3x - End Stream
@ -829,6 +840,8 @@ the same `Value` may yield different binary `Repr`s.
00 00 0010 Float, 32 bits big-endian binary 00 00 0010 Float, 32 bits big-endian binary
00 00 0011 Double, 64 bits big-endian binary 00 00 0011 Double, 64 bits big-endian binary
00 01 xxxx Small integers 0..12,-3..-1
00 10 ttnn Start Stream <tt,nn> 00 10 ttnn Start Stream <tt,nn>
When tt = 00 --> error When tt = 00 --> error
01 --> each chunk is a <tt,nn> piece 01 --> each chunk is a <tt,nn> piece
@ -852,8 +865,6 @@ the same `Value` may yield different binary `Repr`s.
If mmmm = 1111, a varint(m) follows, giving the length, before If mmmm = 1111, a varint(m) follows, giving the length, before
the body; otherwise, m is the length of the body to follow. the body; otherwise, m is the length of the body to follow.
## Appendix. Representing Values in Programming Languages ## Appendix. Representing Values in Programming Languages
We have given a definition of `Value` and its semantics, and proposed We have given a definition of `Value` and its semantics, and proposed
@ -1082,6 +1093,8 @@ what? Some domain-specific base URI?
Q. Are the language mappings reasonable? How about one for Python? Q. Are the language mappings reasonable? How about one for Python?
Q. Literal small integers: could be nice? Not absolutely necessary. Q. Literal small integers: are they pulling their weight? They're not
absolutely necessary. They mess up the connection between
value-ordering and repr-ordering!
## Notes ## Notes