Improvements

This commit is contained in:
Tony Garnock-Jones 2023-03-15 17:15:26 +01:00
parent d11f008705
commit 07b28ca042
2 changed files with 18 additions and 17 deletions

View File

@ -1,4 +1,4 @@
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
For a value `V`, we write `«V»` for the binary encoding of `V`.
«#f» = [0x80]
«#t» = [0x81]
@ -8,7 +8,7 @@
«V» if V ∈ Float = [0x87] ++ varint(|binary32(V)|) ++ binary32(V)
«V» if V ∈ Double = [0x87] ++ varint(|binary64(V)|) ++ binary64(V)
«V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(x)|) ++ intbytes(x)
«V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V)
«V» if V ∈ String = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V)
«V» if V ∈ ByteString = [0xB2] ++ varint(|V|) ++ V
«V» if V ∈ Symbol = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V)
@ -18,19 +18,20 @@
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
«@V_1...@V_n V» = [0xBF] ++ «V» ++ «V_1» ++...++ «V_n» ++ [0x84]
varint(v) = e(v, 128)
Where
e(v, d) = [v + d] if v < 128
e(v >> 7, 0) ++ [(v & 0x7F) + d] if v ≥ 128
- `varint(m)` is the [varint-encoding][varint] of `m`; for example, `varint(15)` is `[0x0F]`,
and `varint(1000000000)` is `[0x80, 0x94, 0xeb, 0xdc, 0x03]`.
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
- `intbytes(x)` gives the big-endian two's-complement binary representation of `x`, taking
exactly as many whole bytes as needed to unambiguously identify the value and its sign. For
example, `intbytes(-128)` is `[0x80]`, `intbytes(-1)` is `[0xFF]`, `intbytes(0)` is `[]`,
`intbytes(1)` is `[0x01]`, `intbytes(128)` is `[0x00, 0x80]` etc.
The function `intbytes(x)` is a big-endian two's-complement signed binary representation of
`x`, taking exactly as many whole bytes as needed to unambiguously identify the value and its
sign; in particular, `intbytes(0)` is the empty byte sequence.
- `utf8(S)` gives the sequence of bytes forming the UTF-8 encoding of the string `S`.
**Annotations.** To annotate an encoded value `«V»` (that *MUST NOT* itself already be
annotated) with some sequence of `Value`s `V_1...V_m` (that *MUST* be non-empty),
surround `«V»` as follows:
- `binary32(F)` and `binary64(D)` yield big-endian 4- and 8-byte IEEE 754 binary
representations of `F` and `D`, respectively.
«@V_1...@V_m V» = [0xBF] ++ «V» ++ «V_1» ++...++ «V_m» ++ [0x84]

View File

@ -152,10 +152,10 @@ represent the denoted object, prefixed with `[0x86]`.
«@V_1...@V_n V» = [0xBF] ++ «V» ++ «V_1» ++...++ «V_n» ++ [0x84]
`V` *MUST NOT* itself be annotated, but `V_1...V_n` *MAY* be
annotated. For example, the `Repr` corresponding to textual syntax
`@a@b[]`, i.e. an empty sequence annotated with two symbols, `a` and
`b`, is
`V` *MUST NOT* itself be annotated, but each `V_i` in `V_1...V_n`
*MAY* be annotated. Furthermore, `n` *MUST* be greater than zero. For
example, the `Repr` corresponding to textual syntax `@a@b[]`, i.e. an
empty sequence annotated with two symbols, `a` and `b`, is
«@a @b []» = [0xBF] ++ «[]» ++ «a» ++ «b» ++ [0x84]
= [0xBF, 0xB5, 0x84, 0xB3, 0x01, 0x61, 0xB3, 0x01, 0x62, 0x84]