Improvements

2023-03-15 17:15:26 +01:00 · 2023-03-15 17:15:26 +01:00 · 07b28ca042
parent d11f008705
commit 07b28ca042
2 changed files with 18 additions and 17 deletions
--- a/_includes/cheatsheet-binary.md
+++ b/_includes/cheatsheet-binary.md
@ -1,4 +1,4 @@
-  [varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
+For a value `V`, we write `«V»` for the binary encoding of `V`.

                          «#f» = [0x80]
                          «#t» = [0x81]
@ -8,7 +8,7 @@
      «V» if V ∈ Float         = [0x87] ++ varint(|binary32(V)|) ++ binary32(V)
      «V» if V ∈ Double        = [0x87] ++ varint(|binary64(V)|) ++ binary64(V)

-      «V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(x)|) ++ intbytes(x)
+      «V» if V ∈ SignedInteger = [0xB0] ++ varint(|intbytes(V)|) ++ intbytes(V)
      «V» if V ∈ String        = [0xB1] ++ varint(|utf8(V)|) ++ utf8(V)
      «V» if V ∈ ByteString    = [0xB2] ++ varint(|V|) ++ V
      «V» if V ∈ Symbol        = [0xB3] ++ varint(|utf8(V)|) ++ utf8(V)
@ -18,19 +18,20 @@
                «#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
         «{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]

-               «@V_1...@V_n V» = [0xBF] ++ «V» ++ «V_1» ++...++ «V_n» ++ [0x84]
+                     varint(v) = e(v, 128)

-Where
+                       e(v, d) = [v + d] if v < 128
+                                 e(v >> 7, 0) ++ [(v & 0x7F) + d] if v ≥ 128

- - `varint(m)` is the [varint-encoding][varint] of `m`; for example, `varint(15)` is `[0x0F]`,
-   and `varint(1000000000)` is `[0x80, 0x94, 0xeb, 0xdc, 0x03]`.
+The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
+8-byte IEEE 754 binary representations of `F` and `D`, respectively.

- - `intbytes(x)` gives the big-endian two's-complement binary representation of `x`, taking
-   exactly as many whole bytes as needed to unambiguously identify the value and its sign. For
-   example, `intbytes(-128)` is `[0x80]`, `intbytes(-1)` is `[0xFF]`, `intbytes(0)` is `[]`,
-   `intbytes(1)` is `[0x01]`, `intbytes(128)` is `[0x00, 0x80]` etc.
+The function `intbytes(x)` is a big-endian two's-complement signed binary representation of
+`x`, taking exactly as many whole bytes as needed to unambiguously identify the value and its
+sign; in particular, `intbytes(0)` is the empty byte sequence.

- - `utf8(S)` gives the sequence of bytes forming the UTF-8 encoding of the string `S`.
+**Annotations.** To annotate an encoded value `«V»` (that *MUST NOT* itself already be
+annotated) with some sequence of `Value`s `V_1...V_m` (that *MUST* be non-empty),
+surround `«V»` as follows:

- - `binary32(F)` and `binary64(D)` yield big-endian 4- and 8-byte IEEE 754 binary
-   representations of `F` and `D`, respectively.
+               «@V_1...@V_m V» = [0xBF] ++ «V» ++ «V_1» ++...++ «V_m» ++ [0x84]
--- a/preserves-binary.md
+++ b/preserves-binary.md
@ -152,10 +152,10 @@ represent the denoted object, prefixed with `[0x86]`.

    «@V_1...@V_n V» = [0xBF] ++ «V» ++ «V_1» ++...++ «V_n» ++ [0x84]

-`V` *MUST NOT* itself be annotated, but `V_1...V_n` *MAY* be
-annotated. For example, the `Repr` corresponding to textual syntax
-`@a@b[]`, i.e. an empty sequence annotated with two symbols, `a` and
-`b`, is
+`V` *MUST NOT* itself be annotated, but each `V_i` in `V_1...V_n`
+*MAY* be annotated. Furthermore, `n` *MUST* be greater than zero. For
+example, the `Repr` corresponding to textual syntax `@a@b[]`, i.e. an
+empty sequence annotated with two symbols, `a` and `b`, is

    «@a @b []» = [0xBF] ++ «[]» ++ «a» ++ «b» ++ [0x84]
               = [0xBF, 0xB5, 0x84, 0xB3, 0x01, 0x61, 0xB3, 0x01, 0x62, 0x84]