Permit overlong SignedInteger encodings

2022-06-19 17:20:25 +02:00
parent 4c53dadc41
commit 0dd2a8d622
3 changed files with 24 additions and 14 deletions
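
A minimal Python sketch of the distinction this change draws: readers accept any sign-extended `intbytes(x)`, while the canonical form still requires the shortest one. The helper names (`encode_signed_integer`, `decode_signed_integer`, `is_canonical`) are hypothetical and not part of the Preserves codebase. The sketch assumes the length of the `Repr` is supplied externally, as the cheatsheet describes, and it tests for the shortest form by re-encoding rather than by matching specific byte prefixes.

```python
TAG = 0xA3  # SignedInteger tag byte, as in «x» = [0xA3] ++ intbytes(x)


def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement bytes for x (empty for 0)."""
    if x == 0:
        return b""
    # Bits needed for the magnitude; ~x maps a negative x onto the
    # non-negative value with the same number of significant bits.
    magnitude_bits = (x if x >= 0 else ~x).bit_length()
    # One extra bit for the sign, rounded up to whole bytes.
    nbytes = magnitude_bits // 8 + 1
    return x.to_bytes(nbytes, "big", signed=True)


def encode_signed_integer(x: int) -> bytes:
    """Hypothetical writer: always emits the shortest encoding."""
    return bytes([TAG]) + intbytes(x)


def decode_signed_integer(repr_bytes: bytes) -> int:
    """Hypothetical reader: tolerates redundant leading 0x00/0xFF bytes."""
    if not repr_bytes or repr_bytes[0] != TAG:
        raise ValueError("not a SignedInteger Repr")
    # Sign extension does not change the value, so overlong input decodes
    # to the same integer; an empty body decodes to 0.
    return int.from_bytes(repr_bytes[1:], "big", signed=True)


def is_canonical(repr_bytes: bytes) -> bool:
    """True only if the Repr is the shortest encoding of its value."""
    return repr_bytes == encode_signed_integer(decode_signed_integer(repr_bytes))
```

Under these assumptions, `decode_signed_integer(bytes.fromhex("A3 FF FF"))` yields `-1` even though `is_canonical` rejects those bytes, and `A3 00` decodes to `0` but is not canonical because the canonical encoding of zero is the bare tag byte `A3`.
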
--- a/canonical-binary.md
+++ b/canonical-binary.md
@@ -28,6 +28,10 @@ lengths](./preserves-binary.html#varint) *MUST* appear in the unique shortest
 encoding for a given length. That is, canonical varint-encodings *MUST
 NOT* start with `0`.

+**SignedIntegers.** Each `SignedInteger` *MUST* be serialized using its
+shortest possible encoding. That is, the encoding *MUST NOT* have `A3
+FF FF` or `A3 00 00` as prefixes, and *MUST NOT* be `A3 00`.
+
 **Sets.**
 The elements of a `Set` *MUST* be serialized sorted in ascending order
 by comparing their canonical encoded binary representations.
@@ -42,7 +46,7 @@ representations of their keys.[^no-need-for-by-value]

 **Other kinds of `Value`.**
 There are no special canonicalization restrictions on
-`SignedInteger`s, `String`s, `ByteString`s, `Symbol`s, `Boolean`s,
+`String`s, `ByteString`s, `Symbol`s, `Boolean`s,
 `Float`s, `Double`s, `Record`s, `Sequence`s, or `Embedded`s. The
 constraints given for these `Value`s in the [specification][spec]
 suffice to ensure canonicity.
--- a/cheatsheet.md
+++ b/cheatsheet.md
@@ -39,10 +39,10 @@ which could be a file, an HTTP message, a UDP packet, etc.
 The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 8-byte IEEE 754 binary representations of `F` and `D`, respectively.

-The function `intbytes(x)` gives the big-endian two's-complement signed
-binary representation of `x`, taking exactly as many whole bytes as
+The function `intbytes(x)` is a big-endian two's-complement signed
+binary representation of `x`, taking at least as many whole bytes as
 needed to unambiguously identify the value and its sign. `intbytes(0)`
-is the empty byte sequence.
+may be the empty byte sequence.

 When reading, the length of the input is supplied externally. This means
 that, when reading a length/value pair in a `seq()`, each length should
--- a/preserves-binary.md
+++ b/preserves-binary.md
@@ -117,17 +117,23 @@ to stop expecting more contained `Repr`s.

    «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)

-The function `intbytes(x)` gives the big-endian two's-complement binary
-representation of `x`, taking exactly as many whole bytes as needed to
-unambiguously identify the value and its sign. As a special case,
-`intbytes(0)` is the empty byte sequence. The most-significant bit in
-the first byte in `intbytes(x)` (for `x`≠0) is the sign
-bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
-its shortest possible encoding.
+The function `intbytes(x)` gives a big-endian two's-complement binary
+representation of `x`, taking at least as many whole bytes as needed to
+unambiguously identify the value and its sign; `intbytes(0)` may be the
+empty byte sequence.[^zero-intbytes] The most-significant bit in the
+first byte in `intbytes(x)` is the sign bit. While every `SignedInteger`
+*SHOULD* be represented with its shortest possible encoding (which will
+often include a necessary leading `0xFF` or `0x00`), redundant leading
+`0xFF` or `0x00` bytes *MAY* be used.[^overlong-signedinteger]

-  [^zero-intbytes]: The value 0 needs zero bytes to identify the
-    value, so `intbytes(0)` is the empty byte string. Non-zero values
-    need at least one byte.
+  [^zero-intbytes]: The value 0 needs zero bytes to identify the value,
+    so `intbytes(0)` can be the empty byte string. Non-zero values need
+    at least one byte.
+
+  [^overlong-signedinteger]: **Implementation note.** The spec permits
+    overlong `SignedInteger` encodings to allow e.g. construction of
+    `Repr`s by filling in partially-completed templates, which can be
+    useful in resource-constrained situations.

 ### Strings, ByteStrings and Symbols.