From 0dd2a8d62269b6d705c93059a3d8f07c68a46dfb Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Sun, 19 Jun 2022 17:20:25 +0200
Subject: [PATCH] Permit overlong SignedInteger encodings

---
 canonical-binary.md |  6 +++++-
 cheatsheet.md       |  6 +++---
 preserves-binary.md | 26 ++++++++++++++++----------
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/canonical-binary.md b/canonical-binary.md
index 3e5ebef..db02bec 100644
--- a/canonical-binary.md
+++ b/canonical-binary.md
@@ -28,6 +28,10 @@ lengths](./preserves-binary.html#varint) *MUST* appear in the unique
 shortest encoding for a given length. That is, canonical
 varint-encodings *MUST NOT* start with `0`.
 
+**SignedIntegers.** Each `SignedInteger` *MUST* be serialized using its
+shortest possible encoding. That is, the encoding *MUST NOT* be `A3 00`,
+nor begin with `A3 00 nn` where `nn` < `80` or `A3 FF nn` where `nn` ≥ `80`.
+
 **Sets.** The elements of a `Set` *MUST* be serialized sorted in
 ascending order by comparing their canonical encoded binary
 representations.
@@ -42,7 +46,7 @@ representations of their keys.[^no-need-for-by-value]
 
 **Other kinds of `Value`.** There are no special canonicalization
 restrictions on
-`SignedInteger`s, `String`s, `ByteString`s, `Symbol`s, `Boolean`s,
+`String`s, `ByteString`s, `Symbol`s, `Boolean`s,
 `Float`s, `Double`s, `Record`s, `Sequence`s, or `Embedded`s. The
 constraints given for these `Value`s in the [specification][spec]
 suffice to ensure canonicity.
diff --git a/cheatsheet.md b/cheatsheet.md
index 8258997..d07a670 100644
--- a/cheatsheet.md
+++ b/cheatsheet.md
@@ -39,10 +39,10 @@ which could be a file, an HTTP message, a UDP packet, etc.
 The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 8-byte IEEE 754 binary representations of `F` and `D`, respectively.
 
-The function `intbytes(x)` gives the big-endian two's-complement signed
-binary representation of `x`, taking exactly as many whole bytes as
+The function `intbytes(x)` gives a big-endian two's-complement signed
+binary representation of `x`, taking at least as many whole bytes as
 needed to unambiguously identify the value and its sign. `intbytes(0)`
-is the empty byte sequence.
+may be the empty byte sequence.
 
 When reading, the length of the input is supplied externally. This means
 that, when reading a length/value pair in a `seq()`, each length should
diff --git a/preserves-binary.md b/preserves-binary.md
index 661ca89..80d9fe1 100644
--- a/preserves-binary.md
+++ b/preserves-binary.md
@@ -117,17 +117,23 @@ to stop expecting more contained `Repr`s.
 
 «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
 
-The function `intbytes(x)` gives the big-endian two's-complement binary
-representation of `x`, taking exactly as many whole bytes as needed to
-unambiguously identify the value and its sign. As a special case,
-`intbytes(0)` is the empty byte sequence. The most-significant bit in
-the first byte in `intbytes(x)` (for `x`≠0) is the sign
-bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
-its shortest possible encoding.
+The function `intbytes(x)` gives a big-endian two's-complement binary
+representation of `x`, taking at least as many whole bytes as needed to
+unambiguously identify the value and its sign; `intbytes(0)` may be the
+empty byte sequence.[^zero-intbytes] The most-significant bit of the
+first byte of a non-empty `intbytes(x)` is the sign bit. Every
+`SignedInteger` *SHOULD* be represented with its shortest possible
+encoding (which often includes a necessary leading `0xFF` or `0x00`),
+but redundant leading `0xFF` or `0x00` bytes *MAY* be used.[^overlong-signedinteger]
 
-  [^zero-intbytes]: The value 0 needs zero bytes to identify the
-    value, so `intbytes(0)` is the empty byte string. Non-zero values
-    need at least one byte.
+  [^zero-intbytes]: The value 0 needs zero bytes to identify the value,
+    so `intbytes(0)` can be the empty byte string. Non-zero values need
+    at least one byte.
+
+  [^overlong-signedinteger]: **Implementation note.** The spec permits
+    overlong `SignedInteger` encodings so that, for example, `Repr`s can
+    be built by filling in partially completed templates, which can be
+    useful in resource-constrained situations.
 
 ### Strings, ByteStrings and Symbols.
 
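The relaxed `SignedInteger` rule can be illustrated with a small Python sketch. It assumes, as the cheatsheet says, that the total length of a `Repr` is supplied externally, so each `Repr` below is just the `0xA3` tag followed by `intbytes(x)`. The helper names (`intbytes_shortest`, `encode_signed_integer`, `decode_signed_integer`, `is_canonical_signed_integer`) and the `width` parameter for filling a fixed-size template slot are hypothetical and not taken from any Preserves implementation.

```python
from typing import Optional

# Hypothetical helpers illustrating «x» = [0xA3] ++ intbytes(x).
# Assumption: the total length of a Repr is known from its enclosing
# framing, so a Repr here is just the 0xA3 tag followed by intbytes(x).

TAG_SIGNED_INTEGER = 0xA3


def intbytes_shortest(x: int) -> bytes:
    """Shortest big-endian two's-complement bytes for x (empty for 0)."""
    if x == 0:
        return b""
    # Bits for the magnitude plus one sign bit, rounded up to whole bytes.
    magnitude = x if x >= 0 else ~x
    n = (magnitude.bit_length() + 8) // 8
    return x.to_bytes(n, "big", signed=True)


def encode_signed_integer(x: int, width: Optional[int] = None) -> bytes:
    """Encode x; if width is given, sign-extend to exactly width bytes."""
    if width is None:
        body = intbytes_shortest(x)
    else:
        # Pads with 0x00 or 0xFF (raises OverflowError if x does not fit),
        # e.g. to fill a fixed-size slot in a pre-built template.
        body = x.to_bytes(width, "big", signed=True)
    return bytes([TAG_SIGNED_INTEGER]) + body


def decode_signed_integer(repr_: bytes) -> int:
    """Decode a SignedInteger Repr, accepting overlong encodings."""
    if not repr_ or repr_[0] != TAG_SIGNED_INTEGER:
        raise ValueError("not a SignedInteger Repr")
    # An empty body decodes to 0; redundant sign bytes do not change the value.
    return int.from_bytes(repr_[1:], "big", signed=True)


def is_canonical_signed_integer(repr_: bytes) -> bool:
    """True if a SignedInteger Repr uses the shortest encoding of its value."""
    body = repr_[1:]
    if len(body) == 1:
        return body[0] != 0x00  # `A3 00` is an overlong zero
    if len(body) >= 2:
        # A leading byte is redundant when it merely repeats the sign bit
        # of the byte that follows it.
        if body[0] == 0x00 and body[1] < 0x80:
            return False
        if body[0] == 0xFF and body[1] >= 0x80:
            return False
    return True


if __name__ == "__main__":
    template = encode_signed_integer(1, width=4)
    print(template.hex(), decode_signed_integer(template),
          is_canonical_signed_integer(template))  # a300000001 1 False
```

A reader along these lines accepts overlong `Repr`s such as `A3 00 00 00 01`, as the patch now permits, while a canonicalizing writer would always emit the shortest form. The check treats a leading `00` or `FF` as redundant exactly when the byte after it already carries the same sign bit.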