Permit overlong SignedInteger encodings

Tony Garnock-Jones 2022-06-19 17:20:25 +02:00
parent 4c53dadc41
commit 0dd2a8d622
3 changed files with 24 additions and 14 deletions

@@ -28,6 +28,10 @@ lengths](./preserves-binary.html#varint) *MUST* appear in the unique shortest
 encoding for a given length. That is, canonical varint-encodings *MUST
 NOT* start with `0`.
+**SignedIntegers.** Each `SignedInteger` *MUST* be serialized using its
+shortest possible encoding. That is, the encoding *MUST NOT* have `A3
+FF FF` or `A3 00 00` as prefixes, and *MUST NOT* be `A3 00`.
 **Sets.**
 The elements of a `Set` *MUST* be serialized sorted in ascending order
 by comparing their canonical encoded binary representations.
@@ -42,7 +46,7 @@ representations of their keys.[^no-need-for-by-value]
 **Other kinds of `Value`.**
 There are no special canonicalization restrictions on
-`SignedInteger`s, `String`s, `ByteString`s, `Symbol`s, `Boolean`s,
+`String`s, `ByteString`s, `Symbol`s, `Boolean`s,
 `Float`s, `Double`s, `Record`s, `Sequence`s, or `Embedded`s. The
 constraints given for these `Value`s in the [specification][spec]
 suffice to ensure canonicity.
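
The new canonical-form rule for `SignedInteger`s can be checked mechanically. The following is a minimal Python sketch, not part of this commit: the helper names `intbytes` and `is_canonical_signed_integer` are hypothetical, and the input is assumed to be the complete `Repr` of one `SignedInteger` (tag byte `A3` followed by the integer bytes). It applies the "shortest possible encoding" requirement directly, by decoding the value and re-encoding it at minimal width, rather than matching the listed prefixes.

```python
def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement bytes for x (empty for 0)."""
    if x == 0:
        return b""
    # Magnitude bits plus one sign bit, rounded up to whole bytes.
    magnitude_bits = x.bit_length() if x > 0 else (~x).bit_length()
    width = (magnitude_bits + 1 + 7) // 8
    return x.to_bytes(width, "big", signed=True)


def is_canonical_signed_integer(encoded: bytes) -> bool:
    """True if `encoded` is tag 0xA3 followed by the shortest intbytes.

    Assumption: `encoded` holds exactly one SignedInteger Repr, with its
    length supplied externally.
    """
    if not encoded or encoded[0] != 0xA3:
        return False
    body = encoded[1:]
    value = int.from_bytes(body, "big", signed=True)
    return body == intbytes(value)
```

Under these assumptions, `A3` alone (zero) and `A3 FF` (minus one) are accepted, while `A3 00`, `A3 00 00` and `A3 FF FF` are rejected.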

@@ -39,10 +39,10 @@ which could be a file, an HTTP message, a UDP packet, etc.
 The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 8-byte IEEE 754 binary representations of `F` and `D`, respectively.
-The function `intbytes(x)` gives the big-endian two's-complement signed
-binary representation of `x`, taking exactly as many whole bytes as
+The function `intbytes(x)` is a big-endian two's-complement signed
+binary representation of `x`, taking at least as many whole bytes as
 needed to unambiguously identify the value and its sign. `intbytes(0)`
-is the empty byte sequence.
+may be the empty byte sequence.
 When reading, the length of the input is supplied externally. This means
 that, when reading a length/value pair in a `seq()`, each length should

@@ -117,17 +117,23 @@ to stop expecting more contained `Repr`s.
 «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
-The function `intbytes(x)` gives the big-endian two's-complement binary
-representation of `x`, taking exactly as many whole bytes as needed to
-unambiguously identify the value and its sign. As a special case,
-`intbytes(0)` is the empty byte sequence. The most-significant bit in
-the first byte in `intbytes(x)` (for `x`≠0) is the sign
-bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
-its shortest possible encoding.
+The function `intbytes(x)` gives a big-endian two's-complement binary
+representation of `x`, taking at least as many whole bytes as needed to
+unambiguously identify the value and its sign; `intbytes(0)` may be the
+empty byte sequence.[^zero-intbytes] The most-significant bit in the
+first byte in `intbytes(x)` is the sign bit. While every `SignedInteger`
+*SHOULD* be represented with its shortest possible encoding (which will
+often include a necessary leading `0xFF` or `0x00`), redundant leading
+`0xFF` or `0x00` bytes *MAY* be used.[^overlong-signedinteger]
-[^zero-intbytes]: The value 0 needs zero bytes to identify the
-value, so `intbytes(0)` is the empty byte string. Non-zero values
-need at least one byte.
+[^zero-intbytes]: The value 0 needs zero bytes to identify the value,
+so `intbytes(0)` can be the empty byte string. Non-zero values need
+at least one byte.
+[^overlong-signedinteger]: **Implementation note.** The spec permits
+overlong `SignedInteger` encodings to allow e.g. construction of
+`Repr`s by filling in partially-completed templates, which can be
+useful in resource-constrained situations.
 ### Strings, ByteStrings and Symbols.
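
To illustrate the relaxed (non-canonical) rule, here is a rough Python sketch, not part of this commit, of an encoder that may pad to a fixed width and a decoder that accepts overlong input. The names `shortest_width`, `encode_signed_integer`, `decode_signed_integer` and the `min_width` parameter are hypothetical. Padding is done by sign extension, which yields the redundant leading `0xFF` or `0x00` bytes the new text permits.

```python
TAG_SIGNED_INTEGER = 0xA3


def shortest_width(x: int) -> int:
    """Fewest whole bytes that hold x in two's complement (0 for x == 0)."""
    if x == 0:
        return 0
    magnitude_bits = x.bit_length() if x > 0 else (~x).bit_length()
    return (magnitude_bits + 1 + 7) // 8  # +1 for the sign bit


def encode_signed_integer(x: int, min_width: int = 0) -> bytes:
    """Encode x as [0xA3] ++ intbytes(x), padded to at least min_width bytes.

    With min_width == 0 the result is the shortest encoding; a larger
    min_width produces a permitted overlong encoding (sign-extended).
    """
    width = max(shortest_width(x), min_width)
    return bytes([TAG_SIGNED_INTEGER]) + x.to_bytes(width, "big", signed=True)


def decode_signed_integer(encoded: bytes) -> int:
    """Decode one SignedInteger Repr, tolerating redundant leading bytes."""
    if not encoded or encoded[0] != TAG_SIGNED_INTEGER:
        raise ValueError("not a SignedInteger Repr")
    return int.from_bytes(encoded[1:], "big", signed=True)
```

A fixed `min_width` is one way to realize the template use case from the implementation note: a writer can reserve, say, a four-byte slot after the tag and fill it in later without shifting the bytes that follow, at the cost of the output no longer being in canonical form.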