Loosen re overlong varints. Clarify re use of length info

Tony Garnock-Jones 2022-06-11 11:13:48 +02:00
parent dd231284f1
commit d6b0b8bbd8
2 changed files with 29 additions and 6 deletions


@ -23,6 +23,11 @@ binary syntax](preserves-binary.html).
**Annotations.** **Annotations.**
Annotations *MUST NOT* be present. Annotations *MUST NOT* be present.
**Length representations.** [Varint-encoded
lengths](./preserves-binary.html#varint) *MUST* appear in the unique shortest
encoding for a given length. That is, canonical varint-encodings *MUST
NOT* start with `0`.
**Sets.** **Sets.**
The elements of a `Set` *MUST* be serialized sorted in ascending order The elements of a `Set` *MUST* be serialized sorted in ascending order
by comparing their canonical encoded binary representations. by comparing their canonical encoded binary representations.
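
The ordering rule for `Set`s can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes each element has already been encoded to its canonical binary form, and that "comparing" means bytewise lexicographic comparison of unsigned bytes (which is what Python's `bytes` ordering provides). The function name `sort_set_elements` and the sample bytes are hypothetical placeholders, not real encodings.

```python
from typing import Iterable, List


def sort_set_elements(encoded_elements: Iterable[bytes]) -> List[bytes]:
    """Order already-encoded Set elements for canonical serialization.

    Assumption: the comparison is plain lexicographic order over the
    encoded bytes, which matches Python's built-in ordering for `bytes`.
    """
    return sorted(encoded_elements)


# Placeholder byte strings standing in for canonically encoded elements.
elements = [b"\xa3\x02", b"\xa3\x01", b"\xa0"]
ordered = sort_set_elements(elements)
```

Sorting the encoded bytes, rather than the values themselves, avoids needing a separate implementation of the total order over `Value`s.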


@ -29,12 +29,13 @@ Each `Repr` starts with a tag byte, describing the kind of information
represented. represented.
However, inspired by [argdata][], a `Repr` does *not* describe its own However, inspired by [argdata][], a `Repr` does *not* describe its own
length. Instead, the surrounding context must supply the length of the length. Instead, the surrounding context must supply the expected length
`Repr`. of the `Repr`.
As a consequence, `Repr`s for `Compound` values store the lengths of As a consequence, `Repr`s for `Compound` values store the lengths of
their contained values. Each contained `Value` is represented as a their contained values. Each contained `Value` is represented as a
length in bytes followed by its own `Repr`. length in bytes followed by its own `Repr`. Implementations use each
stored length to decide when to stop reading the following `Repr`.
<a id="varint"></a> Each length is stored as an [argdata][]-compatible <a id="varint"></a> Each length is stored as an [argdata][]-compatible
big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
@ -63,9 +64,18 @@ The following table illustrates varint-encoding.
| 300 | `0000010 0101100` | 2 172 | | 300 | `0000010 0101100` | 2 172 |
| 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 | | 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |
It is an error for a varint-encoded `m` in a `Repr` to be anything other There is no requirement that a varint-encoded `m` in a `Repr` be the unique shortest encoding
than the unique shortest encoding for that `m`. That is, a for that `m`.[^overlong-varint] However, implementations *SHOULD* use the shortest encoding
varint-encoding of `m` *MUST NOT* start with `0`. wherever possible when writing, and *SHOULD* reject excessively long encodings when reading
encoded values.[^excessively-long-varint]
[^overlong-varint]: **Implementation note.** The spec permits overlong length encodings to
reduce wasted activity in resource-constrained situations. If an implementation is in
anything other than a very low-level language, it is likely to be able to use
[IOList](./conventions.html#iolists)-style data structures to avoid unnecessary copying.
[^excessively-long-varint]: As a guideline, reject more than eight leading `0` bytes in a
varint.
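
The reading and writing behaviour described above might look like the following sketch. It assumes the byte layout implied by the table (seven payload bits per byte, most significant group first, with the high bit set only on the final byte), so an overlong encoding is one that begins with `0` bytes. The names `encode_varint`, `decode_varint`, and `max_leading_zeros` are hypothetical, and the default limit of eight follows the guideline in the footnote rather than any hard requirement.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Write n in the shortest big-endian base 128 form."""
    if n < 0:
        raise ValueError("lengths are non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()       # most significant 7-bit group first
    groups[-1] |= 0x80     # assumption: high bit marks the final byte
    return bytes(groups)


def decode_varint(data: bytes, offset: int = 0,
                  max_leading_zeros: int = 8) -> Tuple[int, int]:
    """Return (value, next_offset); tolerate a bounded overlong prefix."""
    value = 0
    leading_zeros = 0
    seen_nonzero = False
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        b = data[offset]
        offset += 1
        if b == 0 and not seen_nonzero:
            # A redundant leading 0 byte: permitted, but only up to a limit.
            leading_zeros += 1
            if leading_zeros > max_leading_zeros:
                raise ValueError("excessively long varint")
        else:
            seen_nonzero = True
        value = (value << 7) | (b & 0x7F)
        if b & 0x80:
            return value, offset
```

With these definitions, `encode_varint(300)` yields the bytes `2 172` from the table, and `decode_varint(bytes([0, 0, 2, 172]))` accepts the overlong form and still returns 300.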
### Records, Sequences, Sets and Dictionaries. ### Records, Sequences, Sets and Dictionaries.
@ -107,6 +117,10 @@ serializing in some other implementation-defined order.
but encoding and then sorting byte strings is much more likely to but encoding and then sorting byte strings is much more likely to
be within easy reach. be within easy reach.
No sentinel marks the end of a sequence of length-prefixed `Repr`s.
During decoding, use the length of the containing `Repr` to decide when
to stop expecting more contained `Repr`s.
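
As a rough sketch of this stopping rule, the loop below splits the payload of a containing `Repr` into its contained `Repr`s. It assumes the caller has already stripped the tag byte and knows the payload's extent from the surrounding context, and it assumes the varint layout implied by the table above (high bit set on the final byte). The names `read_varint` and `split_contained_reprs` are hypothetical.

```python
from typing import List, Tuple


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Minimal varint reader: returns (value, next_offset)."""
    value = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        b = data[offset]
        offset += 1
        value = (value << 7) | (b & 0x7F)
        if b & 0x80:  # assumption: high bit marks the final byte
            return value, offset


def split_contained_reprs(payload: bytes) -> List[bytes]:
    """Split a compound payload into its length-prefixed Reprs.

    There is no end marker: the loop stops exactly when the payload,
    whose length the caller supplied, has been consumed.
    """
    reprs = []
    offset = 0
    while offset < len(payload):
        length, offset = read_varint(payload, offset)
        end = offset + length
        if end > len(payload):
            raise ValueError("contained Repr overruns its container")
        reprs.append(payload[offset:end])
        offset = end
    return reprs
```

Each returned slice can then be decoded recursively, with its own length serving as the context for any `Repr`s nested inside it.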
### SignedIntegers. ### SignedIntegers.
«x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x) «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
@ -192,6 +206,10 @@ an empty sequence annotated with two symbols, `a` and `b`, is
annotations are skipped, an endless sequence of annotations may give an annotations are skipped, an endless sequence of annotations may give an
illusion of progress. illusion of progress.
**Overlong varints.** The binary format allows (but discourages) overlong [varint](#varint)s.
Consider optional restrictions on the number of redundant leading `0` bytes accepted when
reading a varint.
**Canonical form for cryptographic hashing and signing.** No canonical **Canonical form for cryptographic hashing and signing.** No canonical
textual encoding of a `Value` is specified. A textual encoding of a `Value` is specified. A
[canonical form][canonical] exists for binary encoded `Value`s, and [canonical form][canonical] exists for binary encoded `Value`s, and