Loosen re overlong varints. Clarify re use of length info

parent dd231284f1
commit d6b0b8bbd8

@@ -23,6 +23,11 @@ binary syntax](preserves-binary.html).
 **Annotations.**
 Annotations *MUST NOT* be present.
 
+**Length representations.** [Varint-encoded
+lengths](./preserves-binary.html#varint) *MUST* appear in the unique shortest
+encoding for a given length. That is, canonical varint-encodings *MUST
+NOT* start with `0`.
+
 **Sets.**
 The elements of a `Set` *MUST* be serialized sorted in ascending order
 by comparing their canonical encoded binary representations.

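A minimal sketch of the canonical-length rule, assuming the varint layout shown in the binary spec's table (7-bit groups, most significant first, high bit set only on the final byte). A writer gets the shortest form by never emitting a redundant leading group, and a canonical checker only has to reject a leading `0x00` byte. The function names `encode_varint` and `is_canonical_varint` are hypothetical, not part of any Preserves library.

```python
# Sketch (assumption): the argdata-style big-endian base-128 varint from the
# binary spec. Each byte carries 7 bits of the length, most significant group
# first; the final byte has its high bit (0x80) set, earlier bytes have it clear.


def encode_varint(m: int) -> bytes:
    """Return the unique shortest varint encoding of a non-negative length m."""
    if m < 0:
        raise ValueError("lengths must be non-negative")
    groups = [m & 0x7F]  # least significant 7-bit group; always present
    m >>= 7
    while m:  # emit only as many further groups as the value needs
        groups.append(m & 0x7F)
        m >>= 7
    groups.reverse()  # big-endian: most significant group first
    groups[-1] |= 0x80  # mark the final byte
    return bytes(groups)


def is_canonical_varint(data: bytes) -> bool:
    """True if data is exactly one varint, written in its shortest form."""
    if not data or data[0] == 0x00:  # a leading 0x00 byte is a redundant zero group
        return False
    # All bytes but the last must have the high bit clear; the last must have it set.
    return all(b < 0x80 for b in data[:-1]) and data[-1] >= 0x80


print(encode_varint(300).hex())                 # 02ac
print(is_canonical_varint(bytes([0, 2, 172])))  # False: overlong encoding of 300
```

Note that zero encodes as the single byte `0x80`, so even the smallest length never starts with `0x00`.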
@@ -29,12 +29,13 @@ Each `Repr` starts with a tag byte, describing the kind of information
 represented.
 
 However, inspired by [argdata][], a `Repr` does *not* describe its own
-length. Instead, the surrounding context must supply the length of the
-`Repr`.
+length. Instead, the surrounding context must supply the expected length
+of the `Repr`.
 
 As a consequence, `Repr`s for `Compound` values store the lengths of
 their contained values. Each contained `Value` is represented as a
-length in bytes followed by its own `Repr`.
+length in bytes followed by its own `Repr`. Implementations use each
+stored length to decide when to stop reading the following `Repr`.
 
 <a id="varint"></a> Each length is stored as an [argdata][]-compatible
 big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint

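To illustrate the "length in bytes followed by its own `Repr`" layout from the writer's side, here is a small sketch. It assumes a compound `Repr` is its tag byte followed by the length-prefixed contained `Repr`s, and it builds only that second part. `encode_contained` is a hypothetical helper, and the child `Repr`s in the usage line are opaque placeholder bytes rather than real encodings.

```python
# Sketch (assumption): builds only the contained-values part of a compound
# Repr; the compound's own tag byte is left out.


def encode_varint(m: int) -> bytes:
    """Shortest big-endian base-128 varint; the final byte has 0x80 set."""
    if m < 0:
        raise ValueError("lengths must be non-negative")
    out = [(m & 0x7F) | 0x80]  # last byte, marked with the high bit
    m >>= 7
    while m:
        out.insert(0, m & 0x7F)  # earlier bytes, high bit clear
        m >>= 7
    return bytes(out)


def encode_contained(reprs: list[bytes]) -> bytes:
    """Prefix each contained Repr with its length in bytes, then concatenate."""
    out = bytearray()
    for r in reprs:
        out += encode_varint(len(r))  # length of the following Repr
        out += r  # the Repr itself, which does not record its own length
    return bytes(out)


# The child Reprs here are opaque placeholder bytes, not real encodings.
print(list(encode_contained([b"\xaa", b"\xbb\xcc"])))  # [129, 170, 130, 187, 204]
```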
@@ -63,9 +64,18 @@ The following table illustrates varint-encoding.
 | 300 | `0000010 0101100` | 2 172 |
 | 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |
 
-It is an error for a varint-encoded `m` in a `Repr` to be anything other
-than the unique shortest encoding for that `m`. That is, a
-varint-encoding of `m` *MUST NOT* start with `0`.
+There is no requirement that a varint-encoded `m` in a `Repr` be the unique shortest encoding
+for that `m`.[^overlong-varint] However, implementations *SHOULD* use the shortest encoding
+wherever possible when writing, and *SHOULD* reject excessively long encodings when reading
+encoded values.[^excessively-long-varint]
 
+[^overlong-varint]: **Implementation note.** The spec permits overlong length encodings to
+reduce wasted activity in resource-constrained situations. If an implementation is in
+anything other than a very low-level language, it is likely to be able to use
+[IOList](./conventions.html#iolists)-style data structures to avoid unnecessary copying.
+
+[^excessively-long-varint]: As a guideline, reject more than eight leading `0` bytes in a
+varint.
+
 ### Records, Sequences, Sets and Dictionaries.
 

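The relaxed rule has a reading side and a writing side, sketched below. The reader skips redundant leading `0x00` bytes but rejects more than eight of them, following the guideline footnote; passing `max_leading_zeros=0` turns it into a canonical-only reader. The padded writer illustrates one plausible reason for permitting overlong lengths: a low-level writer could reserve a fixed-width length slot before it knows the child's size and fill it in afterwards. That use case is an assumption, not something the spec states, and the names `MAX_LEADING_ZEROS`, `decode_varint`, and `encode_varint_padded` are hypothetical.

```python
# Sketch (assumption): a tolerant reader plus a padded writer. Names such as
# MAX_LEADING_ZEROS and encode_varint_padded are hypothetical.

MAX_LEADING_ZEROS = 8  # the spec's suggested guideline


def decode_varint(buf: bytes, pos: int = 0,
                  max_leading_zeros: int = MAX_LEADING_ZEROS) -> tuple[int, int]:
    """Decode one varint at buf[pos]; return (value, position after it).

    Redundant leading 0x00 bytes are accepted, up to max_leading_zeros of them.
    Passing max_leading_zeros=0 accepts only the shortest encoding.
    """
    zeros = 0
    while pos < len(buf) and buf[pos] == 0x00:  # each 0x00 is a zero group
        zeros += 1
        if zeros > max_leading_zeros:
            raise ValueError("excessively long varint")
        pos += 1
    m = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        m = (m << 7) | (b & 0x7F)
        if b & 0x80:  # high bit set: this was the final byte
            return m, pos


def encode_varint_padded(m: int, width: int) -> bytes:
    """Overlong encoding of m occupying exactly width bytes."""
    if m < 0:
        raise ValueError("lengths must be non-negative")
    out = bytearray([(m & 0x7F) | 0x80])  # final byte, high bit set
    m >>= 7
    while m:
        out.insert(0, m & 0x7F)
        m >>= 7
    if len(out) > width:
        raise ValueError("length does not fit in the reserved width")
    return bytes(width - len(out)) + bytes(out)  # pad with leading 0x00 bytes


slot = encode_varint_padded(300, 4)
print(list(slot), decode_varint(slot))  # [0, 0, 2, 172] (300, 4)
```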
@@ -107,6 +117,10 @@ serializing in some other implementation-defined order.
 but encoding and then sorting byte strings is much more likely to
 be within easy reach.
 
+No sentinel marks the end of a sequence of length-prefixed `Repr`s.
+During decoding, use the length of the containing `Repr` to decide when
+to stop expecting more contained `Repr`s.
+
 ### SignedIntegers.
 
 «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)

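The sentinel-free decoding rule can be sketched as a loop bounded by the container's own length: read a varint length, slice exactly that many bytes as the next child, and stop when the container's bytes are exhausted. This assumes `body` is the part of a compound `Repr` after its tag byte; `read_varint` and `split_contained` are hypothetical names, and for brevity this reader accepts overlong lengths without the leading-zero cap a careful implementation would add.

```python
# Sketch (assumption): `body` is the part of a compound Repr after its tag byte.


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one varint (overlong forms accepted); return (value, next position)."""
    m = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        m = (m << 7) | (b & 0x7F)
        if b & 0x80:  # final byte of the varint
            return m, pos


def split_contained(body: bytes) -> list[bytes]:
    """Split a compound's contained-values bytes into the child Reprs.

    No sentinel ends the sequence: the loop stops exactly when the
    containing Repr's bytes run out.
    """
    children = []
    pos = 0
    while pos < len(body):
        n, pos = read_varint(body, pos)
        end = pos + n
        if end > len(body):
            raise ValueError("contained Repr overruns its container")
        children.append(body[pos:end])  # the stored length bounds this child
        pos = end
    return children


print(split_contained(bytes([0x81, 0xAA, 0x82, 0xBB, 0xCC])))  # [b'\xaa', b'\xbb\xcc']
```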
@@ -192,6 +206,10 @@ an empty sequence annotated with two symbols, `a` and `b`, is
 annotations are skipped, an endless sequence of annotations may give an
 illusion of progress.
 
+**Overlong varints.** The binary format allows (but discourages) overlong [varint](#varint)s.
+Consider optional restrictions on the number of redundant leading `0` bytes accepted when
+reading a varint.
+
 **Canonical form for cryptographic hashing and signing.** No canonical
 textual encoding of a `Value` is specified. A
 [canonical form][canonical] exists for binary encoded `Value`s, and