Loosen re overlong varints. Clarify re use of length info
parent dd231284f1
commit d6b0b8bbd8
@@ -23,6 +23,11 @@ binary syntax](preserves-binary.html).
 **Annotations.**
 Annotations *MUST NOT* be present.
 
+**Length representations.** [Varint-encoded
+lengths](./preserves-binary.html#varint) *MUST* appear in the unique shortest
+encoding for a given length. That is, canonical varint-encodings *MUST
+NOT* start with `0`.
+
 **Sets.**
 The elements of a `Set` *MUST* be serialized sorted in ascending order
 by comparing their canonical encoded binary representations.
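
As a rough illustration of the **Sets** rule shown in this hunk, the sketch below orders already-encoded elements by their bytes. It makes two assumptions that the diff does not spell out: each element has already been passed through a canonical encoder (not shown), and "ascending order" means plain lexicographic comparison of unsigned bytes, which is how Python compares `bytes` objects. The name `sort_set_elements` is hypothetical.

```python
def sort_set_elements(encoded_elements):
    """Return canonically encoded Set elements in ascending byte order.

    Hypothetical helper: `encoded_elements` is an iterable of `bytes`, each
    assumed to be the canonical binary encoding of one Set element.
    """
    # Python compares bytes lexicographically by unsigned byte value;
    # a strict prefix sorts before any longer string that extends it.
    return sorted(encoded_elements)
```

A serializer would then emit the sorted encodings one after another, each preceded by its length.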
@@ -29,12 +29,13 @@ Each `Repr` starts with a tag byte, describing the kind of information
 represented.
 
 However, inspired by [argdata][], a `Repr` does *not* describe its own
-length. Instead, the surrounding context must supply the length of the
-`Repr`.
+length. Instead, the surrounding context must supply the expected length
+of the `Repr`.
 
 As a consequence, `Repr`s for `Compound` values store the lengths of
 their contained values. Each contained `Value` is represented as a
-length in bytes followed by its own `Repr`.
+length in bytes followed by its own `Repr`. Implementations use each
+stored length to decide when to stop reading the following `Repr`.
 
 <a id="varint"></a> Each length is stored as an [argdata][]-compatible
 big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
@@ -63,9 +64,18 @@ The following table illustrates varint-encoding.
 | 300 | `0000010 0101100` | 2 172 |
 | 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |
 
-It is an error for a varint-encoded `m` in a `Repr` to be anything other
-than the unique shortest encoding for that `m`. That is, a
-varint-encoding of `m` *MUST NOT* start with `0`.
+There is no requirement that a varint-encoded `m` in a `Repr` be the unique shortest encoding
+for that `m`.[^overlong-varint] However, implementations *SHOULD* use the shortest encoding
+wherever possible when writing, and *SHOULD* reject excessively long encodings when reading
+encoded values.[^excessively-long-varint]
+
+[^overlong-varint]: **Implementation note.** The spec permits overlong length encodings to
+reduce wasted activity in resource-constrained situations. If an implementation is in
+anything other than a very low-level language, it is likely to be able to use
+[IOList](./conventions.html#iolists)-style data structures to avoid unnecessary copying.
+
+[^excessively-long-varint]: As a guideline, reject more than eight leading `0` bytes in a
+varint.
 
 ### Records, Sequences, Sets and Dictionaries.
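
The relaxed varint rule in this hunk can be sketched as follows. This is a minimal illustration, not the project's implementation. The byte layout is inferred from the two table rows above: seven payload bits per byte, most significant group first, with the high bit set only on the final byte. The names `encode_varint` and `decode_varint` are hypothetical, and the default limit of eight leading `0` bytes comes from the guideline footnote.

```python
def encode_varint(n):
    """Encode a non-negative integer in the shortest form (assumed layout)."""
    if n < 0:
        raise ValueError("lengths must be non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()      # big-endian: most significant 7-bit group first
    groups[-1] |= 0x80    # assumption: high bit marks the final byte
    return bytes(groups)


def decode_varint(data, offset=0, max_leading_zeros=8):
    """Decode one varint; return (value, offset just past the varint).

    Overlong encodings are accepted, but more than `max_leading_zeros`
    leading 0 bytes are rejected, following the guideline footnote.
    """
    value = 0
    leading_zeros = 0
    seen_nonzero = False
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        if byte == 0 and not seen_nonzero:
            leading_zeros += 1
            if leading_zeros > max_leading_zeros:
                raise ValueError("excessively long varint")
        else:
            seen_nonzero = True
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:   # final byte reached
            return value, pos
```

With these definitions, `encode_varint(300)` yields the bytes `2 172` from the table, and `decode_varint(bytes([0, 0, 2, 172]))` accepts the overlong form of the same length. A canonical-form checker could instead call the decoder with `max_leading_zeros=0`.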
@@ -107,6 +117,10 @@ serializing in some other implementation-defined order.
 but encoding and then sorting byte strings is much more likely to
 be within easy reach.
 
+No sentinel marks the end of a sequence of length-prefixed `Repr`s.
+During decoding, use the length of the containing `Repr` to decide when
+to stop expecting more contained `Repr`s.
+
 ### SignedIntegers.
 
 «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
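
The decoding rule added in this hunk (no sentinel; stop when the containing `Repr` is used up) might look like the sketch below. It assumes that `body` holds exactly the bytes of the containing `Repr` that carry its length-prefixed children, and that lengths use the varint layout suggested by the table earlier in the file (high bit set on the final byte). Both function names are hypothetical.

```python
def read_varint(buf, pos):
    """Read one varint from buf at pos; return (value, next position)."""
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:   # assumption: high bit marks the final byte
            return value, pos


def split_contained_reprs(body):
    """Split a container body into the byte strings of its contained Reprs."""
    reprs = []
    pos = 0
    # No end marker exists: the loop ends exactly when the body is exhausted.
    while pos < len(body):
        length, pos = read_varint(body, pos)
        end = pos + length
        if end > len(body):
            raise ValueError("contained Repr overruns its container")
        reprs.append(body[pos:end])
        pos = end
    return reprs
```

Each returned byte string can then be decoded on its own, with its length already known from the prefix.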
@@ -192,6 +206,10 @@ an empty sequence annotated with two symbols, `a` and `b`, is
 annotations are skipped, an endless sequence of annotations may give an
 illusion of progress.
 
+**Overlong varints.** The binary format allows (but discourages) overlong [varint](#varint)s.
+Consider optional restrictions on the number of redundant leading `0` bytes accepted when
+reading a varint.
+
 **Canonical form for cryptographic hashing and signing.** No canonical
 textual encoding of a `Value` is specified. A
 [canonical form][canonical] exists for binary encoded `Value`s, and