Canonical Form for Binary Syntax

Tony Garnock-Jones 2019-10-08 13:27:35 +01:00
parent da2a6d44d1
commit fe8c766d1e
2 changed files with 69 additions and 0 deletions


@@ -21,6 +21,7 @@ comparable to JSON, XML, S-expressions, CBOR, ASN.1 BER, and so on.
- [Preserves tutorial](TUTORIAL.html)
- [Preserves specification](preserves.html), including semantics,
textual syntax, and compact binary syntax
- [Canonical Form for Binary Syntax](canonical-binary.html)
## Additional resources

canonical-binary.md Normal file

@@ -0,0 +1,68 @@
---
title: "Canonical Form for Binary Syntax"
---
[spec]: preserves.html
When two `Value`s are written down in *canonical form*, comparing
their *syntax* for equivalence gives the same result as comparing them
*semantically* according to the equivalence defined in the
[Preserves specification][spec].[^equivalence-not-ordering]
[^equivalence-not-ordering]: However, canonical form does *not*
induce a match between lexicographic ordering on syntax and
semantic ordering [as specified][spec]. It *only* induces a
connection between equivalences.
That is, canonical forms are equal if and only if the encoded `Value`s
are equal.
This document specifies canonical form for the Preserves compact
binary syntax.
**General rules.**
Streaming formats ("format C") MUST NOT be used.
Annotations MUST NOT be present.
Placeholders MUST NOT be used.
Where possible, fixed-length ("format A") MUST be used in preference
to variable-length ("format B") formats.
**Signed integers.**
When a `SignedInteger` *n* is greater than or equal to -3 and less
than 13 (i.e. -3≤*n*<13), it MUST be represented using the single-byte
encoding with initial nibble equal to 3.
Otherwise (i.e. when *n*<-3 or *n*≥13), it MUST be represented using
the multi-byte encoding with initial nibble equal to 4, and the
variable-length part must be as short as possible while remaining
unambiguous.[^signed-integer-examples]
[^signed-integer-examples]: The following examples from
    [the specification][spec] are all in canonical form:

        [[ -257 ]] = 42 FE FF    [[  -3 ]] = 3D       [[    128 ]] = 42 00 80
        [[ -256 ]] = 42 FF 00    [[  -2 ]] = 3E       [[    255 ]] = 42 00 FF
        [[ -255 ]] = 42 FF 01    [[  -1 ]] = 3F       [[    256 ]] = 42 01 00
        [[ -254 ]] = 42 FF 02    [[   0 ]] = 30       [[  32767 ]] = 42 7F FF
        [[ -129 ]] = 42 FF 7F    [[   1 ]] = 31       [[  32768 ]] = 43 00 80 00
        [[ -128 ]] = 41 80       [[  12 ]] = 3C       [[  65535 ]] = 43 00 FF FF
        [[ -127 ]] = 41 81       [[  13 ]] = 41 0D    [[  65536 ]] = 43 01 00 00
        [[   -4 ]] = 41 FC       [[ 127 ]] = 41 7F    [[ 131072 ]] = 43 02 00 00
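
The signed-integer rule can be illustrated with a minimal Python sketch. It is not part of the specification. The function name `encode_canonical_signed_integer` is hypothetical, and the sketch assumes, based on the examples listed for this rule, that the low nibble of a `4x` lead byte carries the payload length in bytes and that the payload is big-endian two's complement. Payloads of 15 bytes or more are rejected, because the length encoding for that case is not described in this document.

```python
def encode_canonical_signed_integer(n: int) -> bytes:
    """Sketch of the canonical encoding of a SignedInteger (assumptions in lead-in)."""
    # Small integers, -3 <= n < 13, use the single-byte form with initial nibble 3.
    if -3 <= n < 13:
        return bytes([0x30 | (n & 0x0F)])

    # Shortest two's-complement length that still preserves the sign bit.
    magnitude = n if n >= 0 else ~n
    length = magnitude.bit_length() // 8 + 1

    # Assumption: lengths below 15 fit directly in the low nibble of the lead byte.
    if length >= 15:
        raise ValueError("payload too long for this sketch")

    # Multi-byte form: initial nibble 4, then the big-endian payload.
    return bytes([0x40 | length]) + n.to_bytes(length, "big", signed=True)
```

Because the payload length is always the minimum that keeps the sign unambiguous, each integer has exactly one output under this sketch, which is the property canonical form requires.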
**Sets.**
The elements of a `Set` MUST be serialized sorted in ascending order
following the total order relation defined in the
[Preserves specification][spec].
**Dictionaries.**
The key-value pairs in a `Dictionary` MUST be serialized sorted in
ascending order by key, following the total order relation defined in
the [Preserves specification][spec].
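
The ordering requirements for `Set`s and `Dictionary`s can be sketched as follows. This is an illustration only: the function names are hypothetical, and `toy_compare` is a stand-in three-way comparison over plain Python values, *not* the total order defined in the Preserves specification. A real implementation would pass a comparison function that implements that order.

```python
from functools import cmp_to_key


def toy_compare(a, b) -> int:
    # Placeholder three-way comparison; NOT the Preserves total order.
    return (a > b) - (a < b)


def canonical_set_order(elements, compare):
    """Return the elements of a Set in ascending order under `compare`."""
    return sorted(elements, key=cmp_to_key(compare))


def canonical_dictionary_order(pairs, compare):
    """Return (key, value) pairs in ascending order by key under `compare`."""
    key_of = cmp_to_key(compare)
    return sorted(pairs, key=lambda pair: key_of(pair[0]))
```

Only the keys take part in the comparison for a `Dictionary`; values travel with their keys. A serializer would then write the elements or pairs in the returned order.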
**Other kinds of `Value`.**
There are no special canonicalization restrictions on `String`s,
`ByteString`s, `Symbol`s, `Boolean`s, `Float`s, `Double`s, `Record`s,
or `Sequence`s.
<!-- Heading to visually offset the footnotes from the main document: -->
## Notes