diff --git a/README.md b/README.md
index 1c92e60..65c9a52 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,7 @@ comparable to JSON, XML, S-expressions, CBOR, ASN.1 BER, and so on.
  - [Preserves tutorial](TUTORIAL.html)
  - [Preserves specification](preserves.html), including semantics,
    textual syntax, and compact binary syntax
+ - [Canonical Form for Binary Syntax](canonical-binary.html)
 
 ## Additional resources
 
diff --git a/canonical-binary.md b/canonical-binary.md
new file mode 100644
index 0000000..b592e3e
--- /dev/null
+++ b/canonical-binary.md
@@ -0,0 +1,68 @@
+---
+title: "Canonical Form for Binary Syntax"
+---
+
+  [spec]: preserves.html
+
+When two `Value`s are written down in *canonical form*, comparing
+their *syntax* for equivalence gives the same result as comparing them
+*semantically* according to the equivalence defined in the
+[Preserves specification][spec].[^equivalence-not-ordering]
+
+  [^equivalence-not-ordering]: However, canonical form does *not*
+    induce a match between lexicographic ordering on syntax and
+    semantic ordering [as specified][spec]. It *only* induces a
+    connection between equivalences.
+
+That is, canonical forms are equal if and only if the encoded `Value`s
+are equal.
+
+This document specifies canonical form for the Preserves compact
+binary syntax.
+
+**General rules.**
+Streaming formats ("format C") MUST NOT be used.
+Annotations MUST NOT be present.
+Placeholders MUST NOT be used.
+Where possible, fixed-length ("format A") MUST be used in preference
+to variable-length ("format B") formats.
+
+**Signed integers.**
+When a `SignedInteger` *n* is greater than or equal to -3 and less
+than 13 (i.e. -3≤*n*<13), it MUST be represented using the single-byte
+encoding with initial nibble equal to 3.
+Otherwise (i.e. when *n*<-3 or *n*≥13), it MUST be represented using
+the multi-byte encoding with initial nibble equal to 4, and the
+variable-length part must be as short as possible while remaining
+unambiguous.[^signed-integer-examples]
+
+  [^signed-integer-examples]: The following examples from
+    [the specification][spec] are all in canonical form:
+
+        [[ -257 ]] = 42 FE FF    [[ -3 ]] = 3D        [[ 128 ]] = 42 00 80
+        [[ -256 ]] = 42 FF 00    [[ -2 ]] = 3E        [[ 255 ]] = 42 00 FF
+        [[ -255 ]] = 42 FF 01    [[ -1 ]] = 3F        [[ 256 ]] = 42 01 00
+        [[ -254 ]] = 42 FF 02    [[ 0 ]] = 30         [[ 32767 ]] = 42 7F FF
+        [[ -129 ]] = 42 FF 7F    [[ 1 ]] = 31         [[ 32768 ]] = 43 00 80 00
+        [[ -128 ]] = 41 80       [[ 12 ]] = 3C        [[ 65535 ]] = 43 00 FF FF
+        [[ -127 ]] = 41 81       [[ 13 ]] = 41 0D     [[ 65536 ]] = 43 01 00 00
+        [[ -4 ]] = 41 FC         [[ 127 ]] = 41 7F    [[ 131072 ]] = 43 02 00 00
+
+
+**Sets.**
+The elements of a `Set` MUST be serialized sorted in ascending order
+following the total order relation defined in the
+[Preserves specification][spec].
+
+**Dictionaries.**
+The key-value pairs in a `Dictionary` MUST be serialized sorted in
+ascending order by key, following the total order relation defined in
+the [Preserves specification][spec].
+
+**Other kinds of `Value`.**
+There are no special canonicalization restrictions on `String`s,
+`ByteString`s, `Symbol`s, `Boolean`s, `Float`s, `Double`s, `Record`s,
+or `Sequence`s.
+
+
+## Notes
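
The signed-integer rule added in `canonical-binary.md` can be checked against its own footnote with a short Python sketch. It rests on assumptions inferred only from the listed examples: that the low nibble of the multi-byte form holds the byte count, and that the payload is big-endian two's complement. The name `encode_signed_integer` is hypothetical and not taken from any Preserves implementation, and lengths that do not fit the assumed 4-bit count are deliberately left unhandled.

```python
def encode_signed_integer(n: int) -> bytes:
    """Sketch of the canonical SignedInteger encoding described in the patch."""
    if -3 <= n < 13:
        # Single-byte form: initial nibble 3, low nibble is n modulo 16
        # (so -3 -> 0xD, -1 -> 0xF, 12 -> 0xC).
        return bytes([0x30 | (n & 0x0F)])

    # Shortest two's-complement width: magnitude bits plus one sign bit,
    # rounded up to whole bytes. For negative n the magnitude is ~n.
    magnitude_bits = n.bit_length() if n >= 0 else (~n).bit_length()
    length = (magnitude_bits + 1 + 7) // 8

    # Assumption: the count must fit in the low nibble, with the top value
    # reserved; the variable-length ("format B") layout is not modelled here.
    if length > 14:
        raise NotImplementedError("length does not fit the fixed-length form")

    # Multi-byte form: initial nibble 4, low nibble is the byte count.
    return bytes([0x40 | length]) + n.to_bytes(length, "big", signed=True)
```

For instance, `encode_signed_integer(128).hex()` yields `420080`, matching the `42 00 80` entry in the footnote; the extra leading zero byte is what keeps 128 from being read as -128.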