Canonical Form for Binary Syntax

When two Values are written in canonical form, comparing their syntax for equality gives the same result as comparing the Values semantically, according to the equivalence defined in the Preserves specification.[1]

That is, canonical forms are equal if and only if the encoded Values are equal.

This document specifies canonical form for the Preserves compact binary syntax.

General rules. Streaming formats ("format C") MUST NOT be used. Annotations MUST NOT be present. Placeholders MUST NOT be used. Where possible, the fixed-length format ("format A") MUST be used in preference to the variable-length format ("format B").

Signed integers. When a SignedInteger n is greater than or equal to -3 and less than 13 (i.e. -3≤n<13), it MUST be represented using the single-byte encoding with initial nibble equal to 3. Otherwise (i.e. when n<-3 or n≥13), it MUST be represented using the multi-byte encoding with initial nibble equal to 4, and the variable-length part must be as short as possible while remaining unambiguous.[2]
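The rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `encode_signed_integer` is hypothetical, and the layout it assumes (low nibble of the initial byte holding the byte count, followed by the shortest big-endian two's-complement representation) is inferred from the examples in note 2 rather than stated in this document. Integers needing 15 or more bytes are rejected, because this document does not say how their length is encoded.

```python
def encode_signed_integer(n: int) -> bytes:
    """Sketch of the canonical encoding of a SignedInteger (assumed layout)."""
    if -3 <= n < 13:
        # Single-byte form: initial nibble 3, low nibble is n modulo 16,
        # so 0..12 map to 0x30..0x3C and -3..-1 map to 0x3D..0x3F.
        return bytes([0x30 | (n & 0x0F)])

    # Multi-byte form: find the smallest byte count whose signed range holds n.
    length = 1
    while not (-(1 << (8 * length - 1)) <= n < (1 << (8 * length - 1))):
        length += 1

    if length >= 15:
        # Assumption: longer lengths need a different header form that this
        # document does not describe, so the sketch does not attempt it.
        raise NotImplementedError("integers of 15 or more bytes are not handled")

    # Initial nibble 4, low nibble is the byte count (inferred from note 2).
    return bytes([0x40 | length]) + n.to_bytes(length, "big", signed=True)
```

Under these assumptions, `encode_signed_integer(13)` yields the bytes `41 0D` and `encode_signed_integer(128)` yields `42 00 80`. The leading `00` in the second case is what keeps the encoding unambiguous: without it, `80` would read as -128.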

Sets. The elements of a Set MUST be serialized sorted in ascending order following the total order relation defined in the Preserves specification.

Dictionaries. The key-value pairs in a Dictionary MUST be serialized sorted in ascending order by key, following the total order relation defined in the Preserves specification.
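The ordering requirements for Sets and Dictionaries might be sketched as below. The total order itself is defined in the Preserves specification and is not reproduced here, so `compare_values` is a hypothetical stand-in that handles only integers, assumed here to compare numerically. The helper names are likewise illustrative. A real encoder would substitute the specification's full ordering.

```python
from functools import cmp_to_key


def compare_values(a, b):
    # HYPOTHETICAL stand-in for the Preserves total order.
    # Only valid for integers, compared numerically (an assumption).
    return (a > b) - (a < b)


def canonical_set_order(elements, compare=compare_values):
    """Return the elements of a set in ascending order, ready to serialize."""
    return sorted(elements, key=cmp_to_key(compare))


def canonical_dictionary_order(mapping, compare=compare_values):
    """Return (key, value) pairs in ascending order by key only."""
    return sorted(
        mapping.items(),
        key=cmp_to_key(lambda p, q: compare(p[0], q[0])),
    )
```

For example, `canonical_set_order({5, -1, 3})` gives `[-1, 3, 5]`, and `canonical_dictionary_order({2: "b", -7: "a", 0: "c"})` gives `[(-7, "a"), (0, "c"), (2, "b")]`. Sorting is done on the Values themselves, not on their encoded bytes.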

Other kinds of Value. There are no special canonicalization restrictions on Strings, ByteStrings, Symbols, Booleans, Floats, Doubles, Records, or Sequences.

Notes


  1. However, canonical form does not make lexicographic ordering of the syntax agree with the semantic ordering defined in the specification. It relates only the two notions of equivalence.

  2. The following examples from the specification are all in canonical form:

    [[   -257 ]] = 42 FE FF    [[     -3 ]] = 3D       [[    128 ]] = 42 00 80
    [[   -256 ]] = 42 FF 00    [[     -2 ]] = 3E       [[    255 ]] = 42 00 FF
    [[   -255 ]] = 42 FF 01    [[     -1 ]] = 3F       [[    256 ]] = 42 01 00
    [[   -254 ]] = 42 FF 02    [[      0 ]] = 30       [[  32767 ]] = 42 7F FF
    [[   -129 ]] = 42 FF 7F    [[      1 ]] = 31       [[  32768 ]] = 43 00 80 00
    [[   -128 ]] = 41 80       [[     12 ]] = 3C       [[  65535 ]] = 43 00 FF FF
    [[   -127 ]] = 41 81       [[     13 ]] = 41 0D    [[  65536 ]] = 43 01 00 00
    [[     -4 ]] = 41 FC       [[    127 ]] = 41 7F    [[ 131072 ]] = 43 02 00 00
    
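As a small illustration of note 1, the following sketch takes four encodings verbatim from the table in note 2 and sorts them two ways: by integer value, and lexicographically by encoded bytes. The variable names are illustrative only, and numeric comparison of integers is assumed to match the specification's ordering for SignedIntegers.

```python
# Encodings copied from the table in note 2.
examples = {
    -1: bytes.fromhex("3F"),
    0: bytes.fromhex("30"),
    13: bytes.fromhex("410D"),
    128: bytes.fromhex("420080"),
}

# Order by integer value (assumed semantic order for SignedIntegers).
by_value = sorted(examples)

# Order by lexicographic comparison of the canonical bytes.
by_bytes = sorted(examples, key=lambda n: examples[n])

print(by_value)  # [-1, 0, 13, 128]
print(by_bytes)  # [0, -1, 13, 128]
```

The two orderings disagree on -1 and 0, since `3F` sorts after `30`. Each distinct value still has a distinct encoding, so equality is preserved even though ordering is not.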