Put cheatsheet in an appendix
parent 9ee59562a1
commit df1d75d181
@@ -0,0 +1,46 @@
+For a value `v`, we write `«v»` for the binary encoding of `v`. The
+length of an encoding is always available from context: either from
+a containing encoded value, or from the overall container of the data,
+which could be a file, an HTTP message, a UDP packet, etc.
+
+                     «#f» = [0xA0]
+                     «#t» = [0xA1]
+                      «F» = [0xA2] ++ binary32(F)     if F ∈ Float
+                      «D» = [0xA2] ++ binary64(D)     if D ∈ Double
+                      «x» = [0xA3] ++ intbytes(x)     if x ∈ SignedInteger
+                      «S» = [0xA4] ++ utf8(S) ++ [0]  if S ∈ String
+                            [0xA5] ++ S               if S ∈ ByteString
+                            [0xA6] ++ utf8(S)         if S ∈ Symbol
+
+          «<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
+            «[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
+           «#{E_1...E_m}» = [0xA9] ++ seq(«E_1», ..., «E_m»)
+    «{K_1:V_1...K_m:V_m}» = [0xAA] ++ seq(«K_1», «V_1», ..., «K_m», «V_m»)
+
+       seq(R_1, ..., R_m) = len(|R_1|) ++ R_1 ++...++ len(|R_m|) ++ R_m
+
+                   len(m) = e(m, 128)
+
+                  e(v, d) = [v + d]                           if v < 128
+                            e(v / 128, 0) ++ [(v % 128) + d]  if v ≥ 128
+
+                    «#!V» = [0xAB] ++ «V»
+
+The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
+8-byte IEEE 754 binary representations of `F` and `D`, respectively.
+
+The function `intbytes(x)` is a big-endian two's-complement signed
+binary representation of `x`, taking at least as many whole bytes as
+needed to unambiguously identify the value and its sign. `intbytes(0)`
+may be the empty byte sequence.
+
+When reading, the length of the input is supplied externally. This means
+that, when reading a length/value pair in a `seq()`, each length should
+be passed down to the decoder for the corresponding value, so that the
+decoder knows when to stop.
+
+**Annotations.** To annotate a `Repr` `r` (that *MUST NOT* itself
+already be annotated) with some sequence of `Value`s `[v_1, ..., v_m]`
+(that *MUST* be non-empty), surround `r` as follows:
+
+    [0xBF] ++ len(|r|) ++ r ++ len(|«v_1»|) ++ «v_1» ++...++ len(|«v_m»|) ++ «v_m»
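Reviewer note on the new cheatsheet file: the `len`, `e` and `seq` definitions can be sketched in Python to check the arithmetic. This is only an illustration under stated assumptions, not code from the repository. The names `length` (used instead of `len` to avoid shadowing the Python builtin), `encode_string` and `encode_sequence` are hypothetical helpers introduced here.

```python
def e(v: int, d: int) -> bytes:
    """Big-endian base-128 digits of v, with d added to the final byte."""
    if v < 0:
        raise ValueError("lengths must be non-negative")
    if v < 128:
        return bytes([v + d])
    # Higher-order digits come first and carry no marker (d = 0).
    return e(v // 128, 0) + bytes([(v % 128) + d])


def length(m: int) -> bytes:
    """len(m) = e(m, 128): only the final byte has its upper bit set."""
    return e(m, 128)


def seq(*reprs: bytes) -> bytes:
    """Concatenate already-encoded items, each prefixed by its length."""
    return b"".join(length(len(r)) + r for r in reprs)


# Tag bytes taken from the cheatsheet.
FALSE = bytes([0xA0])
TRUE = bytes([0xA1])


def encode_string(s: str) -> bytes:
    """«S» for a String: tag, UTF-8 bytes, then a trailing NUL byte."""
    return bytes([0xA4]) + s.encode("utf-8") + b"\x00"


def encode_sequence(*items: bytes) -> bytes:
    """«[X_1...X_m]»: tag followed by the length-prefixed items."""
    return bytes([0xA8]) + seq(*items)
```

For example, `encode_sequence(TRUE, FALSE)` yields the five bytes `A8 81 A1 81 A0`: the tag, then each one-byte item preceded by the length byte `0x81`.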
@@ -8,49 +8,4 @@ June 2022. Version 0.7.0.
 
 ## Machine-Oriented Binary Syntax
 
-For a value `v`, we write `«v»` for the binary encoding of `v`. The
-length of an encoding is always available from context: either from
-a containing encoded value, or from the overall container of the data,
-which could be a file, an HTTP message, a UDP packet, etc.
-
-                     «#f» = [0xA0]
-                     «#t» = [0xA1]
-                      «F» = [0xA2] ++ binary32(F)     if F ∈ Float
-                      «D» = [0xA2] ++ binary64(D)     if D ∈ Double
-                      «x» = [0xA3] ++ intbytes(x)     if x ∈ SignedInteger
-                      «S» = [0xA4] ++ utf8(S) ++ [0]  if S ∈ String
-                            [0xA5] ++ S               if S ∈ ByteString
-                            [0xA6] ++ utf8(S)         if S ∈ Symbol
-
-          «<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
-            «[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
-           «#{E_1...E_m}» = [0xA9] ++ seq(«E_1», ..., «E_m»)
-    «{K_1:V_1...K_m:V_m}» = [0xAA] ++ seq(«K_1», «V_1», ..., «K_m», «V_m»)
-
-       seq(R_1, ..., R_m) = len(|R_1|) ++ R_1 ++...++ len(|R_m|) ++ R_m
-
-                   len(m) = e(m, 128)
-
-                  e(v, d) = [v + d]                           if v < 128
-                            e(v / 128, 0) ++ [(v % 128) + d]  if v ≥ 128
-
-                    «#!V» = [0xAB] ++ «V»
-
-The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
-8-byte IEEE 754 binary representations of `F` and `D`, respectively.
-
-The function `intbytes(x)` is a big-endian two's-complement signed
-binary representation of `x`, taking at least as many whole bytes as
-needed to unambiguously identify the value and its sign. `intbytes(0)`
-may be the empty byte sequence.
-
-When reading, the length of the input is supplied externally. This means
-that, when reading a length/value pair in a `seq()`, each length should
-be passed down to the decoder for the corresponding value, so that the
-decoder knows when to stop.
-
-**Annotations.** To annotate a `Repr` `r` (that *MUST NOT* itself
-already be annotated) with some sequence of `Value`s `[v_1, ..., v_m]`
-(that *MUST* be non-empty), surround `r` as follows:
-
-    [0xBF] ++ len(|r|) ++ r ++ len(|«v_1»|) ++ «v_1» ++...++ len(|«v_m»|) ++ «v_m»
+{% include cheatsheet-binary.md %}
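Reviewer note on the text moved by this hunk: the `intbytes(x)` wording ("at least as many whole bytes as needed") can be made concrete with a small Python sketch. It assumes an encoder that always picks the shortest form and encodes zero as the empty byte sequence, which the text permits but does not require. The function names `intbytes`, `encode_integer` and `decode_intbytes` are illustrative, not taken from any implementation.

```python
def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement encoding of x.

    Assumption: zero becomes the empty byte sequence, which the
    cheatsheet allows ("may be the empty byte sequence").
    """
    if x == 0:
        return b""
    # Magnitude bits plus one sign bit; ~x handles negative values.
    bits = (x if x >= 0 else ~x).bit_length() + 1
    return x.to_bytes((bits + 7) // 8, "big", signed=True)


def encode_integer(x: int) -> bytes:
    """«x» = [0xA3] ++ intbytes(x)."""
    return bytes([0xA3]) + intbytes(x)


def decode_intbytes(data: bytes) -> int:
    """Inverse of intbytes; the empty sequence decodes to zero."""
    return int.from_bytes(data, "big", signed=True)
```

Because the length of each value is supplied by the enclosing `seq()` or container, the decoder needs no terminator: it simply interprets however many bytes it is handed.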
@@ -86,7 +86,10 @@ when to stop expecting more contained `Repr`s.
 <a id="varint"></a> Each length is stored as an [argdata][]-compatible
 big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
 stores seven bits of the length. All bytes have a clear upper bit,
-except the final byte, which has the upper bit set.
+except the final byte, which has the upper bit set. Implementations
+*SHOULD* use the shortest encoding for a varint, and *MUST NOT* produce
+an encoded varint with more than nine leading `0`
+bytes.[^overlong-varint] [^nine-leading-varint-zeroes]
 
 [^see-also-leb128]: Argdata's length representation is very close to
 [Variable-length quantity (VLQ)][VLQ] encoding, differing only in
@@ -94,16 +97,20 @@ except the final byte, which has the upper bit set.
 big-endian, unlike [LEB128][] encoding ([as used by
 Google][google-varint] in protobufs).
 
-There is no requirement that a varint-encoded length be the unique
-shortest encoding for the length.[^overlong-varint] However,
-implementations *SHOULD* use the shortest encoding whereever possible
-when writing, and *MAY* reject encodings with more than eight leading
-`0` bytes when reading encoded values.
+[^overlong-varint]: **Implementation note.** The spec permits overlong
+length encodings to reduce wasted activity in resource-constrained
+situations. If an implementation is in anything other than a very
+low-level language, it is likely to be able to use
+[IOList](./conventions.html#iolists)-style data structures to avoid
+unnecessary copying.
 
-[^overlong-varint]: **Implementation note.** The spec permits overlong length encodings to
-reduce wasted activity in resource-constrained situations. If an implementation is in
-anything other than a very low-level language, it is likely to be able to use
-[IOList](./conventions.html#iolists)-style data structures to avoid unnecessary copying.
+[^nine-leading-varint-zeroes]: Nine leading zero bytes, plus one
+non-zero byte, equals ten bytes in total. Each byte of varint yields
+7 bits of usable length indicator, so ten bytes gives 70 bits, while
+nine would only give 63, not quite enough for a 64-bit value. Of
+course, it may be some time before an encoder legitimately needs to
+use a 64-bit length indicator, let alone in a resource-constrained
+situation.
 
 **Records.** A `Record` is encoded as tag `0xA7` followed by the
 length-prefixed encodings of its label and fields.
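Reviewer note on the varint wording: the nine-leading-zeroes limit and the footnote's arithmetic (ten bytes give 70 bits, nine give only 63) can be checked with a small Python reader. This is a sketch under assumptions: the new text only says encoders *MUST NOT* produce more than nine leading `0` bytes, so rejecting such input on the reading side is a choice made here, and `read_varint` and `MAX_LEADING_ZEROES` are hypothetical names.

```python
from typing import Tuple

# Nine leading zero bytes plus one final byte is ten bytes in total.
MAX_LEADING_ZEROES = 9
assert (MAX_LEADING_ZEROES + 1) * 7 == 70  # enough for a 64-bit length
assert MAX_LEADING_ZEROES * 7 == 63        # one bit short of 64


def read_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read a big-endian base-128 varint; return (value, next_offset).

    Every byte has a clear upper bit except the final one.
    Assumption: input with too many leading zero bytes is rejected.
    """
    value = 0
    leading_zeroes = 0
    i = offset
    while True:
        if i >= len(data):
            raise ValueError("truncated varint")
        b = data[i]
        i += 1
        if b & 0x80:
            # Final byte: low seven bits complete the value.
            return (value << 7) | (b & 0x7F), i
        if b == 0 and value == 0:
            # Still in the run of leading zero bytes.
            leading_zeroes += 1
            if leading_zeroes > MAX_LEADING_ZEROES:
                raise ValueError("too many leading zero bytes in varint")
        value = (value << 7) | b
```

Python integers are unbounded, so the sketch does not cap the value itself; an implementation in a fixed-width language would also need an overflow check.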
@@ -298,6 +305,10 @@ The exclusion of lengths from `Repr`s, placing lengths instead ahead of
 contained values in sequences, is inspired by [argdata][], as is the
 inclusion of a `NUL` byte in `String` `Repr`s for C interoperability.
 
+## Appendix. Summary of syntax
+
+{% include cheatsheet-binary.md %}
+
 ## Appendix. Autodetection of textual or binary syntax
 
 Every tag byte in a binary Preserves `Repr` falls within the range