New "blue jelly" machine-oriented binary syntax, inspired by argdata

Tony Garnock-Jones 2022-06-10 17:33:52 +02:00
parent 4528100248
commit 7055a6467c
5 changed files with 220 additions and 204 deletions


@ -14,4 +14,4 @@ defaults:
title: "Preserves" title: "Preserves"
version_date: "June 2022" version_date: "June 2022"
version: "0.6.3" version: "0.7.0"


@ -6,9 +6,11 @@ title: "Preserves: Binary Syntax"
Tony Garnock-Jones <tonyg@leastfixedpoint.com> Tony Garnock-Jones <tonyg@leastfixedpoint.com>
{{ site.version_date }}. Version {{ site.version }}. {{ site.version_date }}. Version {{ site.version }}.
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
[LEB128]: https://en.wikipedia.org/wiki/LEB128 [LEB128]: https://en.wikipedia.org/wiki/LEB128
[argdata]: https://github.com/NuxiNL/argdata
[canonical]: canonical-binary.html [canonical]: canonical-binary.html
[google-varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
[vlq]: https://en.wikipedia.org/wiki/Variable-length_quantity
*Preserves* is a data model, with associated serialization formats. This *Preserves* is a data model, with associated serialization formats. This
document defines one of those formats: a binary syntax for `Value`s from document defines one of those formats: a binary syntax for `Value`s from
@ -24,49 +26,52 @@ For a value `v`, we write `«v»` for the `Repr` of v.
### Type and Length representation. ### Type and Length representation.
Each `Repr` starts with a tag byte, describing the kind of information Each `Repr` starts with a tag byte, describing the kind of information
represented. Depending on the tag, a length indicator, further encoded represented.
information, and/or an ending tag may follow.
tag (simple atomic data and small integers) However, inspired by [argdata][], a `Repr` does *not* describe its own
tag ++ binarydata (most integers) length. Instead, the surrounding context must supply the length of the
tag ++ length ++ binarydata (large integers, strings, symbols, and binary) `Repr`.
tag ++ repr ++ ... ++ endtag (compound data)
The unique end tag is byte value `0x84`. As a consequence, `Repr`s for `Compound` values store the lengths of
their contained values. Each contained `Value` is represented as a
length in bytes followed by its own `Repr`.
If present after a tag, the length of a following piece of binary data <a id="varint"></a> Each length is stored as an [argdata][]-compatible
is formatted as a [base 128 varint][varint].[^see-also-leb128] We big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
write `varint(m)` for the varint-encoding of `m`. Quoting the stores seven bits of the length. All bytes have a clear upper bit,
[Google Protocol Buffers][varint] definition, except the final byte, which has the upper bit set. We write
`len(m)` for the varint-encoding of a non-negative integer `m`,
defined recursively as follows:
[^see-also-leb128]: Also known as [LEB128][] encoding, for unsigned len(m) = e(m, 128)
integers. Varints and LEB128-encoded integers differ only for where e(v, d) = [v + d] if v < 128
signed integers, which are not used in Preserves. e(v / 128, 0) ++ [(v % 128) + d] if v ≥ 128
> Each byte in a varint, except the last byte, has the most [^see-also-leb128]: Argdata's length representation is very close to
> significant bit (msb) set this indicates that there are further [Variable-length quantity (VLQ)][VLQ] encoding, differing only in
> bytes to come. The lower 7 bits of each byte are used to store the the flipped interpretation of the high bit of each byte. It is
> two's complement representation of the number in groups of 7 bits, big-endian, unlike [LEB128][] encoding ([as used by
> least significant group first. Google][google-varint] in protobufs).
The following table illustrates varint-encoding. The following table illustrates varint-encoding.
| Number, `m` | `m` in binary, grouped into 7-bit chunks | `varint(m)` bytes | | Number, `m` | `m` in binary, grouped into 7-bit chunks | `len(m)` bytes |
| ------ | ------------------- | ------------ | |-------------|-------------------------------------------|-----------------|
| 15 | `0001111` | 15 | | 15 | `0001111` | 143 |
| 300 | `0000010 0101100` | 172 2 | | 300 | `0000010 0101100` | 2 172 |
| 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 128 148 235 220 3 | | 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |
It is an error for a varint-encoded `m` in a `Repr` to be anything It is an error for a varint-encoded `m` in a `Repr` to be anything other
other than the unique shortest encoding for that `m`. That is, a than the unique shortest encoding for that `m`. That is, a
varint-encoding of `m` *MUST NOT* end in `0` unless `m`=0. varint-encoding of `m` *MUST NOT* start with `0`.
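The recursive definition of `len(m)` on the new side of this hunk can be sketched iteratively. The following Python is only an illustration under stated assumptions: the names `encode_len` and `decode_len` are hypothetical and not part of the specification, and the decoder rejects a leading `0` byte, following the shortest-encoding rule above.

```python
from typing import Tuple


def encode_len(m: int) -> bytes:
    """Encode a non-negative integer as a big-endian base-128 varint."""
    if m < 0:
        raise ValueError("length must be non-negative")
    out = [(m % 128) + 128]  # final byte: low seven bits, upper bit set
    m //= 128
    while m > 0:
        out.append(m % 128)  # earlier bytes: upper bit clear
        m //= 128
    return bytes(reversed(out))


def decode_len(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at data[pos]; return (value, next position)."""
    if pos < len(data) and data[pos] == 0:
        raise ValueError("non-shortest varint: leading 0 byte")
    value = 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        value = value * 128 + (byte & 0x7F)
        if byte & 0x80:  # upper bit set marks the final byte
            return value, pos
    raise ValueError("truncated varint")
```

With these definitions, `encode_len(300)` yields the bytes `2 172`, matching the table above.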
### Records, Sequences, Sets and Dictionaries. ### Records, Sequences, Sets and Dictionaries.
«<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84] «<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
«[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84] «[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84] «#{E_1...E_m}» = [0xA9] ++ seq(«E_1», ..., «E_m»)
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84] «{K_1:V_1...K_m:V_m}» = [0xAA] ++ seq(«K_1», «V_1», ..., «K_m», «V_m»)
where seq(R_1, ... R_m) = len(R_1) ++ R_1 ++...++ len(R_m) ++ R_m
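As a rough illustration of the new `seq(...)` framing, the Python sketch below builds `Record` and `Sequence` `Repr`s from already-encoded children and splits them apart again. The helper names (`encode_len`, `seq`, `record`, `sequence`, `split_seq`) are assumptions made for this example, not names defined by the specification.

```python
def encode_len(m: int) -> bytes:
    """Big-endian base-128 varint; only the final byte has its upper bit set."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def seq(*reprs: bytes) -> bytes:
    """Concatenate each contained Repr, prefixed with its length in bytes."""
    return b"".join(encode_len(len(r)) + r for r in reprs)


def record(label: bytes, *fields: bytes) -> bytes:
    return bytes([0xA7]) + seq(label, *fields)


def sequence(*items: bytes) -> bytes:
    return bytes([0xA8]) + seq(*items)


def split_seq(body: bytes) -> list:
    """Inverse of seq: split the bytes following a compound tag into Reprs."""
    items, pos = [], 0
    while pos < len(body):
        n = 0
        while True:
            if pos >= len(body):
                raise ValueError("truncated length")
            byte = body[pos]
            pos += 1
            n = n * 128 + (byte & 0x7F)
            if byte & 0x80:  # final varint byte
                break
        if pos + n > len(body):
            raise ValueError("contained Repr runs past the end")
        items.append(body[pos:pos + n])
        pos += n
    return items
```

Because each child carries its length, a reader can skip over a contained value without parsing it, which is the point of moving lengths into the surrounding context.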
There is *no* ordering requirement on the `E_i` elements or There is *no* ordering requirement on the `E_i` elements or
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any `K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
@ -89,7 +94,7 @@ serializing in some other implementation-defined order.
ordering for writing out set elements and dictionary key/value ordering for writing out set elements and dictionary key/value
pairs is *not* the same as the sort ordering implied by the pairs is *not* the same as the sort ordering implied by the
semantic ordering of those elements or keys. For example, the semantic ordering of those elements or keys. For example, the
`Repr` of a negative number very far from zero will start with `Repr` of a negative number very far from zero will start with a
byte that is *greater* than the byte which starts the `Repr` of byte that is *greater* than the byte which starts the `Repr` of
zero, making it sort lexicographically later by `Repr`, despite zero, making it sort lexicographically later by `Repr`, despite
being semantically *less than* zero. being semantically *less than* zero.
@ -101,39 +106,31 @@ serializing in some other implementation-defined order.
### SignedIntegers. ### SignedIntegers.
«x» when x ∈ SignedInteger = [0xB0] ++ varint(m) ++ intbytes(x) if ¬(-3≤x≤12) ∧ m>16 «x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
([0xA0] + m - 1) ++ intbytes(x) if ¬(-3≤x≤12) ∧ m≤16
([0xA0] + x) if (-3≤x≤-1)
([0x90] + x) if ( 0≤x≤12)
where m = |intbytes(x)|
Integers in the range [-3,12] are compactly represented with tags The function `intbytes(x)` gives the big-endian two's-complement binary
between `0x90` and `0x9F` because they are so frequently used. representation of `x`, taking exactly as many whole bytes as needed to
Integers up to 16 bytes long are represented with a single-byte tag unambiguously identify the value and its sign. As a special case,
encoding the length of the integer. Larger integers are represented `intbytes(0)` is the empty byte sequence. The most-significant bit in
with an explicit varint length. Every `SignedInteger` *MUST* be the first byte in `intbytes(x)` (for `x`≠0) is the sign
represented with its shortest possible encoding. bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
its shortest possible encoding.
The function `intbytes(x)` gives the big-endian two's-complement For example,
binary representation of `x`, taking exactly as many whole bytes as
needed to unambiguously identify the value and its sign, and `m =
|intbytes(x)|`. The most-significant bit in the first byte in
`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes] For
example,
«87112285931760246646623899502532662132736» «87112285931760246646623899502532662132736»
= B0 12 01 00 00 00 00 00 00 00 = A3 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
«-257» = A1 FE FF «-3» = 9D «128» = A1 00 80 «-257» = A3 FE FF «-3» = A3 FD «128» = A3 00 80
«-256» = A1 FF 00 «-2» = 9E «255» = A1 00 FF «-256» = A3 FF 00 «-2» = A3 FE «255» = A3 00 FF
«-255» = A1 FF 01 «-1» = 9F «256» = A1 01 00 «-255» = A3 FF 01 «-1» = A3 FF «256» = A3 01 00
«-254» = A1 FF 02 «0» = 90 «32767» = A1 7F FF «-254» = A3 FF 02 «0» = A3 «32767» = A3 7F FF
«-129» = A1 FF 7F «1» = 91 «32768» = A2 00 80 00 «-129» = A3 FF 7F «1» = A3 01 «32768» = A3 00 80 00
«-128» = A0 80 «12» = 9C «65535» = A2 00 FF FF «-128» = A3 80 «12» = A3 0C «65535» = A3 00 FF FF
«-127» = A0 81 «13» = A0 0D «65536» = A2 01 00 00 «-127» = A3 81 «13» = A3 0D «65536» = A3 01 00 00
«-4» = A0 FC «127» = A0 7F «131072» = A2 02 00 00 «-4» = A3 FC «127» = A3 7F «131072» = A3 02 00 00
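The new-side integer encoding can be checked against the examples above with a short Python sketch. The function names are hypothetical; `intbytes` here mirrors the specification's function of the same name, including the special case that `intbytes(0)` is empty, and `decode_int` rejects encodings that are not the shortest possible.

```python
def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement bytes for x; empty for 0."""
    if x == 0:
        return b""
    magnitude_bits = (x if x > 0 else ~x).bit_length()
    nbytes = (magnitude_bits + 1 + 7) // 8  # +1 leaves room for the sign bit
    return x.to_bytes(nbytes, "big", signed=True)


def encode_int(x: int) -> bytes:
    return bytes([0xA3]) + intbytes(x)


def decode_int(repr_: bytes) -> int:
    if repr_[:1] != bytes([0xA3]):
        raise ValueError("not a SignedInteger Repr")
    body = repr_[1:]
    x = int.from_bytes(body, "big", signed=True)  # empty body decodes to 0
    if intbytes(x) != body:
        raise ValueError("not the shortest encoding")
    return x
```

Note that the decoder needs the whole `Repr` to be delimited by its context, since the integer no longer carries its own length.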
[^zero-intbytes]: The value 0 needs zero bytes to identify the [^zero-intbytes]: The value 0 needs zero bytes to identify the
value, so `intbytes(0)` is the empty byte string. Non-zero values value, so `intbytes(0)` is the empty byte string. Non-zero values
@ -146,19 +143,19 @@ and `Symbol`, the data following the tag is a UTF-8 encoding of the
`Value`'s code points, while for `ByteString` it is the raw data `Value`'s code points, while for `ByteString` it is the raw data
contained within the `Value` unmodified. contained within the `Value` unmodified.
«S» = [0xB1] ++ varint(|utf8(S)|) ++ utf8(S) if S ∈ String «S» = [0xA4] ++ utf8(S) if S ∈ String
[0xB2] ++ varint(|S|) ++ S if S ∈ ByteString [0xA5] ++ S if S ∈ ByteString
[0xB3] ++ varint(|utf8(S)|) ++ utf8(S) if S ∈ Symbol [0xA6] ++ utf8(S) if S ∈ Symbol
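On the new side, the three textual and binary atom kinds reduce to a tag byte followed by the raw payload. A minimal Python sketch, with hypothetical function names chosen for this example:

```python
def encode_string(s: str) -> bytes:
    return bytes([0xA4]) + s.encode("utf-8")


def encode_bytestring(b: bytes) -> bytes:
    return bytes([0xA5]) + b


def encode_symbol(s: str) -> bytes:
    return bytes([0xA6]) + s.encode("utf-8")


def decode_stringlike(repr_: bytes):
    """Return (kind, value) for a String, ByteString or Symbol Repr."""
    tag, body = repr_[0], repr_[1:]
    if tag == 0xA4:
        return ("String", body.decode("utf-8"))
    if tag == 0xA5:
        return ("ByteString", body)
    if tag == 0xA6:
        return ("Symbol", body.decode("utf-8"))
    raise ValueError("not a String, ByteString or Symbol Repr")
```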
### Booleans. ### Booleans.
«#f» = [0x80] «#f» = [0xA0]
«#t» = [0x81] «#t» = [0xA1]
### Floats and Doubles. ### Floats and Doubles.
«F» when F ∈ Float = [0x82] ++ binary32(F) «F» when F ∈ Float = [0xA2] ++ binary32(F)
«D» when D ∈ Double = [0x83] ++ binary64(D) «D» when D ∈ Double = [0xA2] ++ binary64(D)
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
8-byte IEEE 754 binary representations of `F` and `D`, respectively. 8-byte IEEE 754 binary representations of `F` and `D`, respectively.
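Since the new side uses tag `0xA2` for both `Float` and `Double`, a decoder has to tell them apart by the length of the remaining bytes. The Python sketch below assumes the standard `struct` module and uses hypothetical function names:

```python
import struct


def encode_float(f: float) -> bytes:
    return bytes([0xA2]) + struct.pack(">f", f)  # big-endian binary32


def encode_double(d: float) -> bytes:
    return bytes([0xA2]) + struct.pack(">d", d)  # big-endian binary64


def decode_floating(repr_: bytes):
    """Return ("Float", value) or ("Double", value), chosen by body length."""
    if repr_[:1] != bytes([0xA2]):
        raise ValueError("not a Float or Double Repr")
    body = repr_[1:]
    if len(body) == 4:
        return ("Float", struct.unpack(">f", body)[0])
    if len(body) == 8:
        return ("Double", struct.unpack(">d", body)[0])
    raise ValueError("Float/Double body must be 4 or 8 bytes long")
```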
@ -166,20 +163,25 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
### Embeddeds. ### Embeddeds.
The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
represent the denoted object, prefixed with `[0x86]`. represent the denoted object, prefixed with `[0xBF]`.
«#!V» = [0x86] ++ «V» «#!V» = [0xBF] ++ «V»
### Annotations. ### Annotations.
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with To annotate a `Repr` `r` with some sequence of `Value`s `[v_1, ...,
`[0x85] ++ «v»`. For example, the `Repr` corresponding to textual v_m]`, surround `r` as follows:
syntax `@a@b[]`, i.e. an empty sequence annotated with two symbols,
`a` and `b`, is [0xBE] ++ len(r) ++ r ++ len(v_1) ++ v_1 ++...++ len(v_m) ++ v_m
The `Repr` `r` *MUST NOT* already have annotations; that is, it must not begin with `0xBE`.
For example, the `Repr` corresponding to textual syntax `@a@b[]`, i.e.
an empty sequence annotated with two symbols, `a` and `b`, is
«@a @b []» «@a @b []»
= [0x85] ++ «a» ++ [0x85] ++ «b» ++ «[]» = [0xBE] ++ len(«[]») ++ «[]» ++ len(«a») ++ «a» ++ len(«b») ++ «b»
= [0x85, 0xB3, 0x01, 0x61, 0x85, 0xB3, 0x01, 0x62, 0xB5, 0x84] = [0xBE, 0x81, 0xA8, 0x82, 0xA6, 0x61, 0x82, 0xA6, 0x62]
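The new-side annotation layout can be reproduced with a few lines of Python. This is a sketch only: `annotate` and `encode_len` are names invented for the example, and the annotations are assumed to be passed in already encoded as `Repr`s.

```python
def encode_len(m: int) -> bytes:
    """Big-endian base-128 varint; only the final byte has its upper bit set."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def annotate(r: bytes, *annotations: bytes) -> bytes:
    """Wrap Repr r with the given annotation Reprs, in order."""
    if r[:1] == bytes([0xBE]):
        raise ValueError("Repr already has annotations")
    out = bytes([0xBE]) + encode_len(len(r)) + r
    for v in annotations:
        out += encode_len(len(v)) + v
    return out
```

Applied to the empty sequence `A8` with the symbols `a` and `b`, this produces the nine bytes shown in the new-side example above.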
## Security Considerations ## Security Considerations
@ -194,45 +196,67 @@ implementations *SHOULD* produce canonical binary encodings by
default; however, an implementation *MAY* permit two serializations of default; however, an implementation *MAY* permit two serializations of
the same `Value` to yield different binary `Repr`s. the same `Value` to yield different binary `Repr`s.
## Acknowledgements
The exclusion of lengths from `Repr`s, placing lengths instead ahead of
contained values in sequences, is inspired by [argdata][].
## Appendix. Autodetection of textual or binary syntax ## Appendix. Autodetection of textual or binary syntax
Every tag byte in a binary Preserves `Document` falls within the range Every tag byte in a binary Preserves `Repr` falls within the range
[`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation [`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
bytes*, and will never occur as the first byte of a UTF-8 encoded code bytes*, and will never occur as the first byte of a UTF-8 encoded code
point. This means no binary-encoded document can be misinterpreted as point. This means no binary-encoded `Repr` can be misinterpreted as
valid UTF-8. valid UTF-8.
Conversely, a UTF-8 document must start with a valid codepoint, Conversely, a UTF-8 `Document` must start with a valid codepoint,
meaning in particular that it must not start with a byte in the range meaning in particular that it must not start with a byte in the range
[`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax [`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
Preserves document can be misinterpreted as a binary-syntax document. Preserves `Document` can be misinterpreted as a binary-syntax `Repr`.
Examination of the top two bits of the first byte of a document gives Examination of the top two bits of the first byte of an encoded `Value`
its syntax: if the top two bits are `10`, it should be interpreted as gives its syntax: if the top two bits are `10`, it should be interpreted
a binary-syntax document; otherwise, it should be interpreted as text. as a binary-syntax `Repr`; otherwise, it should be interpreted as text.
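The top-two-bits test described above amounts to a single mask and compare. A minimal Python sketch, where the function name `detect_syntax` is an assumption for illustration:

```python
def detect_syntax(data: bytes) -> str:
    """Classify input as "binary" or "text" from its first byte."""
    if not data:
        raise ValueError("cannot detect the syntax of empty input")
    # Top two bits 10 means a byte in the range 0x80..0xBF.
    return "binary" if (data[0] & 0xC0) == 0x80 else "text"
```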
**Streaming.** Autodetection is still possible when streaming an
undetermined number of `Value`s across, say, a TCP/IP connection:
- If the text syntax is to be used for the connection, simply start
writing each `Document` one after the other. Documents for `Atom`s
*MUST* be separated from their neighbours by whitespace; in general,
whitespace *SHOULD* be used to separate adjacent documents.
Specifically, whitespace separating adjacent documents *SHOULD* be
ASCII newline (10).
- If the binary syntax is to be used for the connection, start the
connection with byte `0xA8` (sequence). After the initial byte, send
each value `v` as `len(«v») ++ «v»`. A side effect of this approach
is that the entire stream, when complete, is a valid `Sequence`
`Repr`.
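The binary streaming convention in the last bullet could be read back as follows. This is a hedged Python sketch, not a reference implementation: it assumes a blocking file-like object whose `read(n)` returns fewer than `n` bytes only at end of stream, and the names `read_len` and `read_values` are hypothetical.

```python
import io


def read_len(stream):
    """Read one varint length; return None on a clean end of stream."""
    value, first = 0, True
    while True:
        chunk = stream.read(1)
        if not chunk:
            if first:
                return None
            raise EOFError("stream ended inside a length")
        byte = chunk[0]
        if first and byte == 0:
            raise ValueError("non-shortest varint: leading 0 byte")
        first = False
        value = value * 128 + (byte & 0x7F)
        if byte & 0x80:  # upper bit set marks the final byte
            return value


def read_values(stream):
    """Yield each Repr from a stream that starts with the 0xA8 sequence tag."""
    if stream.read(1) != bytes([0xA8]):
        raise ValueError("binary stream must start with 0xA8")
    while True:
        n = read_len(stream)
        if n is None:
            return
        body = stream.read(n)
        if len(body) != n:
            raise EOFError("stream ended inside a value")
        yield body
```

For instance, `list(read_values(io.BytesIO(bytes.fromhex("A882A30181A3"))))` yields the `Repr`s of `1` and `0`.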
## Appendix. Table of tag values ## Appendix. Table of tag values
80 - False (8x) RESERVED 80-8F
81 - True (9x) RESERVED 90-9F
82 - Float
83 - Double
84 - End marker
85 - Annotation
86 - Embedded
(8x) RESERVED 87-8F
9x - Small integers 0..12,-3..-1 A0 - False
An - Medium integers, (n+1) bytes long A1 - True
B0 - Large integers, variable length A2 - Float or Double (length disambiguates)
B1 - String A3 - SignedIntegers (0 is encoded with no bytes at all)
B2 - ByteString A4 - String (no trailing NUL is added)
B3 - Symbol A5 - ByteString
A6 - Symbol
B4 - Record A7 - Record
B5 - Sequence A8 - Sequence
B6 - Set A9 - Set
B7 - Dictionary AA - Dictionary
(Ax) RESERVED AB-AF
(Bx) RESERVED B0-BD
BE - Annotations. {BE Lval val Lann0 ann0 Lann1 ann1 ...}
BF - Embedded
## Appendix. Binary SignedInteger representation ## Appendix. Binary SignedInteger representation
@ -242,15 +266,15 @@ values.
| Integer range | Bytes required | Encoding (hex) | | Integer range | Bytes required | Encoding (hex) |
| --- | --- | --- | | --- | --- | --- |
| -3 ≤ n ≤ 12 | 1 | `9X` | | 0 | 1 | `A3` |
| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8) | 2 | `A0` `XX` | | -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8) | 2 | `A3` `XX` |
| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3 | `A1` `XX` `XX` | | -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3 | `A3` `XX` `XX` |
| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4 | `A2` `XX` `XX` `XX` | | -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4 | `A3` `XX` `XX` `XX` |
| -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5 | `A3` `XX` `XX` `XX` `XX` | | -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5 | `A3` `XX` `XX` `XX` `XX` |
| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6 | `A4` `XX` `XX` `XX` `XX` `XX` | | -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6 | `A3` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7 | `A5` `XX` `XX` `XX` `XX` `XX` `XX` | | -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8 | `A6` `XX` `XX` `XX` `XX` `XX` `XX` `XX` | | -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9 | `A7` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` | | -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
<!-- Heading to visually offset the footnotes from the main document: --> <!-- Heading to visually offset the footnotes from the main document: -->
## Notes ## Notes


@ -206,8 +206,8 @@ object, prefixed with `#!`.
Embedded = "#!" Value Embedded = "#!" Value
Finally, any `Value` may be represented by escaping from the textual Finally, any `Value` may be represented by escaping from the textual
syntax to the [machine-oriented binary syntax](preserves-binary.html) syntax to the [machine-oriented binary syntax](preserves-binary.html) by
by prefixing a `ByteString` containing the binary representation of the prefixing a `ByteString` containing the binary representation of the
`Value` with `#=`.[^rationale-switch-to-binary] `Value` with `#=`.[^rationale-switch-to-binary]
[^no-literal-binary-in-text] [^machine-value-annotations] [^no-literal-binary-in-text] [^machine-value-annotations]
@ -216,18 +216,18 @@ by prefixing a `ByteString` containing the binary representation of the
[^rationale-switch-to-binary]: **Rationale.** The textual syntax [^rationale-switch-to-binary]: **Rationale.** The textual syntax
cannot express every `Value`: specifically, it cannot express the cannot express every `Value`: specifically, it cannot express the
several million floating-point NaNs, or the two floating-point several million floating-point NaNs, or the two floating-point
Infinities. Since the machine-oriented binary format for `Value`s Infinities. Since the machine-oriented binary format for `Value`s expresses
expresses each `Value` with precision, embedding binary `Value`s each `Value` with precision, embedding binary `Value`s solves the
solves the problem. problem.
[^no-literal-binary-in-text]: Every text is ultimately physically [^no-literal-binary-in-text]: Every text is ultimately physically
stored as bytes; therefore, it might seem possible to escape to the stored as bytes; therefore, it might seem possible to escape to
raw form of binary encoding from within a piece of textual syntax. the raw binary encoding from within a
However, while bytes must be involved in any *representation* of piece of textual syntax. However, while bytes must be involved in
text, the text *itself* is logically a sequence of *code points* and any *representation* of text, the text *itself* is logically a
is not *intrinsically* a binary structure at all. It would be sequence of *code points* and is not *intrinsically* a binary
incoherent to expect to be able to access the representation of the structure at all. It would be incoherent to expect to be able to
text from within the text itself. access the representation of the text from within the text itself.
[^machine-value-annotations]: Any text-syntax annotations preceding [^machine-value-annotations]: Any text-syntax annotations preceding
the `#` are prepended to any binary-syntax annotations yielded by the `#` are prepended to any binary-syntax annotations yielded by
@ -235,11 +235,11 @@ by prefixing a `ByteString` containing the binary representation of the
## Annotations ## Annotations
When written down, a `Value` may have an associated sequence of When written down, a `Value` may have an associated
*annotations* carrying “out-of-band” contextual metadata about the sequence of *annotations* carrying “out-of-band” contextual metadata
value. Each annotation is, in turn, a `Value`, and may itself have about the value. Each annotation is, in turn, a `Value`, and may
annotations. The ordering of annotations attached to a `Value` is itself have annotations. The ordering of annotations attached to a
significant. `Value` is significant.
Value =/ ws "@" Value Value Value =/ ws "@" Value Value
@ -276,7 +276,7 @@ different.
## Security Considerations ## Security Considerations
**Whitespace.** The textual format allows arbitrary whitespace in many **Whitespace.** The text syntax allows arbitrary whitespace in many
positions. Consider optional restrictions on the amount of consecutive positions. Consider optional restrictions on the amount of consecutive
whitespace that may appear. whitespace that may appear.


@ -220,21 +220,21 @@ The total ordering specified [above](#total-order) means that the following stat
<!-- TODO: Give some examples of large and small Preserves, perhaps --> <!-- TODO: Give some examples of large and small Preserves, perhaps -->
<!-- translated from various JSON blobs floating around the internet. --> <!-- translated from various JSON blobs floating around the internet. -->
| Value | Encoded byte sequence | | Value | Encoded byte sequence |
|-----------------------------|---------------------------------------------------------------------------------| |-----------------------------|------------------------------------------------------------------------------|
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 | | `<capture <discard>>` | A7 88 A6 'c' 'a' 'p' 't' 'u' 'r' 'e' 8A A7 88 A6 'd' 'i' 's' 'c' 'a' 'r' 'd' |
| `[1 2 3 4]` | B5 91 92 93 94 84 | | `[1 2 3 4]` | A8 82 A3 01 82 A3 02 82 A3 03 82 A3 04 |
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 | | `[-2 -1 0 1]` | A8 82 A3 FE 82 A3 FF 81 A3 82 A3 01 |
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' | | `"hello"` | A4 'h' 'e' 'l' 'l' 'o' |
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 | | `["a" b #"c" [] #{} #t #f]` | A8 82 A4 'a' 82 A6 'b' 82 A5 'c' 81 A8 81 A9 81 A1 81 A0 |
| `-257` | A1 FE FF | | `-257` | A3 FE FF |
| `-1` | 9F | | `-1` | A3 FF |
| `0` | 90 | | `0` | A3 |
| `1` | 91 | | `1` | A3 01 |
| `255` | A1 00 FF | | `255` | A3 00 FF |
| `1.0f` | 82 3F 80 00 00 | | `1.0f` | A2 3F 80 00 00 |
| `1.0` | 83 3F F0 00 00 00 00 00 00 | | `1.0` | A2 3F F0 00 00 00 00 00 00 |
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 | | `-1.202e300` | A2 FE 3C B7 B7 59 BF 04 26 |
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record` The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
@ -242,24 +242,21 @@ The next example uses a non-`Symbol` label for a record.[^extensibility2] The `R
encodes to encodes to
B4 ;; Record A7 ;; Record
B5 ;; Sequence 9E A8 ;; Length 30, Sequence
B3 06 74 69 74 6C 65 64 ;; Symbol, "titled" 87 A6 74 69 74 6C 65 64 ;; Length 7, Symbol, "titled"
B3 06 70 65 72 73 6F 6E ;; Symbol, "person" 87 A6 70 65 72 73 6F 6E ;; Length 7, Symbol, "person"
92 ;; SignedInteger, "2" 82 A3 02 ;; Length 2, SignedInteger, "2"
B3 05 74 68 69 6E 67 ;; Symbol, "thing" 86 A6 74 68 69 6E 67 ;; Length 6, Symbol, "thing"
91 ;; SignedInteger, "1" 82 A3 01 ;; Length 2, SignedInteger, "1"
84 ;; End (sequence) 82 A3 65 ;; Length 2, SignedInteger, "101"
A0 65 ;; SignedInteger, "101" 8A A4 42 6C 61 63 6B 77 65 6C 6C ;; Length 10, String, "Blackwell"
B1 09 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell" 91 A7 ;; Length 17, Record
B4 ;; Record 85 A6 64 61 74 65 ;; Length 5, Symbol, "date"
B3 04 64 61 74 65 ;; Symbol, "date" 83 A3 07 1D ;; Length 3, SignedInteger, "1821"
A1 07 1D ;; SignedInteger, "1821" 82 A3 02 ;; Length 2, SignedInteger, "2"
92 ;; SignedInteger, "2" 82 A3 03 ;; Length 2, SignedInteger, "3"
93 ;; SignedInteger, "3" 83 A4 44 72 ;; Length 3, String, "Dr"
84 ;; End (record)
B1 02 44 72 ;; String, "Dr"
84 ;; End (record)
[^extensibility2]: It happens to line up with Racket's [^extensibility2]: It happens to line up with Racket's
representation of a record label for an inheritance hierarchy representation of a record label for an inheritance hierarchy
@ -311,27 +308,23 @@ The first RFC 8259 example:
when read using the Preserves text syntax encodes via the binary syntax when read using the Preserves text syntax encodes via the binary syntax
as follows: as follows:
B7 AA
B1 05 "Image" 86 A4 "Image"
B7 01 AC AA
B1 03 "IDs" B5 89 A4 "Animated" 86 A6 "false"
A0 74 87 A4 "Height" 83 A3 02 58
A1 03 AF 84 A4 "IDs" 91 A8
A1 00 EA 82 A3 74
A2 00 97 89 83 A3 03 AF
84 83 A3 00 EA
B1 05 "Title" B1 14 "View from 15th Floor" 84 A3 00 97 89
B1 05 "Width" A1 03 20 8A A4 "Thumbnail"
B1 06 "Height" A1 02 58 C3 AA
B1 08 "Animated" B3 05 "false" 87 A4 "Height" 82 A3 7D
B1 09 "Thumbnail" 84 A4 "Url" A7 A4 "http://www.example.com/image/481989943"
B7 86 A4 "Width" 82 A3 64
B1 03 "Url" B1 26 "http://www.example.com/image/481989943" 86 A4 "Title" 95 A4 "View from 15th Floor"
B1 05 "Width" A0 64 86 A4 "Width" 83 A3 03 20
B1 06 "Height" A0 7D
84
84
84
The second RFC 8259 example: The second RFC 8259 example:
@ -360,28 +353,25 @@ The second RFC 8259 example:
encodes to binary as follows: encodes to binary as follows:
B5 A8
B7 FE AA
B1 03 "Zip" B1 05 "94107" 88 A4 "Address" 81 A4
B1 04 "City" B1 0D "SAN FRANCISCO" 85 A4 "City" 8E A4 "SAN FRANCISCO"
B1 05 "State" B1 02 "CA" 88 A4 "Country" 83 A4 "US"
B1 07 "Address" B1 00 89 A4 "Latitude" 89 A2 40 42 E2 26 80 9D 49 52
B1 07 "Country" B1 02 "US" 8A A4 "Longitude" 89 A2 C0 5E 99 56 6C F4 1F 21
B1 08 "Latitude" 83 40 42 E2 26 80 9D 49 52 86 A4 "State" 83 A4 "CA"
B1 09 "Longitude" 83 C0 5E 99 56 6C F4 1F 21 84 A4 "Zip" 86 A4 "94107"
B1 09 "precision" B1 03 "zip" 8A A4 "precision" 84 A4 "zip"
84 FA AA
B7 88 A4 "Address" 81 A4
B1 03 "Zip" B1 05 "94085" 85 A4 "City" 8A A4 "SUNNYVALE"
B1 04 "City" B1 09 "SUNNYVALE" 88 A4 "Country" 83 A4 "US"
B1 05 "State" B1 02 "CA" 89 A4 "Latitude" 89 A2 40 42 AF 9D 66 AD B4 03
B1 07 "Address" B1 00 8A A4 "Longitude" 89 A2 C0 5E 81 AA 4F CA 42 AF
B1 07 "Country" B1 02 "US" 86 A4 "State" 83 A4 "CA"
B1 08 "Latitude" 83 40 42 AF 9D 66 AD B4 03 84 A4 "Zip" 86 A4 "94085"
B1 09 "Longitude" 83 C0 5E 81 AA 4F CA 42 AF 8A A4 "precision" 84 A4 "zip"
B1 09 "precision" B1 03 "zip"
84
84
<!-- Heading to visually offset the footnotes from the main document: --> <!-- Heading to visually offset the footnotes from the main document: -->
## Notes ## Notes


@ -2,6 +2,8 @@
title: "Representing Values in Programming Languages" title: "Representing Values in Programming Languages"
--- ---
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
**NOT YET READY** **NOT YET READY**
We have given a definition of `Value` and its semantics, and proposed We have given a definition of `Value` and its semantics, and proposed