New "blue jelly" machine-oriented binary syntax, inspired by argdata

Tony Garnock-Jones 2022-06-10 17:33:52 +02:00
parent 4528100248
commit 7055a6467c
5 changed files with 220 additions and 204 deletions

@ -14,4 +14,4 @@ defaults:
title: "Preserves"
version_date: "June 2022"
version: "0.6.3"
version: "0.7.0"

@ -6,9 +6,11 @@ title: "Preserves: Binary Syntax"
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
{{ site.version_date }}. Version {{ site.version }}.
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
[LEB128]: https://en.wikipedia.org/wiki/LEB128
[argdata]: https://github.com/NuxiNL/argdata
[canonical]: canonical-binary.html
[google-varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
[vlq]: https://en.wikipedia.org/wiki/Variable-length_quantity
*Preserves* is a data model, with associated serialization formats. This
document defines one of those formats: a binary syntax for `Value`s from
@ -24,49 +26,52 @@ For a value `v`, we write `«v»` for the `Repr` of v.
### Type and Length representation.
Each `Repr` starts with a tag byte, describing the kind of information
represented. Depending on the tag, a length indicator, further encoded
information, and/or an ending tag may follow.
represented.
tag (simple atomic data and small integers)
tag ++ binarydata (most integers)
tag ++ length ++ binarydata (large integers, strings, symbols, and binary)
tag ++ repr ++ ... ++ endtag (compound data)
However, inspired by [argdata][], a `Repr` does *not* describe its own
length. Instead, the surrounding context must supply the length of the
`Repr`.
The unique end tag is byte value `0x84`.
As a consequence, `Repr`s for `Compound` values store the lengths of
their contained values. Each contained `Value` is represented as a
length in bytes followed by its own `Repr`.
If present after a tag, the length of a following piece of binary data
is formatted as a [base 128 varint][varint].[^see-also-leb128] We
write `varint(m)` for the varint-encoding of `m`. Quoting the
[Google Protocol Buffers][varint] definition,
<a id="varint"></a> Each length is stored as an [argdata][]-compatible
big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
stores seven bits of the length. All bytes have a clear upper bit,
except the final byte, which has the upper bit set. We write
`len(m)` for the varint-encoding of a non-negative integer `m`,
defined recursively as follows:
[^see-also-leb128]: Also known as [LEB128][] encoding, for unsigned
integers. Varints and LEB128-encoded integers differ only for
signed integers, which are not used in Preserves.
    len(m) = e(m, 128)
      where e(v, d) = [v + d]                          if v < 128
                      e(v / 128, 0) ++ [(v % 128) + d] if v ≥ 128
> Each byte in a varint, except the last byte, has the most
> significant bit (msb) set; this indicates that there are further
> bytes to come. The lower 7 bits of each byte are used to store the
> two's complement representation of the number in groups of 7 bits,
> least significant group first.
[^see-also-leb128]: Argdata's length representation is very close to
[Variable-length quantity (VLQ)][VLQ] encoding, differing only in
the flipped interpretation of the high bit of each byte. It is
big-endian, unlike [LEB128][] encoding ([as used by
Google][google-varint] in protobufs).
The following table illustrates varint-encoding.
| Number, `m` | `m` in binary, grouped into 7-bit chunks | `varint(m)` bytes |
| ------ | ------------------- | ------------ |
| 15 | `0001111` | 15 |
| 300 | `0000010 0101100` | 172 2 |
| 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 128 148 235 220 3 |
| Number, `m` | `m` in binary, grouped into 7-bit chunks | `len(m)` bytes |
|-------------|-------------------------------------------|-----------------|
| 15 | `0001111` | 143 |
| 300 | `0000010 0101100` | 2 172 |
| 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |
It is an error for a varint-encoded `m` in a `Repr` to be anything
other than the unique shortest encoding for that `m`. That is, a
varint-encoding of `m` *MUST NOT* end in `0` unless `m`=0.
It is an error for a varint-encoded `m` in a `Repr` to be anything other
than the unique shortest encoding for that `m`. That is, a
varint-encoding of `m` *MUST NOT* start with `0`.
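The recursive definition of `len(m)` can be sketched iteratively in Python. This is a minimal illustration rather than a reference implementation; the names `encode_len` and `decode_len` are hypothetical and do not appear in the specification.

```python
def encode_len(m: int) -> bytes:
    """Big-endian base-128 varint; only the final byte has its high bit set."""
    if m < 0:
        raise ValueError("lengths are non-negative")
    out = [(m % 128) + 128]      # final byte: low seven bits, high bit set
    m //= 128
    while m > 0:
        out.append(m % 128)      # earlier bytes: high bit clear
        m //= 128
    return bytes(reversed(out))  # most significant group first


def decode_len(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position), rejecting non-shortest encodings."""
    if data[pos] == 0:
        raise ValueError("varint must not start with 0")
    value = 0
    while True:
        b = data[pos]
        pos += 1
        value = value * 128 + (b & 0x7F)
        if b & 0x80:             # high bit set marks the final byte
            return value, pos


print(list(encode_len(300)))     # [2, 172], matching the table above
```

The decoder returns the position just past the varint so that a caller can slice out the `Repr` that follows.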
### Records, Sequences, Sets and Dictionaries.
«<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
«[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
«<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
«[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
«#{E_1...E_m}» = [0xA9] ++ seq(«E_1», ..., «E_m»)
«{K_1:V_1...K_m:V_m}» = [0xAA] ++ seq(«K_1», «V_1», ..., «K_m», «V_m»)
where seq(R_1, ... R_m) = len(R_1) ++ R_1 ++...++ len(R_m) ++ R_m
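As an illustration of the new compound layout, the following Python sketch builds `seq(...)` by prefixing each contained `Repr` with its length. The helper names (`encode_len`, `seq`, `record`, `sequence`, `set_of`, `dictionary`) are assumptions made for this example only, and the arguments are already-encoded `Repr`s.

```python
def encode_len(m: int) -> bytes:
    # Big-endian base-128 varint; the final byte carries the high bit.
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def seq(*reprs: bytes) -> bytes:
    # Each contained Repr is preceded by its own length in bytes.
    return b"".join(encode_len(len(r)) + r for r in reprs)


def record(label: bytes, *fields: bytes) -> bytes:
    return b"\xA7" + seq(label, *fields)


def sequence(*items: bytes) -> bytes:
    return b"\xA8" + seq(*items)


def set_of(*elements: bytes) -> bytes:
    return b"\xA9" + seq(*elements)


def dictionary(pairs: list[tuple[bytes, bytes]]) -> bytes:
    # Keys and values alternate: K_1, V_1, ..., K_m, V_m.
    return b"\xAA" + seq(*(r for pair in pairs for r in pair))


one_to_four = sequence(b"\xA3\x01", b"\xA3\x02", b"\xA3\x03", b"\xA3\x04")
print(one_to_four.hex(" "))
```

Because no end tag exists, an empty sequence is just the single byte `A8`, and a decoder knows where a compound ends only from the length supplied by its context.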
There is *no* ordering requirement on the `E_i` elements or
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
@ -89,7 +94,7 @@ serializing in some other implementation-defined order.
ordering for writing out set elements and dictionary key/value
pairs is *not* the same as the sort ordering implied by the
semantic ordering of those elements or keys. For example, the
`Repr` of a negative number very far from zero will start with
`Repr` of a negative number very far from zero will start with a
byte that is *greater* than the byte which starts the `Repr` of
zero, making it sort lexicographically later by `Repr`, despite
being semantically *less than* zero.
@ -101,39 +106,31 @@ serializing in some other implementation-defined order.
### SignedIntegers.
«x» when x ∈ SignedInteger = [0xB0] ++ varint(m) ++ intbytes(x) if ¬(-3≤x≤12) ∧ m>16
([0xA0] + m - 1) ++ intbytes(x) if ¬(-3≤x≤12) ∧ m≤16
([0xA0] + x) if (-3≤x≤-1)
([0x90] + x) if ( 0≤x≤12)
where m = |intbytes(x)|
«x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
Integers in the range [-3,12] are compactly represented with tags
between `0x90` and `0x9F` because they are so frequently used.
Integers up to 16 bytes long are represented with a single-byte tag
encoding the length of the integer. Larger integers are represented
with an explicit varint length. Every `SignedInteger` *MUST* be
represented with its shortest possible encoding.
The function `intbytes(x)` gives the big-endian two's-complement binary
representation of `x`, taking exactly as many whole bytes as needed to
unambiguously identify the value and its sign. As a special case,
`intbytes(0)` is the empty byte sequence. The most-significant bit in
the first byte in `intbytes(x)` (for `x`≠0) is the sign
bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
its shortest possible encoding.
The function `intbytes(x)` gives the big-endian two's-complement
binary representation of `x`, taking exactly as many whole bytes as
needed to unambiguously identify the value and its sign, and `m =
|intbytes(x)|`. The most-significant bit in the first byte in
`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes] For
example,
For example,
«87112285931760246646623899502532662132736»
= B0 12 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00
= A3 01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00
«-257» = A1 FE FF «-3» = 9D «128» = A1 00 80
«-256» = A1 FF 00 «-2» = 9E «255» = A1 00 FF
«-255» = A1 FF 01 «-1» = 9F «256» = A1 01 00
«-254» = A1 FF 02 «0» = 90 «32767» = A1 7F FF
«-129» = A1 FF 7F «1» = 91 «32768» = A2 00 80 00
«-128» = A0 80 «12» = 9C «65535» = A2 00 FF FF
«-127» = A0 81 «13» = A0 0D «65536» = A2 01 00 00
«-4» = A0 FC «127» = A0 7F «131072» = A2 02 00 00
«-257» = A3 FE FF «-3» = A3 FD «128» = A3 00 80
«-256» = A3 FF 00 «-2» = A3 FE «255» = A3 00 FF
«-255» = A3 FF 01 «-1» = A3 FF «256» = A3 01 00
«-254» = A3 FF 02 «0» = A3 «32767» = A3 7F FF
«-129» = A3 FF 7F «1» = A3 01 «32768» = A3 00 80 00
«-128» = A3 80 «12» = A3 0C «65535» = A3 00 FF FF
«-127» = A3 81 «13» = A3 0D «65536» = A3 01 00 00
«-4» = A3 FC «127» = A3 7F «131072» = A3 02 00 00
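The `intbytes` function and the single-tag integer encoding can be sketched in Python as follows. This is a hedged illustration; `intbytes`, `encode_int` and `decode_int` are names chosen for the example, and the byte-count computation is one possible way to obtain the shortest two's-complement form.

```python
def intbytes(x: int) -> bytes:
    """Shortest big-endian two's-complement form; empty for zero."""
    if x == 0:
        return b""
    # Magnitude bits needed, excluding the sign bit. For negative x the
    # bits of ~x (that is, -x - 1) determine the width.
    magnitude_bits = (x if x >= 0 else ~x).bit_length()
    n = (magnitude_bits + 8) // 8    # one extra bit for the sign, rounded up
    return x.to_bytes(n, "big", signed=True)


def encode_int(x: int) -> bytes:
    return b"\xA3" + intbytes(x)


def decode_int(r: bytes) -> int:
    if r[:1] != b"\xA3":
        raise ValueError("not a SignedInteger Repr")
    # An empty body decodes to 0, matching intbytes(0).
    return int.from_bytes(r[1:], "big", signed=True)


for x in (-257, -128, 0, 127, 128, 32768):
    print(x, encode_int(x).hex(" ").upper())
```

A strict decoder would additionally reject bodies that are longer than `intbytes` of the decoded value, since every `SignedInteger` must use its shortest encoding; that check is omitted here.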
[^zero-intbytes]: The value 0 needs zero bytes to identify the
value, so `intbytes(0)` is the empty byte string. Non-zero values
@ -146,19 +143,19 @@ and `Symbol`, the data following the tag is a UTF-8 encoding of the
`Value`'s code points, while for `ByteString` it is the raw data
contained within the `Value` unmodified.
«S» = [0xB1] ++ varint(|utf8(S)|) ++ utf8(S) if S ∈ String
[0xB2] ++ varint(|S|) ++ S if S ∈ ByteString
[0xB3] ++ varint(|utf8(S)|) ++ utf8(S) if S ∈ Symbol
«S» = [0xA4] ++ utf8(S) if S ∈ String
[0xA5] ++ S if S ∈ ByteString
[0xA6] ++ utf8(S) if S ∈ Symbol
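Since the length now comes from the surrounding context, the three text-like encodings reduce to a tag byte followed by the payload. A small Python sketch, with hypothetical function names chosen for this example:

```python
def encode_string(s: str) -> bytes:
    return b"\xA4" + s.encode("utf-8")


def encode_bytestring(b: bytes) -> bytes:
    return b"\xA5" + b


def encode_symbol(name: str) -> bytes:
    return b"\xA6" + name.encode("utf-8")


def decode_textlike(r: bytes):
    """Return (kind, payload); the caller supplies exactly one whole Repr."""
    tag, body = r[0], r[1:]
    if tag == 0xA4:
        return "string", body.decode("utf-8")
    if tag == 0xA5:
        return "bytestring", body
    if tag == 0xA6:
        return "symbol", body.decode("utf-8")
    raise ValueError(f"unexpected tag {tag:#04x}")


print(encode_string("hello"))
```

Everything after the tag byte belongs to the payload, so an empty `String` is the single byte `A4`.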
### Booleans.
«#f» = [0x80]
«#t» = [0x81]
«#f» = [0xA0]
«#t» = [0xA1]
### Floats and Doubles.
«F» when F ∈ Float = [0x82] ++ binary32(F)
«D» when D ∈ Double = [0x83] ++ binary64(D)
«F» when F ∈ Float = [0xA2] ++ binary32(F)
«D» when D ∈ Double = [0xA2] ++ binary64(D)
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
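With a shared `0xA2` tag, a reader distinguishes `Float` from `Double` by the number of bytes that follow the tag. The sketch below uses Python's standard `struct` module; the function names are illustrative assumptions.

```python
import struct


def encode_float(f: float) -> bytes:
    return b"\xA2" + struct.pack(">f", f)   # big-endian IEEE 754 binary32


def encode_double(d: float) -> bytes:
    return b"\xA2" + struct.pack(">d", d)   # big-endian IEEE 754 binary64


def decode_floating(r: bytes):
    """Return ("float" | "double", value) based on the body length."""
    if r[:1] != b"\xA2":
        raise ValueError("not a Float or Double Repr")
    body = r[1:]
    if len(body) == 4:
        return "float", struct.unpack(">f", body)[0]
    if len(body) == 8:
        return "double", struct.unpack(">d", body)[0]
    raise ValueError("body must be 4 or 8 bytes long")


print(encode_float(1.0).hex(" ").upper())
print(encode_double(1.0).hex(" ").upper())
```

Python has no separate single-precision type, so the sketch returns the kind alongside the value to keep the distinction visible.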
@ -166,20 +163,25 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
### Embeddeds.
The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
represent the denoted object, prefixed with `[0x86]`.
represent the denoted object, prefixed with `[0xBF]`.
«#!V» = [0x86] ++ «V»
«#!V» = [0xBF] ++ «V»
### Annotations.
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
`[0x85] ++ «v»`. For example, the `Repr` corresponding to textual
syntax `@a@b[]`, i.e. an empty sequence annotated with two symbols,
`a` and `b`, is
To annotate a `Repr` `r` with some sequence of `Value`s `[v_1, ...,
v_m]`, surround `r` as follows:
[0xBE] ++ len(r) ++ r ++ len(v_1) ++ v_1 ++...++ len(v_m) ++ v_m
Here each `v_i` stands for its `Repr`, `«v_i»`, as in the example below.
The `Repr` `r` *MUST NOT* already have annotations; that is, it must not begin with `0xBE`.
For example, the `Repr` corresponding to textual syntax `@a@b[]`, i.e.
an empty sequence annotated with two symbols, `a` and `b`, is
«@a @b []»
= [0x85] ++ «a» ++ [0x85] ++ «b» ++ «[]»
= [0x85, 0xB3, 0x01, 0x61, 0x85, 0xB3, 0x01, 0x62, 0xB5, 0x84]
= [0xBE] ++ len(«[]») ++ «[]» ++ len(«a») ++ «a» ++ len(«b») ++ «b»
= [0xBE, 0x81, 0xA8, 0x82, 0xA6, 0x61, 0x82, 0xA6, 0x62]
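The annotation wrapper is the same length-prefixed layout used for compounds, with the annotated `Repr` placed first. A Python sketch under that reading, using the hypothetical names `encode_len` and `annotate`:

```python
def encode_len(m: int) -> bytes:
    # Big-endian base-128 varint; the final byte carries the high bit.
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def annotate(r: bytes, *annotation_reprs: bytes) -> bytes:
    """Wrap Repr r with the Reprs of its annotations, in order."""
    if r[:1] == b"\xBE":
        raise ValueError("Repr is already annotated")
    parts = (r,) + annotation_reprs
    return b"\xBE" + b"".join(encode_len(len(p)) + p for p in parts)


# @a @b [] : an empty sequence annotated with symbols a and b
wrapped = annotate(b"\xA8", b"\xA6a", b"\xA6b")
print(wrapped.hex(" ").upper())
```

To add annotations to a `Repr` that already has some, an implementation would presumably unwrap it and rebuild the whole annotation list rather than nest a second `0xBE` wrapper.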
## Security Considerations
@ -194,45 +196,67 @@ implementations *SHOULD* produce canonical binary encodings by
default; however, an implementation *MAY* permit two serializations of
the same `Value` to yield different binary `Repr`s.
## Acknowledgements
The exclusion of lengths from `Repr`s, placing lengths instead ahead of
contained values in sequences, is inspired by [argdata][].
## Appendix. Autodetection of textual or binary syntax
Every tag byte in a binary Preserves `Document` falls within the range
Every tag byte in a binary Preserves `Repr` falls within the range
[`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
bytes*, and will never occur as the first byte of a UTF-8 encoded code
point. This means no binary-encoded document can be misinterpreted as
point. This means no binary-encoded `Repr` can be misinterpreted as
valid UTF-8.
Conversely, a UTF-8 document must start with a valid codepoint,
Conversely, a UTF-8 `Document` must start with a valid codepoint,
meaning in particular that it must not start with a byte in the range
[`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
Preserves document can be misinterpreted as a binary-syntax document.
Preserves `Document` can be misinterpreted as a binary-syntax `Repr`.
Examination of the top two bits of the first byte of a document gives
its syntax: if the top two bits are `10`, it should be interpreted as
a binary-syntax document; otherwise, it should be interpreted as text.
Examination of the top two bits of the first byte of an encoded `Value`
gives its syntax: if the top two bits are `10`, it should be interpreted
as a binary-syntax `Repr`; otherwise, it should be interpreted as text.
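The top-two-bits test amounts to a single mask and comparison. A minimal Python sketch, where `detect_syntax` is a name invented for this example:

```python
def detect_syntax(first_byte: int) -> str:
    """Classify an encoded Value by the top two bits of its first byte."""
    return "binary" if first_byte & 0xC0 == 0x80 else "text"


print(detect_syntax(0xA8))       # tag byte for a Sequence
print(detect_syntax(ord("[")))   # first byte of a textual sequence
```

Bytes `0x80` through `0xBF` are exactly those whose top two bits are `10`, which is why the mask `0xC0` and the comparison with `0x80` suffice.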
**Streaming.** Autodetection is still possible when streaming an
undetermined number of `Value`s across, say, a TCP/IP connection:
- If the text syntax is to be used for the connection, simply start
writing each `Document` one after the other. Documents for `Atom`s
*MUST* be separated from their neighbours by whitespace; in general,
whitespace *SHOULD* be used to separate adjacent documents.
Specifically, whitespace separating adjacent documents *SHOULD* be
ASCII newline (10).
- If the binary syntax is to be used for the connection, start the
connection with byte `0xA8` (sequence). After the initial byte, send
each value `v` as `len(«v») ++ «v»`. A side effect of this approach
is that the entire stream, when complete, is a valid `Sequence`
`Repr`.
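The binary streaming convention can be sketched with a writer and a reader over file-like objects. This is an illustration under stated assumptions: the names `write_stream` and `read_stream` are hypothetical, and the reader assumes a blocking, buffered stream whose `read(n)` returns fewer than `n` bytes only at end of input.

```python
import io


def encode_len(m: int) -> bytes:
    # Big-endian base-128 varint; the final byte carries the high bit.
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def write_stream(out, reprs) -> None:
    out.write(b"\xA8")                       # opening Sequence tag
    for r in reprs:
        out.write(encode_len(len(r)) + r)    # len(«v») ++ «v»


def read_stream(inp):
    """Yield each Repr from a binary-syntax stream."""
    if inp.read(1) != b"\xA8":
        raise ValueError("binary stream must start with 0xA8")
    while True:
        length, started = 0, False
        while True:
            b = inp.read(1)
            if not b:
                if started:
                    raise ValueError("stream ended inside a length")
                return                       # clean end between values
            started = True
            length = length * 128 + (b[0] & 0x7F)
            if b[0] & 0x80:
                break
        r = inp.read(length)
        if len(r) != length:
            raise ValueError("stream ended inside a Repr")
        yield r


buf = io.BytesIO()
write_stream(buf, [b"\xA3\x01", b"\xA4hi", b"\xA3"])
print(buf.getvalue().hex(" ").upper())
```

The bytes written are also a complete `Sequence` `Repr`, so a recorded stream can be decoded later as one ordinary value.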
## Appendix. Table of tag values
80 - False
81 - True
82 - Float
83 - Double
84 - End marker
85 - Annotation
86 - Embedded
(8x) RESERVED 87-8F
(8x) RESERVED 80-8F
(9x) RESERVED 90-9F
9x - Small integers 0..12,-3..-1
An - Medium integers, (n+1) bytes long
B0 - Large integers, variable length
B1 - String
B2 - ByteString
B3 - Symbol
A0 - False
A1 - True
A2 - Float or Double (length disambiguates)
A3 - SignedIntegers (0 is encoded with no bytes at all)
A4 - String (no trailing NUL is added)
A5 - ByteString
A6 - Symbol
B4 - Record
B5 - Sequence
B6 - Set
B7 - Dictionary
A7 - Record
A8 - Sequence
A9 - Set
AA - Dictionary
(Ax) RESERVED AB-AF
(Bx) RESERVED B0-BD
BE - Annotations. {BE Lval val Lann0 ann0 Lann1 ann1 ...}
BF - Embedded
## Appendix. Binary SignedInteger representation
@ -242,15 +266,15 @@ values.
| Integer range | Bytes required | Encoding (hex) |
| --- | --- | --- |
| -3 ≤ n ≤ 12 | 1 | `9X` |
| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8) | 2 | `A0` `XX` |
| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3 | `A1` `XX` `XX` |
| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4 | `A2` `XX` `XX` `XX` |
| 0 | 1 | `A3` |
| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8) | 2 | `A3` `XX` |
| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3 | `A3` `XX` `XX` |
| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4 | `A3` `XX` `XX` `XX` |
| -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5 | `A3` `XX` `XX` `XX` `XX` |
| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6 | `A4` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7 | `A5` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8 | `A6` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9 | `A7` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6 | `A3` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
<!-- Heading to visually offset the footnotes from the main document: -->
## Notes

@ -206,8 +206,8 @@ object, prefixed with `#!`.
Embedded = "#!" Value
Finally, any `Value` may be represented by escaping from the textual
syntax to the [machine-oriented binary syntax](preserves-binary.html)
by prefixing a `ByteString` containing the binary representation of the
syntax to the [machine-oriented binary syntax](preserves-binary.html) by
prefixing a `ByteString` containing the binary representation of the
`Value` with `#=`.[^rationale-switch-to-binary]
[^no-literal-binary-in-text] [^machine-value-annotations]
@ -216,18 +216,18 @@ by prefixing a `ByteString` containing the binary representation of the
[^rationale-switch-to-binary]: **Rationale.** The textual syntax
cannot express every `Value`: specifically, it cannot express the
several million floating-point NaNs, or the two floating-point
Infinities. Since the machine-oriented binary format for `Value`s
expresses each `Value` with precision, embedding binary `Value`s
solves the problem.
Infinities. Since the machine-oriented binary format for `Value`s expresses
each `Value` with precision, embedding binary `Value`s solves the
problem.
[^no-literal-binary-in-text]: Every text is ultimately physically
stored as bytes; therefore, it might seem possible to escape to the
raw form of binary encoding from within a piece of textual syntax.
However, while bytes must be involved in any *representation* of
text, the text *itself* is logically a sequence of *code points* and
is not *intrinsically* a binary structure at all. It would be
incoherent to expect to be able to access the representation of the
text from within the text itself.
stored as bytes; therefore, it might seem possible to escape to
the raw binary encoding from within a
piece of textual syntax. However, while bytes must be involved in
any *representation* of text, the text *itself* is logically a
sequence of *code points* and is not *intrinsically* a binary
structure at all. It would be incoherent to expect to be able to
access the representation of the text from within the text itself.
[^machine-value-annotations]: Any text-syntax annotations preceding
the `#` are prepended to any binary-syntax annotations yielded by
@ -235,11 +235,11 @@ by prefixing a `ByteString` containing the binary representation of the
## Annotations
When written down, a `Value` may have an associated sequence of
*annotations* carrying “out-of-band” contextual metadata about the
value. Each annotation is, in turn, a `Value`, and may itself have
annotations. The ordering of annotations attached to a `Value` is
significant.
When written down, a `Value` may have an associated
sequence of *annotations* carrying “out-of-band” contextual metadata
about the value. Each annotation is, in turn, a `Value`, and may
itself have annotations. The ordering of annotations attached to a
`Value` is significant.
Value =/ ws "@" Value Value
@ -276,7 +276,7 @@ different.
## Security Considerations
**Whitespace.** The textual format allows arbitrary whitespace in many
**Whitespace.** The text syntax allows arbitrary whitespace in many
positions. Consider optional restrictions on the amount of consecutive
whitespace that may appear.

@ -220,21 +220,21 @@ The total ordering specified [above](#total-order) means that the following stat
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
<!-- translated from various JSON blobs floating around the internet. -->
| Value | Encoded byte sequence |
|-----------------------------|---------------------------------------------------------------------------------|
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
| `[1 2 3 4]` | B5 91 92 93 94 84 |
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
| `-257` | A1 FE FF |
| `-1` | 9F |
| `0` | 90 |
| `1` | 91 |
| `255` | A1 00 FF |
| `1.0f` | 82 3F 80 00 00 |
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
| Value | Encoded byte sequence |
|-----------------------------|------------------------------------------------------------------------------|
| `<capture <discard>>` | A7 88 A6 'c' 'a' 'p' 't' 'u' 'r' 'e' 8A A7 88 A6 'd' 'i' 's' 'c' 'a' 'r' 'd' |
| `[1 2 3 4]` | A8 82 A3 01 82 A3 02 82 A3 03 82 A3 04 |
| `[-2 -1 0 1]` | A8 82 A3 FE 82 A3 FF 81 A3 82 A3 01 |
| `"hello"` | A4 'h' 'e' 'l' 'l' 'o' |
| `["a" b #"c" [] #{} #t #f]` | A8 82 A4 'a' 82 A6 'b' 82 A5 'c' 81 A8 81 A9 81 A1 81 A0 |
| `-257` | A3 FE FF |
| `-1` | A3 FF |
| `0` | A3 |
| `1` | A3 01 |
| `255` | A3 00 FF |
| `1.0f` | A2 3F 80 00 00 |
| `1.0` | A2 3F F0 00 00 00 00 00 00 |
| `-1.202e300` | A2 FE 3C B7 B7 59 BF 04 26 |
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
@ -242,24 +242,21 @@ The next example uses a non-`Symbol` label for a record.[^extensibility2] The `R
encodes to
B4 ;; Record
B5 ;; Sequence
B3 06 74 69 74 6C 65 64 ;; Symbol, "titled"
B3 06 70 65 72 73 6F 6E ;; Symbol, "person"
92 ;; SignedInteger, "2"
B3 05 74 68 69 6E 67 ;; Symbol, "thing"
91 ;; SignedInteger, "1"
84 ;; End (sequence)
A0 65 ;; SignedInteger, "101"
B1 09 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
B4 ;; Record
B3 04 64 61 74 65 ;; Symbol, "date"
A1 07 1D ;; SignedInteger, "1821"
92 ;; SignedInteger, "2"
93 ;; SignedInteger, "3"
84 ;; End (record)
B1 02 44 72 ;; String, "Dr"
84 ;; End (record)
A7 ;; Record
9E A8 ;; Length 30, Sequence
87 A6 74 69 74 6C 65 64 ;; Length 7, Symbol, "titled"
87 A6 70 65 72 73 6F 6E ;; Length 7, Symbol, "person"
82 A3 02 ;; Length 2, SignedInteger, "2"
86 A6 74 68 69 6E 67 ;; Length 6, Symbol, "thing"
82 A3 01 ;; Length 2, SignedInteger, "1"
82 A3 65 ;; Length 2, SignedInteger, "101"
8A A4 42 6C 61 63 6B 77 65 6C 6C ;; Length 10, String, "Blackwell"
91 A7 ;; Length 17, Record
85 A6 64 61 74 65 ;; Length 5, Symbol, "date"
83 A3 07 1D ;; Length 3, SignedInteger, "1821"
82 A3 02 ;; Length 2, SignedInteger, "2"
82 A3 03 ;; Length 2, SignedInteger, "3"
83 A4 44 72 ;; Length 3, String, "Dr"
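To make the length-prefixed nesting in the listing above concrete, here is a small Python decoder sketch covering only the tags that appear in it. The output shapes (tuples for symbols and records) and the names `split_seq` and `decode` are assumptions made for this example, and the sketch performs no validation of malformed input.

```python
def split_seq(body: bytes) -> list[bytes]:
    """Split len ++ Repr ++ len ++ Repr ... into the individual Reprs."""
    items, pos = [], 0
    while pos < len(body):
        n = 0
        while True:
            b = body[pos]
            pos += 1
            n = n * 128 + (b & 0x7F)
            if b & 0x80:             # final varint byte
                break
        items.append(body[pos:pos + n])
        pos += n
    return items


def decode(r: bytes):
    tag, body = r[0], r[1:]
    if tag == 0xA0:
        return False
    if tag == 0xA1:
        return True
    if tag == 0xA3:
        return int.from_bytes(body, "big", signed=True)
    if tag == 0xA4:
        return body.decode("utf-8")
    if tag == 0xA6:
        return ("symbol", body.decode("utf-8"))
    if tag == 0xA7:
        label, *fields = [decode(x) for x in split_seq(body)]
        return ("record", label, fields)
    if tag == 0xA8:
        return [decode(x) for x in split_seq(body)]
    raise ValueError(f"tag {tag:#04x} not handled in this sketch")


# The inner <date 1821 2 3> record from the listing above.
date = bytes.fromhex("A7 85 A6 64 61 74 65 83 A3 07 1D 82 A3 02 82 A3 03")
print(decode(date))
```

Note that `decode` receives exactly one whole `Repr`: the slice boundaries computed by `split_seq` are the only source of length information.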
[^extensibility2]: It happens to line up with Racket's
representation of a record label for an inheritance hierarchy
@ -311,27 +308,23 @@ The first RFC 8259 example:
when read using the Preserves text syntax encodes via the binary syntax
as follows:
B7
B1 05 "Image"
B7
B1 03 "IDs" B5
A0 74
A1 03 AF
A1 00 EA
A2 00 97 89
84
B1 05 "Title" B1 14 "View from 15th Floor"
B1 05 "Width" A1 03 20
B1 06 "Height" A1 02 58
B1 08 "Animated" B3 05 "false"
B1 09 "Thumbnail"
B7
B1 03 "Url" B1 26 "http://www.example.com/image/481989943"
B1 05 "Width" A0 64
B1 06 "Height" A0 7D
84
84
84
AA
86 A4 "Image"
01 AC AA
89 A4 "Animated" 86 A6 "false"
87 A4 "Height" 83 A3 02 58
84 A4 "IDs" 91 A8
82 A3 74
83 A3 03 AF
83 A3 00 EA
84 A3 00 97 89
8A A4 "Thumbnail"
C3 AA
87 A4 "Height" 82 A3 7D
84 A4 "Url" A7 A4 "http://www.example.com/image/481989943"
86 A4 "Width" 82 A3 64
86 A4 "Title" 95 A4 "View from 15th Floor"
86 A4 "Width" 83 A3 03 20
The second RFC 8259 example:
@ -360,28 +353,25 @@ The second RFC 8259 example:
encodes to binary as follows:
B5
B7
B1 03 "Zip" B1 05 "94107"
B1 04 "City" B1 0D "SAN FRANCISCO"
B1 05 "State" B1 02 "CA"
B1 07 "Address" B1 00
B1 07 "Country" B1 02 "US"
B1 08 "Latitude" 83 40 42 E2 26 80 9D 49 52
B1 09 "Longitude" 83 C0 5E 99 56 6C F4 1F 21
B1 09 "precision" B1 03 "zip"
84
B7
B1 03 "Zip" B1 05 "94085"
B1 04 "City" B1 09 "SUNNYVALE"
B1 05 "State" B1 02 "CA"
B1 07 "Address" B1 00
B1 07 "Country" B1 02 "US"
B1 08 "Latitude" 83 40 42 AF 9D 66 AD B4 03
B1 09 "Longitude" 83 C0 5E 81 AA 4F CA 42 AF
B1 09 "precision" B1 03 "zip"
84
84
A8
FE AA
88 A4 "Address" 81 A4
85 A4 "City" 8E A4 "SAN FRANCISCO"
88 A4 "Country" 83 A4 "US"
89 A4 "Latitude" 89 A2 40 42 E2 26 80 9D 49 52
8A A4 "Longitude" 89 A2 C0 5E 99 56 6C F4 1F 21
86 A4 "State" 83 A4 "CA"
84 A4 "Zip" 86 A4 "94107"
8A A4 "precision" 84 A4 "zip"
FA AA
88 A4 "Address" 81 A4
85 A4 "City" 8A A4 "SUNNYVALE"
88 A4 "Country" 83 A4 "US"
89 A4 "Latitude" 89 A2 40 42 AF 9D 66 AD B4 03
8A A4 "Longitude" 89 A2 C0 5E 81 AA 4F CA 42 AF
86 A4 "State" 83 A4 "CA"
84 A4 "Zip" 86 A4 "94085"
8A A4 "precision" 84 A4 "zip"
<!-- Heading to visually offset the footnotes from the main document: -->
## Notes

@ -2,6 +2,8 @@
title: "Representing Values in Programming Languages"
---
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
**NOT YET READY**
We have given a definition of `Value` and its semantics, and proposed