New "blue jelly" machine-oriented binary syntax, inspired by argdata
parent
4528100248
commit
7055a6467c
|
@ -14,4 +14,4 @@ defaults:
|
||||||
|
|
||||||
title: "Preserves"
|
title: "Preserves"
|
||||||
version_date: "June 2022"
|
version_date: "June 2022"
|
||||||
version: "0.6.3"
|
version: "0.7.0"
|
||||||
|
|
|
@ -6,9 +6,11 @@ title: "Preserves: Binary Syntax"
|
||||||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||||
{{ site.version_date }}. Version {{ site.version }}.
|
{{ site.version_date }}. Version {{ site.version }}.
|
||||||
|
|
||||||
[varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
|
|
||||||
[LEB128]: https://en.wikipedia.org/wiki/LEB128
|
[LEB128]: https://en.wikipedia.org/wiki/LEB128
|
||||||
|
[argdata]: https://github.com/NuxiNL/argdata
|
||||||
[canonical]: canonical-binary.html
|
[canonical]: canonical-binary.html
|
||||||
|
[google-varint]: https://developers.google.com/protocol-buffers/docs/encoding#varints
|
||||||
|
[vlq]: https://en.wikipedia.org/wiki/Variable-length_quantity
|
||||||
|
|
||||||
*Preserves* is a data model, with associated serialization formats. This
|
*Preserves* is a data model, with associated serialization formats. This
|
||||||
document defines one of those formats: a binary syntax for `Value`s from
|
document defines one of those formats: a binary syntax for `Value`s from
|
||||||
|
@ -24,49 +26,52 @@ For a value `v`, we write `«v»` for the `Repr` of v.
|
||||||
### Type and Length representation.
|
### Type and Length representation.
|
||||||
|
|
||||||
Each `Repr` starts with a tag byte, describing the kind of information
|
Each `Repr` starts with a tag byte, describing the kind of information
|
||||||
represented. Depending on the tag, a length indicator, further encoded
|
represented.
|
||||||
information, and/or an ending tag may follow.
|
|
||||||
|
|
||||||
tag (simple atomic data and small integers)
|
However, inspired by [argdata][], a `Repr` does *not* describe its own
|
||||||
tag ++ binarydata (most integers)
|
length. Instead, the surrounding context must supply the length of the
|
||||||
tag ++ length ++ binarydata (large integers, strings, symbols, and binary)
|
`Repr`.
|
||||||
tag ++ repr ++ ... ++ endtag (compound data)
|
|
||||||
|
|
||||||
The unique end tag is byte value `0x84`.
|
As a consequence, `Repr`s for `Compound` values store the lengths of
|
||||||
|
their contained values. Each contained `Value` is represented as a
|
||||||
|
length in bytes followed by its own `Repr`.
|
||||||
|
|
||||||
If present after a tag, the length of a following piece of binary data
|
<a id="varint"></a> Each length is stored as an [argdata][]-compatible
|
||||||
is formatted as a [base 128 varint][varint].[^see-also-leb128] We
|
big-endian base 128 *varint*.[^see-also-leb128] Each byte of a varint
|
||||||
write `varint(m)` for the varint-encoding of `m`. Quoting the
|
stores seven bits of the length. All bytes have a clear upper bit,
|
||||||
[Google Protocol Buffers][varint] definition,
|
except the final byte, which has the upper bit set. We write
|
||||||
|
`len(m)` for the varint-encoding of a non-negative integer `m`,
|
||||||
|
defined recursively as follows:
|
||||||
|
|
||||||
[^see-also-leb128]: Also known as [LEB128][] encoding, for unsigned
|
len(m) = e(m, 128)
|
||||||
integers. Varints and LEB128-encoded integers differ only for
|
where e(v, d) = [v + d] if v < 128
|
||||||
signed integers, which are not used in Preserves.
|
e(v / 128, 0) ++ [(v % 128) + d] if v ≥ 128
|
||||||
|
|
||||||
> Each byte in a varint, except the last byte, has the most
|
[^see-also-leb128]: Argdata's length representation is very close to
|
||||||
> significant bit (msb) set – this indicates that there are further
|
[Variable-length quantity (VLQ)][VLQ] encoding, differing only in
|
||||||
> bytes to come. The lower 7 bits of each byte are used to store the
|
the flipped interpretation of the high bit of each byte. It is
|
||||||
> two's complement representation of the number in groups of 7 bits,
|
big-endian, unlike [LEB128][] encoding ([as used by
|
||||||
> least significant group first.
|
Google][google-varint] in protobufs).
|
||||||
|
|
||||||
The following table illustrates varint-encoding.
|
The following table illustrates varint-encoding.
|
||||||
|
|
||||||
| Number, `m` | `m` in binary, grouped into 7-bit chunks | `varint(m)` bytes |
|
| Number, `m` | `m` in binary, grouped into 7-bit chunks | `len(m)` bytes |
|
||||||
| ------ | ------------------- | ------------ |
|
|-------------|-------------------------------------------|-----------------|
|
||||||
| 15 | `0001111` | 15 |
|
| 15 | `0001111` | 143 |
|
||||||
| 300 | `0000010 0101100` | 172 2 |
|
| 300 | `0000010 0101100` | 2 172 |
|
||||||
| 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 128 148 235 220 3 |
|
| 1000000000 | `0000011 1011100 1101011 0010100 0000000` | 3 92 107 20 128 |
|
||||||
|
|
||||||
It is an error for a varint-encoded `m` in a `Repr` to be anything
|
It is an error for a varint-encoded `m` in a `Repr` to be anything other
|
||||||
other than the unique shortest encoding for that `m`. That is, a
|
than the unique shortest encoding for that `m`. That is, a
|
||||||
varint-encoding of `m` *MUST NOT* end in `0` unless `m`=0.
|
varint-encoding of `m` *MUST NOT* start with `0`.
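The recursive `len(m)` definition can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function names `encode_len` and `decode_len` are hypothetical and do not come from the specification.

```python
def encode_len(m):
    """Encode a non-negative integer as a big-endian base-128 varint.

    Only the final byte has its upper bit set, following the len(m) rule.
    """
    if m < 0:
        raise ValueError("lengths are non-negative")
    out = [(m % 128) + 128]   # final byte: low seven bits plus the end marker
    m //= 128
    while m > 0:
        out.append(m % 128)   # earlier bytes keep the upper bit clear
        m //= 128
    return bytes(reversed(out))


def decode_len(data, offset=0):
    """Return (value, next_offset), rejecting non-shortest encodings."""
    if offset >= len(data):
        raise ValueError("truncated varint")
    if data[offset] == 0:
        raise ValueError("varint must not start with a zero byte")
    value = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value = value * 128 + (byte & 0x7F)
        if byte & 0x80:       # upper bit set: this was the final byte
            return value, offset
```

With these definitions, `encode_len(300)` yields the bytes `2 172`, matching the table, and `decode_len` refuses an encoding such as `0 143` because it is not the shortest form.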
|
||||||
|
|
||||||
### Records, Sequences, Sets and Dictionaries.
|
### Records, Sequences, Sets and Dictionaries.
|
||||||
|
|
||||||
«<L F_1...F_m>» = [0xB4] ++ «L» ++ «F_1» ++...++ «F_m» ++ [0x84]
|
«<L F_1...F_m>» = [0xA7] ++ seq(«L», «F_1», ..., «F_m»)
|
||||||
«[X_1...X_m]» = [0xB5] ++ «X_1» ++...++ «X_m» ++ [0x84]
|
«[X_1...X_m]» = [0xA8] ++ seq(«X_1», ..., «X_m»)
|
||||||
«#{E_1...E_m}» = [0xB6] ++ «E_1» ++...++ «E_m» ++ [0x84]
|
«#{E_1...E_m}» = [0xA9] ++ seq(«E_1», ..., «E_m»)
|
||||||
«{K_1:V_1...K_m:V_m}» = [0xB7] ++ «K_1» ++ «V_1» ++...++ «K_m» ++ «V_m» ++ [0x84]
|
«{K_1:V_1...K_m:V_m}» = [0xAA] ++ seq(«K_1», «V_1», ..., «K_m», «V_m»)
|
||||||
|
where seq(R_1, ... R_m) = len(R_1) ++ R_1 ++...++ len(R_m) ++ R_m
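As a rough sketch of the `seq` rule, the Python below prefixes each contained `Repr` with the varint of its length in bytes. The helper names (`encode_len`, `seq`, `record`, `sequence`, `set_`, `dictionary`) are illustrative assumptions; arguments are already-encoded `Repr`s given as `bytes`.

```python
def encode_len(m):
    """Big-endian base-128 varint; only the final byte has its upper bit set."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def seq(*reprs):
    """Concatenate each Repr, preceded by the varint of its byte length."""
    return b"".join(encode_len(len(r)) + r for r in reprs)


def record(label, *fields):
    return b"\xA7" + seq(label, *fields)


def sequence(*items):
    return b"\xA8" + seq(*items)


def set_(*elements):
    return b"\xA9" + seq(*elements)


def dictionary(*pairs):
    """pairs is a series of (key_repr, value_repr) tuples, in any order."""
    flat = [r for pair in pairs for r in pair]
    return b"\xAA" + seq(*flat)
```

For example, `sequence(b"\xA3\x01", b"\xA3\x02")` gives `A8 82 A3 01 82 A3 02`. Note that an empty compound is just its tag byte, since `seq()` of nothing is the empty byte string.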
|
||||||
|
|
||||||
There is *no* ordering requirement on the `E_i` elements or
|
There is *no* ordering requirement on the `E_i` elements or
|
||||||
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
|
`K_i`/`V_i` pairs.[^no-sorting-rationale] They may appear in any
|
||||||
|
@ -89,7 +94,7 @@ serializing in some other implementation-defined order.
|
||||||
ordering for writing out set elements and dictionary key/value
|
ordering for writing out set elements and dictionary key/value
|
||||||
pairs is *not* the same as the sort ordering implied by the
|
pairs is *not* the same as the sort ordering implied by the
|
||||||
semantic ordering of those elements or keys. For example, the
|
semantic ordering of those elements or keys. For example, the
|
||||||
`Repr` of a negative number very far from zero will start with
|
`Repr` of a negative number very far from zero will start with a
|
||||||
byte that is *greater* than the byte which starts the `Repr` of
|
byte that is *greater* than the byte which starts the `Repr` of
|
||||||
zero, making it sort lexicographically later by `Repr`, despite
|
zero, making it sort lexicographically later by `Repr`, despite
|
||||||
being semantically *less than* zero.
|
being semantically *less than* zero.
|
||||||
|
@ -101,39 +106,31 @@ serializing in some other implementation-defined order.
|
||||||
|
|
||||||
### SignedIntegers.
|
### SignedIntegers.
|
||||||
|
|
||||||
«x» when x ∈ SignedInteger = [0xB0] ++ varint(m) ++ intbytes(x) if ¬(-3≤x≤12) ∧ m>16
|
«x» when x ∈ SignedInteger = [0xA3] ++ intbytes(x)
|
||||||
([0xA0] + m - 1) ++ intbytes(x) if ¬(-3≤x≤12) ∧ m≤16
|
|
||||||
([0xA0] + x) if (-3≤x≤-1)
|
|
||||||
([0x90] + x) if ( 0≤x≤12)
|
|
||||||
where m = |intbytes(x)|
|
|
||||||
|
|
||||||
Integers in the range [-3,12] are compactly represented with tags
|
The function `intbytes(x)` gives the big-endian two's-complement binary
|
||||||
between `0x90` and `0x9F` because they are so frequently used.
|
representation of `x`, taking exactly as many whole bytes as needed to
|
||||||
Integers up to 16 bytes long are represented with a single-byte tag
|
unambiguously identify the value and its sign. As a special case,
|
||||||
encoding the length of the integer. Larger integers are represented
|
`intbytes(0)` is the empty byte sequence. The most-significant bit in
|
||||||
with an explicit varint length. Every `SignedInteger` *MUST* be
|
the first byte in `intbytes(x)` (for `x`≠0) is the sign
|
||||||
represented with its shortest possible encoding.
|
bit.[^zero-intbytes] Every `SignedInteger` *MUST* be represented with
|
||||||
|
its shortest possible encoding.
|
||||||
|
|
||||||
The function `intbytes(x)` gives the big-endian two's-complement
|
For example,
|
||||||
binary representation of `x`, taking exactly as many whole bytes as
|
|
||||||
needed to unambiguously identify the value and its sign, and `m =
|
|
||||||
|intbytes(x)|`. The most-significant bit in the first byte in
|
|
||||||
`intbytes(x)` <!-- for `x`≠0 --> is the sign bit.[^zero-intbytes] For
|
|
||||||
example,
|
|
||||||
|
|
||||||
«87112285931760246646623899502532662132736»
|
«87112285931760246646623899502532662132736»
|
||||||
= B0 12 01 00 00 00 00 00 00 00
|
= A3 01 00 00 00 00 00 00 00
|
||||||
00 00 00 00 00 00 00 00
|
00 00 00 00 00 00 00 00
|
||||||
00 00
|
00 00
|
||||||
|
|
||||||
«-257» = A1 FE FF «-3» = 9D «128» = A1 00 80
|
«-257» = A3 FE FF «-3» = A3 FD «128» = A3 00 80
|
||||||
«-256» = A1 FF 00 «-2» = 9E «255» = A1 00 FF
|
«-256» = A3 FF 00 «-2» = A3 FE «255» = A3 00 FF
|
||||||
«-255» = A1 FF 01 «-1» = 9F «256» = A1 01 00
|
«-255» = A3 FF 01 «-1» = A3 FF «256» = A3 01 00
|
||||||
«-254» = A1 FF 02 «0» = 90 «32767» = A1 7F FF
|
«-254» = A3 FF 02 «0» = A3 «32767» = A3 7F FF
|
||||||
«-129» = A1 FF 7F «1» = 91 «32768» = A2 00 80 00
|
«-129» = A3 FF 7F «1» = A3 01 «32768» = A3 00 80 00
|
||||||
«-128» = A0 80 «12» = 9C «65535» = A2 00 FF FF
|
«-128» = A3 80 «12» = A3 0C «65535» = A3 00 FF FF
|
||||||
«-127» = A0 81 «13» = A0 0D «65536» = A2 01 00 00
|
«-127» = A3 81 «13» = A3 0D «65536» = A3 01 00 00
|
||||||
«-4» = A0 FC «127» = A0 7F «131072» = A2 02 00 00
|
«-4» = A3 FC «127» = A3 7F «131072» = A3 02 00 00
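The `intbytes` rule can be approximated in Python as below. This is a sketch under the assumption that Python's arbitrary-precision `int` stands in for `SignedInteger`; the names `intbytes`, `encode_int` and `decode_int` are chosen here for illustration.

```python
def intbytes(x):
    """Shortest big-endian two's-complement form of x; empty for zero."""
    if x == 0:
        return b""
    # Magnitude bits needed, excluding the sign bit. For negative x the
    # magnitude of the complement (~x == -x - 1) is what must fit.
    bits = (x if x > 0 else ~x).bit_length()
    size = bits // 8 + 1          # one extra bit is reserved for the sign
    return x.to_bytes(size, "big", signed=True)


def encode_int(x):
    return b"\xA3" + intbytes(x)


def decode_int(repr_):
    """Decode a SignedInteger Repr, rejecting non-shortest encodings."""
    if repr_[:1] != b"\xA3":
        raise ValueError("not a SignedInteger Repr")
    body = repr_[1:]
    value = int.from_bytes(body, "big", signed=True)
    if intbytes(value) != body:
        raise ValueError("SignedInteger is not in its shortest encoding")
    return value
```

Under these definitions, `encode_int(-257)` is `A3 FE FF` and `encode_int(0)` is the single byte `A3`, as in the examples.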
|
||||||
|
|
||||||
[^zero-intbytes]: The value 0 needs zero bytes to identify the
|
[^zero-intbytes]: The value 0 needs zero bytes to identify the
|
||||||
value, so `intbytes(0)` is the empty byte string. Non-zero values
|
value, so `intbytes(0)` is the empty byte string. Non-zero values
|
||||||
|
@ -146,19 +143,19 @@ and `Symbol`, the data following the tag is a UTF-8 encoding of the
|
||||||
`Value`'s code points, while for `ByteString` it is the raw data
|
`Value`'s code points, while for `ByteString` it is the raw data
|
||||||
contained within the `Value` unmodified.
|
contained within the `Value` unmodified.
|
||||||
|
|
||||||
«S» = [0xB1] ++ varint(|utf8(S)|) ++ utf8(S) if S ∈ String
|
«S» = [0xA4] ++ utf8(S) if S ∈ String
|
||||||
[0xB2] ++ varint(|S|) ++ S if S ∈ ByteString
|
[0xA5] ++ S if S ∈ ByteString
|
||||||
[0xB3] ++ varint(|utf8(S)|) ++ utf8(S) if S ∈ Symbol
|
[0xA6] ++ utf8(S) if S ∈ Symbol
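Because the surrounding context supplies the length, these three encodings reduce to a tag byte followed by the payload. A minimal Python sketch, with hypothetical function names and Python `str`/`bytes` standing in for `String`, `Symbol` and `ByteString`:

```python
def encode_string(s):
    return b"\xA4" + s.encode("utf-8")


def encode_bytestring(b):
    return b"\xA5" + bytes(b)


def encode_symbol(name):
    return b"\xA6" + name.encode("utf-8")


def decode_textual(repr_):
    """Return (kind, payload) for a String, ByteString or Symbol Repr."""
    if not repr_:
        raise ValueError("empty Repr")
    tag, body = repr_[0], repr_[1:]
    if tag == 0xA4:
        return ("String", body.decode("utf-8"))
    if tag == 0xA5:
        return ("ByteString", body)
    if tag == 0xA6:
        return ("Symbol", body.decode("utf-8"))
    raise ValueError("not a String, ByteString or Symbol Repr")
```

The whole of `repr_` after the tag is payload, so the caller must already have cut the `Repr` to its exact length.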
|
||||||
|
|
||||||
### Booleans.
|
### Booleans.
|
||||||
|
|
||||||
«#f» = [0x80]
|
«#f» = [0xA0]
|
||||||
«#t» = [0x81]
|
«#t» = [0xA1]
|
||||||
|
|
||||||
### Floats and Doubles.
|
### Floats and Doubles.
|
||||||
|
|
||||||
«F» when F ∈ Float = [0x82] ++ binary32(F)
|
«F» when F ∈ Float = [0xA2] ++ binary32(F)
|
||||||
«D» when D ∈ Double = [0x83] ++ binary64(D)
|
«D» when D ∈ Double = [0xA2] ++ binary64(D)
|
||||||
|
|
||||||
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
|
The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
|
||||||
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
|
8-byte IEEE 754 binary representations of `F` and `D`, respectively.
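Since both kinds share tag `0xA2`, a decoder has to use the length of the `Repr` to tell them apart. The sketch below uses Python's standard `struct` module; the function names are assumptions made for illustration.

```python
import struct


def encode_float(f):
    """Single precision: tag followed by 4 big-endian IEEE 754 bytes."""
    return b"\xA2" + struct.pack(">f", f)


def encode_double(d):
    """Double precision: tag followed by 8 big-endian IEEE 754 bytes."""
    return b"\xA2" + struct.pack(">d", d)


def decode_floating(repr_):
    """Return ("Float", value) or ("Double", value) based on body length."""
    if repr_[:1] != b"\xA2":
        raise ValueError("not a Float or Double Repr")
    body = repr_[1:]
    if len(body) == 4:
        return ("Float", struct.unpack(">f", body)[0])
    if len(body) == 8:
        return ("Double", struct.unpack(">d", body)[0])
    raise ValueError("Float/Double body must be 4 or 8 bytes long")
```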
|
||||||
|
@ -166,20 +163,25 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
|
||||||
### Embeddeds.
|
### Embeddeds.
|
||||||
|
|
||||||
The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
|
The `Repr` of an `Embedded` is the `Repr` of a `Value` chosen to
|
||||||
represent the denoted object, prefixed with `[0x86]`.
|
represent the denoted object, prefixed with `[0xBF]`.
|
||||||
|
|
||||||
«#!V» = [0x86] ++ «V»
|
«#!V» = [0xBF] ++ «V»
|
||||||
|
|
||||||
### Annotations.
|
### Annotations.
|
||||||
|
|
||||||
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
|
To annotate a `Repr` `r` with some sequence of `Value`s `[v_1, ...,
|
||||||
`[0x85] ++ «v»`. For example, the `Repr` corresponding to textual
|
v_m]`, surround `r` as follows:
|
||||||
syntax `@a@b[]`, i.e. an empty sequence annotated with two symbols,
|
|
||||||
`a` and `b`, is
|
[0xBE] ++ len(r) ++ r ++ len(v_1) ++ v_1 ++...++ len(v_m) ++ v_m
|
||||||
|
|
||||||
|
The `Repr` `r` *MUST NOT* already have annotations; that is, it must not begin with `0xBE`.
|
||||||
|
|
||||||
|
For example, the `Repr` corresponding to textual syntax `@a@b[]`, i.e.
|
||||||
|
an empty sequence annotated with two symbols, `a` and `b`, is
|
||||||
|
|
||||||
«@a @b []»
|
«@a @b []»
|
||||||
= [0x85] ++ «a» ++ [0x85] ++ «b» ++ «[]»
|
= [0xBE] ++ len(«[]») ++ «[]» ++ len(«a») ++ «a» ++ len(«b») ++ «b»
|
||||||
= [0x85, 0xB3, 0x01, 0x61, 0x85, 0xB3, 0x01, 0x62, 0xB5, 0x84]
|
= [0xBE, 0x81, 0xA8, 0x82, 0xA6, 0x61, 0x82, 0xA6, 0x62]
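The annotation layout can be sketched as follows. The names `encode_len` and `annotate` are hypothetical, and returning `r` unchanged when no annotations are supplied is a choice made for this sketch rather than something the text above prescribes.

```python
def encode_len(m):
    """Big-endian base-128 varint; only the final byte has its upper bit set."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


def annotate(r, *annotations):
    """Wrap Repr r with annotation Reprs, each preceded by its length."""
    if r[:1] == b"\xBE":
        raise ValueError("Repr is already annotated")
    if not annotations:
        return r                  # assumption: nothing to attach, leave r as-is
    out = b"\xBE" + encode_len(len(r)) + r
    for v in annotations:
        out += encode_len(len(v)) + v
    return out
```

Calling `annotate(b"\xA8", b"\xA6a", b"\xA6b")` reproduces the nine bytes shown for `@a @b []`.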
|
||||||
|
|
||||||
## Security Considerations
|
## Security Considerations
|
||||||
|
|
||||||
|
@ -194,45 +196,67 @@ implementations *SHOULD* produce canonical binary encodings by
|
||||||
default; however, an implementation *MAY* permit two serializations of
|
default; however, an implementation *MAY* permit two serializations of
|
||||||
the same `Value` to yield different binary `Repr`s.
|
the same `Value` to yield different binary `Repr`s.
|
||||||
|
|
||||||
|
## Acknowledgements
|
||||||
|
|
||||||
|
The exclusion of lengths from `Repr`s, placing lengths instead ahead of
|
||||||
|
contained values in sequences, is inspired by [argdata][].
|
||||||
|
|
||||||
## Appendix. Autodetection of textual or binary syntax
|
## Appendix. Autodetection of textual or binary syntax
|
||||||
|
|
||||||
Every tag byte in a binary Preserves `Document` falls within the range
|
Every tag byte in a binary Preserves `Repr` falls within the range
|
||||||
[`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
|
[`0x80`, `0xBF`]. These bytes, interpreted as UTF-8, are *continuation
|
||||||
bytes*, and will never occur as the first byte of a UTF-8 encoded code
|
bytes*, and will never occur as the first byte of a UTF-8 encoded code
|
||||||
point. This means no binary-encoded document can be misinterpreted as
|
point. This means no binary-encoded `Repr` can be misinterpreted as
|
||||||
valid UTF-8.
|
valid UTF-8.
|
||||||
|
|
||||||
Conversely, a UTF-8 document must start with a valid codepoint,
|
Conversely, a UTF-8 `Document` must start with a valid codepoint,
|
||||||
meaning in particular that it must not start with a byte in the range
|
meaning in particular that it must not start with a byte in the range
|
||||||
[`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
|
[`0x80`, `0xBF`]. This means that no UTF-8 encoded textual-syntax
|
||||||
Preserves document can be misinterpreted as a binary-syntax document.
|
Preserves `Document` can be misinterpreted as a binary-syntax `Repr`.
|
||||||
|
|
||||||
Examination of the top two bits of the first byte of a document gives
|
Examination of the top two bits of the first byte of an encoded `Value`
|
||||||
its syntax: if the top two bits are `10`, it should be interpreted as
|
gives its syntax: if the top two bits are `10`, it should be interpreted
|
||||||
a binary-syntax document; otherwise, it should be interpreted as text.
|
as a binary-syntax `Repr`; otherwise, it should be interpreted as text.
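The top-two-bits check amounts to a single mask and comparison. A small sketch, with the function name `detect_syntax` chosen for illustration:

```python
def detect_syntax(data):
    """Classify an encoded Value as "binary" or "text" from its first byte."""
    if not data:
        raise ValueError("cannot detect the syntax of empty input")
    # 0xC0 selects the top two bits; 0x80 is the bit pattern 10xxxxxx.
    return "binary" if (data[0] & 0xC0) == 0x80 else "text"
```

For instance, input starting with `0xA8` is classified as binary, while input starting with `[` (0x5B) or with a UTF-8 lead byte such as `0xC3` is classified as text.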
|
||||||
|
|
||||||
|
**Streaming.** Autodetection is still possible when streaming an
|
||||||
|
undetermined number of `Value`s across, say, a TCP/IP connection:
|
||||||
|
|
||||||
|
- If the text syntax is to be used for the connection, simply start
|
||||||
|
writing each `Document` one after the other. Documents for `Atom`s
|
||||||
|
*MUST* be separated from their neighbours by whitespace; in general,
|
||||||
|
whitespace *SHOULD* be used to separate adjacent documents.
|
||||||
|
Specifically, whitespace separating adjacent documents *SHOULD* be
|
||||||
|
ASCII newline (10).
|
||||||
|
|
||||||
|
- If the binary syntax is to be used for the connection, start the
|
||||||
|
connection with byte `0xA8` (sequence). After the initial byte, send
|
||||||
|
each value `v` as `len(«v») ++ «v»`. A side effect of this approach
|
||||||
|
is that the entire stream, when complete, is a valid `Sequence`
|
||||||
|
`Repr`.
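The binary streaming convention can be sketched with file-like objects standing in for the connection. The class and function names are hypothetical, and end-of-stream handling (treating a clean EOF between values as the end) is an assumption of this sketch.

```python
import io


def encode_len(m):
    """Big-endian base-128 varint; only the final byte has its upper bit set."""
    out = [(m % 128) + 128]
    m //= 128
    while m > 0:
        out.append(m % 128)
        m //= 128
    return bytes(reversed(out))


class BinaryStreamWriter:
    """Writes 0xA8 once, then each value as its length followed by its Repr."""

    def __init__(self, sink):
        self.sink = sink
        sink.write(b"\xA8")

    def send(self, repr_):
        self.sink.write(encode_len(len(repr_)) + repr_)


def read_stream(source):
    """Yield each Repr from a stream produced by BinaryStreamWriter."""
    if source.read(1) != b"\xA8":
        raise ValueError("binary stream must start with 0xA8")
    while True:
        length, first = 0, True
        while True:
            b = source.read(1)
            if not b:
                if first:
                    return        # clean end of stream between values
                raise ValueError("truncated length")
            first = False
            length = length * 128 + (b[0] & 0x7F)
            if b[0] & 0x80:
                break
        body = source.read(length)
        if len(body) != length:
            raise ValueError("truncated Repr")
        yield body
```

Writing the `Repr`s `A3 01` and `A3` through the writer into an `io.BytesIO` produces `A8 82 A3 01 81 A3`, which is also the `Repr` of the sequence `[1 0]`.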
|
||||||
|
|
||||||
## Appendix. Table of tag values
|
## Appendix. Table of tag values
|
||||||
|
|
||||||
80 - False
|
(8x) RESERVED 80-8F
|
||||||
81 - True
|
(9x) RESERVED 90-9F
|
||||||
82 - Float
|
|
||||||
83 - Double
|
|
||||||
84 - End marker
|
|
||||||
85 - Annotation
|
|
||||||
86 - Embedded
|
|
||||||
(8x) RESERVED 87-8F
|
|
||||||
|
|
||||||
9x - Small integers 0..12,-3..-1
|
A0 - False
|
||||||
An - Medium integers, (n+1) bytes long
|
A1 - True
|
||||||
B0 - Large integers, variable length
|
A2 - Float or Double (length disambiguates)
|
||||||
B1 - String
|
A3 - SignedIntegers (0 is encoded with no bytes at all)
|
||||||
B2 - ByteString
|
A4 - String (no trailing NUL is added)
|
||||||
B3 - Symbol
|
A5 - ByteString
|
||||||
|
A6 - Symbol
|
||||||
|
|
||||||
B4 - Record
|
A7 - Record
|
||||||
B5 - Sequence
|
A8 - Sequence
|
||||||
B6 - Set
|
A9 - Set
|
||||||
B7 - Dictionary
|
AA - Dictionary
|
||||||
|
|
||||||
|
(Ax) RESERVED AB-AF
|
||||||
|
|
||||||
|
(Bx) RESERVED B0-BD
|
||||||
|
BE - Annotations. {BE Lval val Lann0 ann0 Lann1 ann1 ...}
|
||||||
|
BF - Embedded
|
||||||
|
|
||||||
## Appendix. Binary SignedInteger representation
|
## Appendix. Binary SignedInteger representation
|
||||||
|
|
||||||
|
@ -242,15 +266,15 @@ values.
|
||||||
|
|
||||||
| Integer range | Bytes required | Encoding (hex) |
|
| Integer range | Bytes required | Encoding (hex) |
|
||||||
| --- | --- | --- |
|
| --- | --- | --- |
|
||||||
| -3 ≤ n ≤ 12 | 1 | `9X` |
|
| 0 | 1 | `A3` |
|
||||||
| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8) | 2 | `A0` `XX` |
|
| -2<sup>7</sup> ≤ n < 2<sup>7</sup> (i8) | 2 | `A3` `XX` |
|
||||||
| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3 | `A1` `XX` `XX` |
|
| -2<sup>15</sup> ≤ n < 2<sup>15</sup> (i16) | 3 | `A3` `XX` `XX` |
|
||||||
| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4 | `A2` `XX` `XX` `XX` |
|
| -2<sup>23</sup> ≤ n < 2<sup>23</sup> (i24) | 4 | `A3` `XX` `XX` `XX` |
|
||||||
| -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5 | `A3` `XX` `XX` `XX` `XX` |
|
| -2<sup>31</sup> ≤ n < 2<sup>31</sup> (i32) | 5 | `A3` `XX` `XX` `XX` `XX` |
|
||||||
| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6 | `A4` `XX` `XX` `XX` `XX` `XX` |
|
| -2<sup>39</sup> ≤ n < 2<sup>39</sup> (i40) | 6 | `A3` `XX` `XX` `XX` `XX` `XX` |
|
||||||
| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7 | `A5` `XX` `XX` `XX` `XX` `XX` `XX` |
|
| -2<sup>47</sup> ≤ n < 2<sup>47</sup> (i48) | 7 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` |
|
||||||
| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8 | `A6` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
|
| -2<sup>55</sup> ≤ n < 2<sup>55</sup> (i56) | 8 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
|
||||||
| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9 | `A7` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
|
| -2<sup>63</sup> ≤ n < 2<sup>63</sup> (i64) | 9 | `A3` `XX` `XX` `XX` `XX` `XX` `XX` `XX` `XX` |
|
||||||
|
|
||||||
<!-- Heading to visually offset the footnotes from the main document: -->
|
<!-- Heading to visually offset the footnotes from the main document: -->
|
||||||
## Notes
|
## Notes
|
||||||
|
|
|
@ -206,8 +206,8 @@ object, prefixed with `#!`.
|
||||||
Embedded = "#!" Value
|
Embedded = "#!" Value
|
||||||
|
|
||||||
Finally, any `Value` may be represented by escaping from the textual
|
Finally, any `Value` may be represented by escaping from the textual
|
||||||
syntax to the [machine-oriented binary syntax](preserves-binary.html)
|
syntax to the [machine-oriented binary syntax](preserves-binary.html) by
|
||||||
by prefixing a `ByteString` containing the binary representation of the
|
prefixing a `ByteString` containing the binary representation of the
|
||||||
`Value` with `#=`.[^rationale-switch-to-binary]
|
`Value` with `#=`.[^rationale-switch-to-binary]
|
||||||
[^no-literal-binary-in-text] [^machine-value-annotations]
|
[^no-literal-binary-in-text] [^machine-value-annotations]
|
||||||
|
|
||||||
|
@ -216,18 +216,18 @@ by prefixing a `ByteString` containing the binary representation of the
|
||||||
[^rationale-switch-to-binary]: **Rationale.** The textual syntax
|
[^rationale-switch-to-binary]: **Rationale.** The textual syntax
|
||||||
cannot express every `Value`: specifically, it cannot express the
|
cannot express every `Value`: specifically, it cannot express the
|
||||||
several million floating-point NaNs, or the two floating-point
|
several million floating-point NaNs, or the two floating-point
|
||||||
Infinities. Since the machine-oriented binary format for `Value`s
|
Infinities. Since the machine-oriented binary format for `Value`s expresses
|
||||||
expresses each `Value` with precision, embedding binary `Value`s
|
each `Value` with precision, embedding binary `Value`s solves the
|
||||||
solves the problem.
|
problem.
|
||||||
|
|
||||||
[^no-literal-binary-in-text]: Every text is ultimately physically
|
[^no-literal-binary-in-text]: Every text is ultimately physically
|
||||||
stored as bytes; therefore, it might seem possible to escape to the
|
stored as bytes; therefore, it might seem possible to escape to
|
||||||
raw form of binary encoding from within a piece of textual syntax.
|
the raw binary encoding from within a
|
||||||
However, while bytes must be involved in any *representation* of
|
piece of textual syntax. However, while bytes must be involved in
|
||||||
text, the text *itself* is logically a sequence of *code points* and
|
any *representation* of text, the text *itself* is logically a
|
||||||
is not *intrinsically* a binary structure at all. It would be
|
sequence of *code points* and is not *intrinsically* a binary
|
||||||
incoherent to expect to be able to access the representation of the
|
structure at all. It would be incoherent to expect to be able to
|
||||||
text from within the text itself.
|
access the representation of the text from within the text itself.
|
||||||
|
|
||||||
[^machine-value-annotations]: Any text-syntax annotations preceding
|
[^machine-value-annotations]: Any text-syntax annotations preceding
|
||||||
the `#` are prepended to any binary-syntax annotations yielded by
|
the `#` are prepended to any binary-syntax annotations yielded by
|
||||||
|
@ -235,11 +235,11 @@ by prefixing a `ByteString` containing the binary representation of the
|
||||||
|
|
||||||
## Annotations
|
## Annotations
|
||||||
|
|
||||||
When written down, a `Value` may have an associated sequence of
|
When written down, a `Value` may have an associated
|
||||||
*annotations* carrying “out-of-band” contextual metadata about the
|
sequence of *annotations* carrying “out-of-band” contextual metadata
|
||||||
value. Each annotation is, in turn, a `Value`, and may itself have
|
about the value. Each annotation is, in turn, a `Value`, and may
|
||||||
annotations. The ordering of annotations attached to a `Value` is
|
itself have annotations. The ordering of annotations attached to a
|
||||||
significant.
|
`Value` is significant.
|
||||||
|
|
||||||
Value =/ ws "@" Value Value
|
Value =/ ws "@" Value Value
|
||||||
|
|
||||||
|
@ -276,7 +276,7 @@ different.
|
||||||
|
|
||||||
## Security Considerations
|
## Security Considerations
|
||||||
|
|
||||||
**Whitespace.** The textual format allows arbitrary whitespace in many
|
**Whitespace.** The text syntax allows arbitrary whitespace in many
|
||||||
positions. Consider optional restrictions on the amount of consecutive
|
positions. Consider optional restrictions on the amount of consecutive
|
||||||
whitespace that may appear.
|
whitespace that may appear.
|
||||||
|
|
||||||
|
|
142
preserves.md
|
@ -220,21 +220,21 @@ The total ordering specified [above](#total-order) means that the following stat
|
||||||
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
||||||
<!-- translated from various JSON blobs floating around the internet. -->
|
<!-- translated from various JSON blobs floating around the internet. -->
|
||||||
|
|
||||||
| Value | Encoded byte sequence |
|
| Value | Encoded byte sequence |
|
||||||
|-----------------------------|---------------------------------------------------------------------------------|
|
|-----------------------------|------------------------------------------------------------------------------|
|
||||||
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
|
| `<capture <discard>>` | A7 88 A6 'c' 'a' 'p' 't' 'u' 'r' 'e' 8A A7 88 A6 'd' 'i' 's' 'c' 'a' 'r' 'd' |
|
||||||
| `[1 2 3 4]` | B5 91 92 93 94 84 |
|
| `[1 2 3 4]` | A8 82 A3 01 82 A3 02 82 A3 03 82 A3 04 |
|
||||||
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
|
| `[-2 -1 0 1]` | A8 82 A3 FE 82 A3 FF 81 A3 82 A3 01 |
|
||||||
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
|
| `"hello"` | A4 'h' 'e' 'l' 'l' 'o' |
|
||||||
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
|
| `["a" b #"c" [] #{} #t #f]` | A8 82 A4 'a' 82 A6 'b' 82 A5 'c' 81 A8 81 A9 81 A1 81 A0 |
|
||||||
| `-257` | A1 FE FF |
|
| `-257` | A3 FE FF |
|
||||||
| `-1` | 9F |
|
| `-1` | A3 FF |
|
||||||
| `0` | 90 |
|
| `0` | A3 |
|
||||||
| `1` | 91 |
|
| `1` | A3 01 |
|
||||||
| `255` | A1 00 FF |
|
| `255` | A3 00 FF |
|
||||||
| `1.0f` | 82 3F 80 00 00 |
|
| `1.0f` | A2 3F 80 00 00 |
|
||||||
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
|
| `1.0` | A2 3F F0 00 00 00 00 00 00 |
|
||||||
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
|
| `-1.202e300` | A2 FE 3C B7 B7 59 BF 04 26 |
|
||||||
|
|
||||||
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
||||||
|
|
||||||
|
@ -242,24 +242,21 @@ The next example uses a non-`Symbol` label for a record.[^extensibility2] The `R
|
||||||
|
|
||||||
encodes to
|
encodes to
|
||||||
|
|
||||||
B4 ;; Record
|
A7 ;; Record
|
||||||
B5 ;; Sequence
|
9E A8 ;; Length 30, Sequence
|
||||||
B3 06 74 69 74 6C 65 64 ;; Symbol, "titled"
|
87 A6 74 69 74 6C 65 64 ;; Length 7, Symbol, "titled"
|
||||||
B3 06 70 65 72 73 6F 6E ;; Symbol, "person"
|
87 A6 70 65 72 73 6F 6E ;; Length 7, Symbol, "person"
|
||||||
92 ;; SignedInteger, "2"
|
82 A3 02 ;; Length 2, SignedInteger, "2"
|
||||||
B3 05 74 68 69 6E 67 ;; Symbol, "thing"
|
86 A6 74 68 69 6E 67 ;; Length 6, Symbol, "thing"
|
||||||
91 ;; SignedInteger, "1"
|
82 A3 01 ;; Length 2, SignedInteger, "1"
|
||||||
84 ;; End (sequence)
|
82 A3 65 ;; Length 2, SignedInteger, "101"
|
||||||
A0 65 ;; SignedInteger, "101"
|
8A A4 42 6C 61 63 6B 77 65 6C 6C ;; Length 10, String, "Blackwell"
|
||||||
B1 09 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
|
91 A7 ;; Length 17, Record
|
||||||
B4 ;; Record
|
85 A6 64 61 74 65 ;; Length 5, Symbol, "date"
|
||||||
B3 04 64 61 74 65 ;; Symbol, "date"
|
83 A3 07 1D ;; Length 3, SignedInteger, "1821"
|
||||||
A1 07 1D ;; SignedInteger, "1821"
|
82 A3 02 ;; Length 2, SignedInteger, "2"
|
||||||
92 ;; SignedInteger, "2"
|
82 A3 03 ;; Length 2, SignedInteger, "3"
|
||||||
93 ;; SignedInteger, "3"
|
83 A4 44 72 ;; Length 3, String, "Dr"
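To check the length prefixes in an example like this one, a small walker can split a compound body into its contained `Repr`s and print an outline. The names below (`TAG_NAMES`, `split_seq`, `outline`, `EXAMPLE`) are illustrative; the tag names are taken from the table of tag values, and the walker does not descend into annotations or embedded values.

```python
TAG_NAMES = {
    0xA0: "False", 0xA1: "True", 0xA2: "Float or Double",
    0xA3: "SignedInteger", 0xA4: "String", 0xA5: "ByteString",
    0xA6: "Symbol", 0xA7: "Record", 0xA8: "Sequence", 0xA9: "Set",
    0xAA: "Dictionary", 0xBE: "Annotations", 0xBF: "Embedded",
}
COMPOUND = {0xA7, 0xA8, 0xA9, 0xAA}


def split_seq(body):
    """Split len(R_1) ++ R_1 ++ ... into the list [R_1, ...]."""
    chunks, i = [], 0
    while i < len(body):
        length = 0
        while True:
            if i >= len(body):
                raise ValueError("truncated length")
            byte = body[i]
            i += 1
            length = length * 128 + (byte & 0x7F)
            if byte & 0x80:       # upper bit set: final varint byte
                break
        if i + length > len(body):
            raise ValueError("truncated Repr")
        chunks.append(body[i:i + length])
        i += length
    return chunks


def outline(repr_, depth=0):
    """Return indented tag names for repr_ and, if compound, its children."""
    tag = repr_[0]
    lines = ["  " * depth + TAG_NAMES[tag]]
    if tag in COMPOUND:
        for chunk in split_seq(repr_[1:]):
            lines.extend(outline(chunk, depth + 1))
    return lines


# The bytes of the example above, copied from the listing.
EXAMPLE = bytes.fromhex(
    "A7 9E A8 87 A6 74 69 74 6C 65 64 87 A6 70 65 72 73 6F 6E "
    "82 A3 02 86 A6 74 68 69 6E 67 82 A3 01 82 A3 65 "
    "8A A4 42 6C 61 63 6B 77 65 6C 6C 91 A7 85 A6 64 61 74 65 "
    "83 A3 07 1D 82 A3 02 82 A3 03 83 A4 44 72"
)
```

Splitting `EXAMPLE[1:]` gives five contained `Repr`s of 30, 2, 10, 17 and 3 bytes, which agrees with the length comments in the listing.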
|
||||||
84 ;; End (record)
|
|
||||||
B1 02 44 72 ;; String, "Dr"
|
|
||||||
84 ;; End (record)
|
|
||||||
|
|
||||||
[^extensibility2]: It happens to line up with Racket's
|
[^extensibility2]: It happens to line up with Racket's
|
||||||
representation of a record label for an inheritance hierarchy
|
representation of a record label for an inheritance hierarchy
|
||||||
|
@ -311,27 +308,23 @@ The first RFC 8259 example:
|
||||||
when read using the Preserves text syntax encodes via the binary syntax
|
when read using the Preserves text syntax encodes via the binary syntax
|
||||||
as follows:
|
as follows:
|
||||||
|
|
||||||
B7
|
AA
|
||||||
B1 05 "Image"
|
86 A4 "Image"
|
||||||
B7
|
01 AC AA
|
||||||
B1 03 "IDs" B5
|
89 A4 "Animated" 86 A6 "false"
|
||||||
A0 74
|
87 A4 "Height" 83 A3 02 58
|
||||||
A1 03 AF
|
84 A4 "IDs" 91 A8
|
||||||
A1 00 EA
|
82 A3 74
|
||||||
A2 00 97 89
|
83 A3 03 AF
|
||||||
84
|
83 A3 00 EA
|
||||||
B1 05 "Title" B1 14 "View from 15th Floor"
|
84 A3 00 97 89
|
||||||
B1 05 "Width" A1 03 20
|
8A A4 "Thumbnail"
|
||||||
B1 06 "Height" A1 02 58
|
C3 AA
|
||||||
B1 08 "Animated" B3 05 "false"
|
87 A4 "Height" 82 A3 7D
|
||||||
B1 09 "Thumbnail"
|
84 A4 "Url" A7 A4 "http://www.example.com/image/481989943"
|
||||||
B7
|
86 A4 "Width" 82 A3 64
|
||||||
B1 03 "Url" B1 26 "http://www.example.com/image/481989943"
|
86 A4 "Title" 95 A4 "View from 15th Floor"
|
||||||
B1 05 "Width" A0 64
|
86 A4 "Width" 83 A3 03 20
|
||||||
B1 06 "Height" A0 7D
|
|
||||||
84
|
|
||||||
84
|
|
||||||
84
|
|
||||||
|
|
||||||
The second RFC 8259 example:
|
The second RFC 8259 example:
|
||||||
|
|
||||||
|
@ -360,28 +353,25 @@ The second RFC 8259 example:
|
||||||
|
|
||||||
encodes to binary as follows:
|
encodes to binary as follows:
|
||||||
|
|
||||||
B5
|
A8
|
||||||
B7
|
FE AA
|
||||||
B1 03 "Zip" B1 05 "94107"
|
88 A4 "Address" 81 A4
|
||||||
B1 04 "City" B1 0D "SAN FRANCISCO"
|
85 A4 "City" 8E A4 "SAN FRANCISCO"
|
||||||
B1 05 "State" B1 02 "CA"
|
88 A4 "Country" 83 A4 "US"
|
||||||
B1 07 "Address" B1 00
|
89 A4 "Latitude" 89 A2 40 42 E2 26 80 9D 49 52
|
||||||
B1 07 "Country" B1 02 "US"
|
8A A4 "Longitude" 89 A2 C0 5E 99 56 6C F4 1F 21
|
||||||
B1 08 "Latitude" 83 40 42 E2 26 80 9D 49 52
|
86 A4 "State" 83 A4 "CA"
|
||||||
B1 09 "Longitude" 83 C0 5E 99 56 6C F4 1F 21
|
84 A4 "Zip" 86 A4 "94107"
|
||||||
B1 09 "precision" B1 03 "zip"
|
8A A4 "precision" 84 A4 "zip"
|
||||||
84
|
FA AA
|
||||||
B7
|
88 A4 "Address" 81 A4
|
||||||
B1 03 "Zip" B1 05 "94085"
|
85 A4 "City" 8A A4 "SUNNYVALE"
|
||||||
B1 04 "City" B1 09 "SUNNYVALE"
|
88 A4 "Country" 83 A4 "US"
|
||||||
B1 05 "State" B1 02 "CA"
|
89 A4 "Latitude" 89 A2 40 42 AF 9D 66 AD B4 03
|
||||||
B1 07 "Address" B1 00
|
8A A4 "Longitude" 89 A2 C0 5E 81 AA 4F CA 42 AF
|
||||||
B1 07 "Country" B1 02 "US"
|
86 A4 "State" 83 A4 "CA"
|
||||||
B1 08 "Latitude" 83 40 42 AF 9D 66 AD B4 03
|
84 A4 "Zip" 86 A4 "94085"
|
||||||
B1 09 "Longitude" 83 C0 5E 81 AA 4F CA 42 AF
|
8A A4 "precision" 84 A4 "zip"
|
||||||
B1 09 "precision" B1 03 "zip"
|
|
||||||
84
|
|
||||||
84
|
|
||||||
|
|
||||||
<!-- Heading to visually offset the footnotes from the main document: -->
|
<!-- Heading to visually offset the footnotes from the main document: -->
|
||||||
## Notes
|
## Notes
|
||||||
|
|
|
@ -2,6 +2,8 @@
|
||||||
title: "Representing Values in Programming Languages"
|
title: "Representing Values in Programming Languages"
|
||||||
---
|
---
|
||||||
|
|
||||||
|
[erlang-map]: http://erlang.org/doc/reference_manual/data_types.html#map
|
||||||
|
|
||||||
**NOT YET READY**
|
**NOT YET READY**
|
||||||
|
|
||||||
We have given a definition of `Value` and its semantics, and proposed
|
We have given a definition of `Value` and its semantics, and proposed
|
||||||
|
|