Angle bracket S-exprs for Records!
parent
74f9093c5e
commit
0f5f0630d2
63
preserves.md
|
@ -6,7 +6,7 @@
|
||||||
# Preserves: an Expressive Data Language
|
# Preserves: an Expressive Data Language
|
||||||
|
|
||||||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||||
June 2019. Version 0.0.5.
|
August 2019. Version 0.0.6.
|
||||||
|
|
||||||
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
|
||||||
[spki]: http://world.std.com/~cme/html/spki.html
|
[spki]: http://world.std.com/~cme/html/spki.html
|
||||||
|
@ -212,11 +212,10 @@ Any `Value` may be preceded by whitespace.
|
||||||
Atom = Boolean / Float / Double / SignedInteger /
|
Atom = Boolean / Float / Double / SignedInteger /
|
||||||
String / ByteString / Symbol
|
String / ByteString / Symbol
|
||||||
|
|
||||||
Each `Record` is its label-`Value` followed by a parenthesised
|
Each `Record` is an angle-bracket enclosed grouping of its
|
||||||
grouping of its field-`Value`s. Whitespace is not permitted between
|
label-`Value` followed by its field-`Value`s.
|
||||||
the label and the open-parenthesis.
|
|
||||||
|
|
||||||
Record = Value "(" *Value ws ")"
|
Record = "<" Value *Value ws ">"
|
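As a rough illustration of the new `Record` rule, the sketch below prints a small subset of values in the angle-bracket syntax. The `Record` and `Symbol` classes and the `to_text` function are hypothetical names invented for this example, not part of any Preserves implementation, and string escaping is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Symbol:
    """Hypothetical wrapper marking a bare symbol such as `person`."""
    name: str


@dataclass(frozen=True)
class Record:
    """Hypothetical in-memory record: a label Value plus zero or more field Values."""
    label: Any
    fields: Tuple[Any, ...] = ()


def to_text(v: Any) -> str:
    """Render a small subset of Values using Record = "<" Value *Value ws ">"."""
    if isinstance(v, Record):
        parts = [to_text(v.label)] + [to_text(f) for f in v.fields]
        return "<" + " ".join(parts) + ">"
    if isinstance(v, Symbol):
        return v.name
    if isinstance(v, bool):  # checked before int because bool subclasses int
        return "#true" if v else "#false"
    if isinstance(v, int):
        return str(v)
    if isinstance(v, str):
        # Simplified escaping: only backslash and double quote are handled.
        return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(v, (list, tuple)):
        return "[" + " ".join(to_text(x) for x in v) + "]"
    raise TypeError(f"unsupported value: {v!r}")


person = Record(Symbol("person"), ("Dr", "Elizabeth", "Blackwell"))
print(to_text(person))                   # <person "Dr" "Elizabeth" "Blackwell">
print(to_text(Record(Symbol("void"))))   # <void>
```

Because the label sits inside the brackets, a reader knows it is looking at a record as soon as it sees `<`, with no lookahead past the end of the label.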
||||||
|
|
||||||
`Sequence`s are enclosed in square brackets. `Dictionary` values are
|
`Sequence`s are enclosed in square brackets. `Dictionary` values are
|
||||||
curly-brace-enclosed colon-separated pairs of values. `Set`s are
|
curly-brace-enclosed colon-separated pairs of values. `Set`s are
|
||||||
|
@ -236,12 +235,6 @@ or more values enclosed by the tokens `#set{` and
|
||||||
commas separating, and commas terminating elements or key/value
|
commas separating, and commas terminating elements or key/value
|
||||||
pairs within a collection.
|
pairs within a collection.
|
||||||
|
|
||||||
The special cases of records with a single field, which is in turn a
|
|
||||||
sequence or dictionary, may be written omitting the parentheses.
|
|
||||||
|
|
||||||
Record =/ Value Sequence
|
|
||||||
Record =/ Value Dictionary
|
|
||||||
|
|
||||||
`Boolean`s are the simple literal strings `#true` and `#false`.
|
`Boolean`s are the simple literal strings `#true` and `#false`.
|
||||||
|
|
||||||
Boolean = %s"#true" / %s"#false"
|
Boolean = %s"#true" / %s"#false"
|
||||||
|
@ -356,7 +349,7 @@ double quote mark.
|
||||||
symstart = ALPHA / sympunct / symunicode
|
symstart = ALPHA / sympunct / symunicode
|
||||||
symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
|
symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
|
||||||
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
|
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
|
||||||
"?" / "_" / "=" / "+" / "<" / ">" / "/" / "."
|
"?" / "_" / "=" / "+" / "/" / "."
|
||||||
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
|
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
|
||||||
symunicode = <any code point greater than 127 whose Unicode
|
symunicode = <any code point greater than 127 whose Unicode
|
||||||
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
|
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
|
||||||
|
@ -541,14 +534,14 @@ zero-length chunks.
|
||||||
|
|
||||||
Format B (known length):
|
Format B (known length):
|
||||||
|
|
||||||
[[ L(F_1...F_m) ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
|
[[ <L F_1...F_m> ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
|
||||||
|
|
||||||
For `m` fields, `m+1` is supplied to `header`, to account for the
|
For `m` fields, `m+1` is supplied to `header`, to account for the
|
||||||
encoding of the record label.
|
encoding of the record label.
|
||||||
|
|
||||||
Format C (streaming):
|
Format C (streaming):
|
||||||
|
|
||||||
[[ L(F_1...F_m) ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
|
[[ <L F_1...F_m> ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
|
||||||
|
|
||||||
Applications *SHOULD* prefer the known-length format for encoding
|
Applications *SHOULD* prefer the known-length format for encoding
|
||||||
`Record`s.
|
`Record`s.
|
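The known-length (format B) rule can be sketched in a few lines. The bit layout of `header` used here (two bits, two bits, then a four-bit argument, with 15 meaning "a varint follows") is an assumption inferred from this document's worked examples such as `header(0,1,4) = [0x14]` and `[0x1F, 0x66]` for placeholder 102. The base-128 `varint` and the helper name `encode_record` are likewise illustrative.

```python
from typing import Sequence


def varint(n: int) -> bytes:
    """Base-128 varint, least significant group first (assumed encoding)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def header(major: int, minor: int, arg: int) -> bytes:
    """Lead byte as 2 + 2 + 4 bits; a low nibble of 15 defers to a varint (inferred)."""
    if arg < 15:
        return bytes([(major << 6) | (minor << 4) | arg])
    return bytes([(major << 6) | (minor << 4) | 15]) + varint(arg)


def encode_record(label: bytes, fields: Sequence[bytes]) -> bytes:
    """Format B: header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++ ... ++ [[F_m]]."""
    return header(2, 0, len(fields) + 1) + label + b"".join(fields)


# With placeholder number 4 assigned to the symbol `void`:
void_symbol = header(0, 1, 4)
print(encode_record(void_symbol, []).hex())  # 8114
```

The label and fields are passed in already encoded, so the function only adds the framing; the count given to `header` is one more than the number of fields because the label is counted too.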
||||||
|
@ -569,7 +562,7 @@ be tersely encoded as
|
||||||
number 4 to the symbol `void`, making
|
number 4 to the symbol `void`, making
|
||||||
|
|
||||||
[[void]] = header(0,1,4) = [0x14]
|
[[void]] = header(0,1,4) = [0x14]
|
||||||
[[void()]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
|
[[<void>]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
|
||||||
|
|
||||||
or it may map symbol `person` to placeholder number 102, making
|
or it may map symbol `person` to placeholder number 102, making
|
||||||
|
|
||||||
|
@ -577,7 +570,7 @@ or it may map symbol `person` to placeholder number 102, making
|
||||||
|
|
||||||
and so
|
and so
|
||||||
|
|
||||||
[[person("Dr", "Elizabeth", "Blackwell")]]
|
[[<person "Dr" "Elizabeth" "Blackwell">]]
|
||||||
= header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
= header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
||||||
= [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
= [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
|
||||||
|
|
||||||
|
@ -714,7 +707,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
|
||||||
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
|
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
|
||||||
`[0x05] ++ [[v]]`.
|
`[0x05] ++ [[v]]`.
|
||||||
|
|
||||||
For example, the `Repr` corresponding to textual syntax `@a @b []`,
|
For example, the `Repr` corresponding to textual syntax `@a@b[]`,
|
||||||
i.e. an empty sequence annotated with two symbols, `a` and `b`, is
|
i.e. an empty sequence annotated with two symbols, `a` and `b`, is
|
||||||
|
|
||||||
[[ @a @b [] ]]
|
[[ @a @b [] ]]
|
||||||
|
@ -734,8 +727,8 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
|
||||||
|
|
||||||
| Value | Encoded byte sequence |
|
| Value | Encoded byte sequence |
|
||||||
|---------------------------------------------------|-------------------------------------------------------------------------------------|
|
|---------------------------------------------------|-------------------------------------------------------------------------------------|
|
||||||
| `capture(discard())` | 82 11 81 10 |
|
| `<capture <discard>>` | 82 11 81 10 |
|
||||||
| `observe(speak(discard(), capture(discard())))` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
|
| `<observe <speak <discard> <capture <discard>>>>` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
|
||||||
| `[1 2 3 4]` (format B) | 94 31 32 33 34 |
|
| `[1 2 3 4]` (format B) | 94 31 32 33 34 |
|
||||||
| `[1 2 3 4]` (format C) | 29 31 32 33 34 04 |
|
| `[1 2 3 4]` (format C) | 29 31 32 33 34 04 |
|
||||||
| `[-2 -1 0 1]` | 94 3E 3F 30 31 |
|
| `[-2 -1 0 1]` | 94 3E 3F 30 31 |
|
||||||
|
@ -754,7 +747,7 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
|
||||||
|
|
||||||
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
||||||
|
|
||||||
[titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
|
<[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">
|
||||||
|
|
||||||
encodes to
|
encodes to
|
||||||
|
|
||||||
|
@ -982,16 +975,16 @@ such media types following the general rules for ordering of
|
||||||
|
|
||||||
| Value | Encoded hexadecimal byte sequence |
|
| Value | Encoded hexadecimal byte sequence |
|
||||||
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
|
||||||
| `mime(application/octet-stream #"abcde")` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
|
||||||
| `mime(text/plain #"ABC")` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
| `<mime text/plain #"ABC">` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
|
||||||
| `mime(application/xml #"<xhtml/>")` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
| `<mime application/xml #"<xhtml/>">` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
|
||||||
| `mime(text/csv #"123,234,345")` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
|
| `<mime text/csv #"123,234,345">` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
|
||||||
|
|
||||||
Applications making heavy use of `mime` records may choose to use a
|
Applications making heavy use of `mime` records may choose to use a
|
||||||
placeholder number for the symbol `mime` as well as the symbols for
|
placeholder number for the symbol `mime` as well as the symbols for
|
||||||
individual media types. For example, if placeholder number 1 were
|
individual media types. For example, if placeholder number 1 were
|
||||||
chosen for `mime`, and placeholder number 7 for `text/plain`, the
|
chosen for `mime`, and placeholder number 7 for `text/plain`, the
|
||||||
second example above, `mime(text/plain #"ABC")`, would be encoded as
|
second example above, `<mime text/plain #"ABC">`, would be encoded as
|
||||||
`83 11 17 63 41 42 43`.
|
`83 11 17 63 41 42 43`.
|
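The effect of placeholders on `mime` records can be reproduced with a short sketch. The header layout and the lead-byte classes used for symbols (`header(1,3,n)`) and byte strings (`header(1,2,n)`) are assumptions read off the bytes in the table above (`74`, `7A`, `63`); the function names are hypothetical.

```python
from typing import Dict, Optional


def varint(n: int) -> bytes:
    """Base-128 varint, least significant group first (assumed encoding)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def header(major: int, minor: int, arg: int) -> bytes:
    """Lead byte as 2 + 2 + 4 bits; a low nibble of 15 defers to a varint (inferred)."""
    if arg < 15:
        return bytes([(major << 6) | (minor << 4) | arg])
    return bytes([(major << 6) | (minor << 4) | 15]) + varint(arg)


def symbol(name: str, placeholders: Optional[Dict[str, int]] = None) -> bytes:
    """Encode a symbol, or its placeholder number if one has been agreed."""
    if placeholders and name in placeholders:
        return header(0, 1, placeholders[name])
    raw = name.encode("utf-8")
    return header(1, 3, len(raw)) + raw


def byte_string(data: bytes) -> bytes:
    return header(1, 2, len(data)) + data


def mime(media_type: str, body: bytes,
         placeholders: Optional[Dict[str, int]] = None) -> bytes:
    """Encode <mime media_type body> as a known-length record (label + 2 fields)."""
    return (header(2, 0, 3)
            + symbol("mime", placeholders)
            + symbol(media_type, placeholders)
            + byte_string(body))


print(mime("text/plain", b"ABC").hex())
# 83746d696d657a746578742f706c61696e63414243
print(mime("text/plain", b"ABC", {"mime": 1, "text/plain": 7}).hex())
# 83111763414243
```

With both symbols replaced by placeholders, the 21-byte encoding shrinks to 7 bytes, which is the saving the paragraph above describes.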
||||||
|
|
||||||
### Unicode normalization forms.
|
### Unicode normalization forms.
|
||||||
|
@ -1023,9 +1016,9 @@ A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
|
||||||
*n*-bit-wide signed and unsigned range restrictions, respectively.
|
*n*-bit-wide signed and unsigned range restrictions, respectively.
|
||||||
Records with these labels *MUST* have one field, a `SignedInteger`,
|
Records with these labels *MUST* have one field, a `SignedInteger`,
|
||||||
which *MUST* fall within the appropriate range. That is, to be valid,
|
which *MUST* fall within the appropriate range. That is, to be valid,
|
||||||
- in `i8(`*x*`)`, -128 <= *x* <= 127.
|
- in `<i8 `*x*`>`, -128 <= *x* <= 127.
|
||||||
- in `u8(`*x*`)`, 0 <= *x* <= 255.
|
- in `<u8 `*x*`>`, 0 <= *x* <= 255.
|
||||||
- in `i16(`*x*`)`, -32768 <= *x* <= 32767.
|
- in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
|
||||||
- etc.
|
- etc.
|
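The range rules above follow a single pattern, which the sketch below spells out. The function names `bounds` and `is_valid` are hypothetical, and the record is modelled simply as a label string plus a list of fields.

```python
import re
from typing import Any, Sequence, Tuple


def bounds(label: str) -> Tuple[int, int]:
    """Inclusive (low, high) range for a label of the form i<n> or u<n>."""
    m = re.fullmatch(r"([iu])(8|16|32|64)", label)
    if m is None:
        raise ValueError(f"not a range-restriction label: {label!r}")
    bits = int(m.group(2))
    if m.group(1) == "i":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def is_valid(label: str, fields: Sequence[Any]) -> bool:
    """True if the record has exactly one integer field within the label's range."""
    if len(fields) != 1:
        return False
    x = fields[0]
    if isinstance(x, bool) or not isinstance(x, int):
        return False  # bool is excluded explicitly: it subclasses int in Python
    low, high = bounds(label)
    return low <= x <= high


print(bounds("i8"))            # (-128, 127)
print(is_valid("u8", [255]))   # True
print(is_valid("u8", [256]))   # False
```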
||||||
|
|
||||||
### Anonymous Tuples and Unit.
|
### Anonymous Tuples and Unit.
|
||||||
|
@ -1033,15 +1026,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
|
||||||
A `Tuple` is a `Record` with label `tuple` and zero or more fields,
|
A `Tuple` is a `Record` with label `tuple` and zero or more fields,
|
||||||
denoting an anonymous tuple of values.
|
denoting an anonymous tuple of values.
|
||||||
|
|
||||||
The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
|
The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
|
||||||
"unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
|
"unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
|
||||||
|
|
||||||
### Null and Undefined.
|
### Null and Undefined.
|
||||||
|
|
||||||
Tony Hoare's
|
Tony Hoare's
|
||||||
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
|
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
|
||||||
can be represented with the 0-ary `Record` `null()`. An "undefined"
|
can be represented with the 0-ary `Record` `<null>`. An "undefined"
|
||||||
value can be represented as `undefined()`.
|
value can be represented as `<undefined>`.
|
||||||
|
|
||||||
### Dates and Times.
|
### Dates and Times.
|
||||||
|
|
||||||
|
@ -1429,14 +1422,6 @@ byte count acting as a kind of exponent underneath the sign bit.
|
||||||
- Canonicalization and early-bailout-equivalence-checking are in
|
- Canonicalization and early-bailout-equivalence-checking are in
|
||||||
tension with support for streaming values.
|
tension with support for streaming values.
|
||||||
|
|
||||||
Q. The postfix fields in the textual syntax come unannounced: "oh, and
|
|
||||||
another thing, what you just read is a label, and here are some
|
|
||||||
fields." This is a problem for interactive reading of textual syntax,
|
|
||||||
because after a complete term, it needs to see the next character to
|
|
||||||
tell whether it is an open-parenthesis or not! For this reason, I've
|
|
||||||
disallowed whitespace between a label `Value` and the open-parenthesis
|
|
||||||
of the fields. Is this reasonable??
|
|
||||||
|
|
||||||
Q. To remain compatible with JSON, portions of the text syntax have to
|
Q. To remain compatible with JSON, portions of the text syntax have to
|
||||||
remain case-insensitive (`%i"..."`). However, non-JSON extensions do
|
remain case-insensitive (`%i"..."`). However, non-JSON extensions do
|
||||||
not. There's only one (?) at the moment, the `%i"f"` in `Float`;
|
not. There's only one (?) at the moment, the `%i"f"` in `Float`;
|
||||||
|
|