Angle bracket S-exprs for Records!

Tony Garnock-Jones 2019-08-11 23:54:57 +01:00
parent 74f9093c5e
commit 0f5f0630d2
1 changed files with 24 additions and 39 deletions


@ -6,7 +6,7 @@
# Preserves: an Expressive Data Language # Preserves: an Expressive Data Language
Tony Garnock-Jones <tonyg@leastfixedpoint.com> Tony Garnock-Jones <tonyg@leastfixedpoint.com>
June 2019. Version 0.0.5. August 2019. Version 0.0.6.
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
[spki]: http://world.std.com/~cme/html/spki.html [spki]: http://world.std.com/~cme/html/spki.html
@ -212,11 +212,10 @@ Any `Value` may be preceded by whitespace.
Atom = Boolean / Float / Double / SignedInteger / Atom = Boolean / Float / Double / SignedInteger /
String / ByteString / Symbol String / ByteString / Symbol
Each `Record` is its label-`Value` followed by a parenthesised Each `Record` is an angle-bracket enclosed grouping of its
grouping of its field-`Value`s. Whitespace is not permitted between label-`Value` followed by its field-`Value`s.
the label and the open-parenthesis.
Record = Value "(" *Value ws ")" Record = "<" Value *Value ws ">"
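As an illustration of the new angle-bracket production, here is a minimal Python sketch of a writer for the text syntax. The `Symbol` and `Record` classes and the `write` function are hypothetical helpers invented for this example, not part of Preserves or any implementation of it. String escaping is simplified to backslash and double quote.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Symbol:
    """A bare symbol such as `person` (hypothetical helper type)."""
    name: str


@dataclass
class Record:
    """A label Value plus zero or more field Values (hypothetical helper type)."""
    label: "Value"
    fields: List["Value"] = field(default_factory=list)


Value = Union[Symbol, Record, str, int, list]


def write(v: Value) -> str:
    """Render a value using the angle-bracket Record syntax."""
    if isinstance(v, Record):
        # Record = "<" Value *Value ws ">": label first, then the fields.
        parts = [write(v.label)] + [write(f) for f in v.fields]
        return "<" + " ".join(parts) + ">"
    if isinstance(v, Symbol):
        return v.name
    if isinstance(v, str):
        # Simplification: only backslash and double quote are escaped.
        return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(v, bool):
        # bool is checked before int because bool is a subclass of int.
        return "#true" if v else "#false"
    if isinstance(v, int):
        return str(v)
    if isinstance(v, list):
        return "[" + " ".join(write(x) for x in v) + "]"
    raise TypeError(f"unsupported value: {v!r}")
```

Because the label now sits inside the brackets, a reader sees `<` before the label and does not need to look past a completed term to learn whether fields follow.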
`Sequence`s are enclosed in square brackets. `Dictionary` values are `Sequence`s are enclosed in square brackets. `Dictionary` values are
curly-brace-enclosed colon-separated pairs of values. `Set`s are curly-brace-enclosed colon-separated pairs of values. `Set`s are
@ -236,12 +235,6 @@ or more values enclosed by the tokens `#set{` and
commas separating, and commas terminating elements or key/value commas separating, and commas terminating elements or key/value
pairs within a collection. pairs within a collection.
The special cases of records with a single field, which is in turn a
sequence or dictionary, may be written omitting the parentheses.
Record =/ Value Sequence
Record =/ Value Dictionary
`Boolean`s are the simple literal strings `#true` and `#false`. `Boolean`s are the simple literal strings `#true` and `#false`.
Boolean = %s"#true" / %s"#false" Boolean = %s"#true" / %s"#false"
@ -356,7 +349,7 @@ double quote mark.
symstart = ALPHA / sympunct / symunicode symstart = ALPHA / sympunct / symunicode
symcont = ALPHA / sympunct / symunicode / DIGIT / "-" symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" / sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
"?" / "_" / "=" / "+" / "<" / ">" / "/" / "." "?" / "_" / "=" / "+" / "/" / "."
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG) symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
symunicode = <any code point greater than 127 whose Unicode symunicode = <any code point greater than 127 whose Unicode
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
@ -541,14 +534,14 @@ zero-length chunks.
Format B (known length): Format B (known length):
[[ L(F_1...F_m) ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] [[ <L F_1...F_m> ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
For `m` fields, `m+1` is supplied to `header`, to account for the For `m` fields, `m+1` is supplied to `header`, to account for the
encoding of the record label. encoding of the record label.
Format C (streaming): Format C (streaming):
[[ L(F_1...F_m) ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close() [[ <L F_1...F_m> ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
Applications *SHOULD* prefer the known-length format for encoding Applications *SHOULD* prefer the known-length format for encoding
`Record`s. `Record`s.
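The two framings can be sketched in Python as follows. The bit layouts used here are inferred from the example bytes in this document (`0x14`, `0x81`, `0x1F 0x66`, `94 ...`, `29 ... 04`), not copied from the definitions of `header`, `open` and `close`, which fall outside this diff. The multi-byte `varint` form for arguments of 128 or more is an assumption. All function names are hypothetical.

```python
from typing import List


def varint(n: int) -> bytes:
    """Base-128 varint, low group first (assumed form for n >= 128)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def header(major: int, minor: int, arg: int) -> bytes:
    """Lead byte: 2 bits major, 2 bits minor, 4 bits arg (15 = varint follows)."""
    if arg < 15:
        return bytes([(major << 6) | (minor << 4) | arg])
    return bytes([(major << 6) | (minor << 4) | 0x0F]) + varint(arg)


def open_(major: int, minor: int) -> bytes:
    """Start of a streamed value; layout inferred from the 0x29 sequence opener."""
    return bytes([0x20 | (major << 2) | minor])


def close_() -> bytes:
    """End of a streamed value; inferred from the trailing 0x04."""
    return b"\x04"


def known_length(major: int, minor: int, parts: List[bytes]) -> bytes:
    """Format B. For a Record, parts is [label] + fields, so the count is m+1."""
    return header(major, minor, len(parts)) + b"".join(parts)


def streaming(major: int, minor: int, parts: List[bytes]) -> bytes:
    """Format C: the same parts, bracketed by open and close instead of a count."""
    return open_(major, minor) + b"".join(parts) + close_()
```

For example, `known_length(2, 0, [header(0, 1, 4)])` yields the two bytes `81 14` given for `<void>` when `void` is mapped to placeholder number 4.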
@ -569,7 +562,7 @@ be tersely encoded as
number 4 to the symbol `void`, making number 4 to the symbol `void`, making
[[void]] = header(0,1,4) = [0x14] [[void]] = header(0,1,4) = [0x14]
[[void()]] = header(2,0,1) ++ [[void]] = [0x81, 0x14] [[<void>]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
or it may map symbol `person` to placeholder number 102, making or it may map symbol `person` to placeholder number 102, making
@ -577,7 +570,7 @@ or it may map symbol `person` to placeholder number 102, making
and so and so
[[person("Dr", "Elizabeth", "Blackwell")]] [[<person "Dr" "Elizabeth" "Blackwell">]]
= header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] = header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
= [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]] = [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
@ -714,7 +707,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
`[0x05] ++ [[v]]`. `[0x05] ++ [[v]]`.
For example, the `Repr` corresponding to textual syntax `@a @b []`, For example, the `Repr` corresponding to textual syntax `@a@b[]`,
i.e. an empty sequence annotated with two symbols, `a` and `b`, is i.e. an empty sequence annotated with two symbols, `a` and `b`, is
[[ @a @b [] ]] [[ @a @b [] ]]
@ -734,8 +727,8 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
| Value | Encoded byte sequence | | Value | Encoded byte sequence |
|---------------------------------------------------|-------------------------------------------------------------------------------------| |---------------------------------------------------|-------------------------------------------------------------------------------------|
| `capture(discard())` | 82 11 81 10 | | `<capture <discard>>` | 82 11 81 10 |
| `observe(speak(discard(), capture(discard())))` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 | | `<observe <speak <discard> <capture <discard>>>>` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
| `[1 2 3 4]` (format B) | 94 31 32 33 34 | | `[1 2 3 4]` (format B) | 94 31 32 33 34 |
| `[1 2 3 4]` (format C) | 29 31 32 33 34 04 | | `[1 2 3 4]` (format C) | 29 31 32 33 34 04 |
| `[-2 -1 0 1]` | 94 3E 3F 30 31 | | `[-2 -1 0 1]` | 94 3E 3F 30 31 |
@ -754,7 +747,7 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record` The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
[titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr") <[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">
encodes to encodes to
@ -982,16 +975,16 @@ such media types following the general rules for ordering of
| Value | Encoded hexadecimal byte sequence | | Value | Encoded hexadecimal byte sequence |
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------| |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `mime(application/octet-stream #"abcde")` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 | | `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
| `mime(text/plain #"ABC")` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 | | `<mime text/plain #"ABC">` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
| `mime(application/xml #"<xhtml/>")` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E | | `<mime application/xml #"<xhtml/>">` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
| `mime(text/csv #"123,234,345")` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 | | `<mime text/csv #"123,234,345">` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
Applications making heavy use of `mime` records may choose to use a Applications making heavy use of `mime` records may choose to use a
placeholder number for the symbol `mime` as well as the symbols for placeholder number for the symbol `mime` as well as the symbols for
individual media types. For example, if placeholder number 1 were individual media types. For example, if placeholder number 1 were
chosen for `mime`, and placeholder number 7 for `text/plain`, the chosen for `mime`, and placeholder number 7 for `text/plain`, the
second example above, `mime(text/plain #"ABC")`, would be encoded as second example above, `<mime text/plain #"ABC">`, would be encoded as
`83 11 17 63 41 42 43`. `83 11 17 63 41 42 43`.
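The byte sequences above can be reproduced with a short Python sketch. The helper names are hypothetical, and the `header` layout is inferred from the table rows rather than taken from the definition elsewhere in the specification. Lengths of 128 or more are deliberately left out so that no multi-byte varint format has to be assumed.

```python
def header(major: int, minor: int, arg: int) -> bytes:
    """Lead byte: 2 bits major, 2 bits minor, 4 bits arg; 15 means a length byte follows."""
    if arg < 15:
        return bytes([(major << 6) | (minor << 4) | arg])
    if arg < 128:
        return bytes([(major << 6) | (minor << 4) | 0x0F, arg])
    raise ValueError("lengths >= 128 need a multi-byte varint, omitted in this sketch")


def symbol(name: str) -> bytes:
    """A Symbol: header(1,3,length) followed by the UTF-8 bytes."""
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data


def byte_string(data: bytes) -> bytes:
    """A ByteString: header(1,2,length) followed by the raw bytes."""
    return header(1, 2, len(data)) + data


def placeholder(n: int) -> bytes:
    """A placeholder standing in for a value agreed on out of band."""
    return header(0, 1, n)


def record(label: bytes, *fields: bytes) -> bytes:
    """Known-length Record: the count covers the label plus the fields."""
    return header(2, 0, 1 + len(fields)) + label + b"".join(fields)


# <mime text/plain #"ABC"> with both symbols spelled out: 21 bytes.
plain = record(symbol("mime"), symbol("text/plain"), byte_string(b"ABC"))

# The same record with placeholder 1 for `mime` and 7 for `text/plain`: 7 bytes.
terse = record(placeholder(1), placeholder(7), byte_string(b"ABC"))
```

The saving comes entirely from the two symbols. The `ByteString` payload is encoded identically in both forms.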
### Unicode normalization forms. ### Unicode normalization forms.
@ -1023,9 +1016,9 @@ A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
*n*-bit-wide signed and unsigned range restrictions, respectively. *n*-bit-wide signed and unsigned range restrictions, respectively.
Records with these labels *MUST* have one field, a `SignedInteger`, Records with these labels *MUST* have one field, a `SignedInteger`,
which *MUST* fall within the appropriate range. That is, to be valid, which *MUST* fall within the appropriate range. That is, to be valid,
- in `i8(`*x*`)`, -128 <= *x* <= 127. - in `<i8 `*x*`>`, -128 <= *x* <= 127.
- in `u8(`*x*`)`, 0 <= *x* <= 255. - in `<u8 `*x*`>`, 0 <= *x* <= 255.
- in `i16(`*x*`)`, -32768 <= *x* <= 32767. - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
- etc. - etc.
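The range rules listed above follow the usual two's-complement bounds, so a validity check can be sketched in a few lines of Python. The function name `in_range` is hypothetical, and the label is passed as a plain string for simplicity.

```python
import re

# Matches the label family i8, i16, i32, i64, u8, u16, u32, u64.
_LABEL = re.compile(r"([iu])(8|16|32|64)")


def in_range(label: str, x: int) -> bool:
    """Return True if x is a valid field for a record with the given label."""
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a fixed-width integer label: {label!r}")
    bits = int(m.group(2))
    if m.group(1) == "i":
        # Signed: -2^(n-1) <= x <= 2^(n-1) - 1
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        # Unsigned: 0 <= x <= 2^n - 1
        lo, hi = 0, (1 << bits) - 1
    return lo <= x <= hi
```

With this check, `in_range("i8", 127)` holds and `in_range("i8", 128)` does not, matching the first item in the list.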
### Anonymous Tuples and Unit. ### Anonymous Tuples and Unit.
@ -1033,15 +1026,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
A `Tuple` is a `Record` with label `tuple` and zero or more fields, A `Tuple` is a `Record` with label `tuple` and zero or more fields,
denoting an anonymous tuple of values. denoting an anonymous tuple of values.
The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
"unit" or "void" (but *not* e.g. JavaScript's "undefined" value). "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
### Null and Undefined. ### Null and Undefined.
Tony Hoare's Tony Hoare's
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)" "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
can be represented with the 0-ary `Record` `null()`. An "undefined" can be represented with the 0-ary `Record` `<null>`. An "undefined"
value can be represented as `undefined()`. value can be represented as `<undefined>`.
### Dates and Times. ### Dates and Times.
@ -1429,14 +1422,6 @@ byte count acting as a kind of exponent underneath the sign bit.
- Canonicalization and early-bailout-equivalence-checking are in - Canonicalization and early-bailout-equivalence-checking are in
tension with support for streaming values. tension with support for streaming values.
Q. The postfix fields in the textual syntax come unannounced: "oh, and
another thing, what you just read is a label, and here are some
fields." This is a problem for interactive reading of textual syntax,
because after a complete term, it needs to see the next character to
tell whether it is an open-parenthesis or not! For this reason, I've
disallowed whitespace between a label `Value` and the open-parenthesis
of the fields. Is this reasonable??
Q. To remain compatible with JSON, portions of the text syntax have to Q. To remain compatible with JSON, portions of the text syntax have to
remain case-insensitive (`%i"..."`). However, non-JSON extensions do remain case-insensitive (`%i"..."`). However, non-JSON extensions do
not. There's only one (?) at the moment, the `%i"f"` in `Float`; not. There's only one (?) at the moment, the `%i"f"` in `Float`;