Angle bracket S-exprs for Records!

Tony Garnock-Jones 2019-08-11 23:54:57 +01:00
parent 74f9093c5e
commit 0f5f0630d2
1 changed file with 24 additions and 39 deletions

@ -6,7 +6,7 @@
# Preserves: an Expressive Data Language
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
June 2019. Version 0.0.5.
August 2019. Version 0.0.6.
[sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
[spki]: http://world.std.com/~cme/html/spki.html
@ -212,11 +212,10 @@ Any `Value` may be preceded by whitespace.
Atom = Boolean / Float / Double / SignedInteger /
String / ByteString / Symbol
Each `Record` is its label-`Value` followed by a parenthesised
grouping of its field-`Value`s. Whitespace is not permitted between
the label and the open-parenthesis.
Each `Record` is an angle-bracket enclosed grouping of its
label-`Value` followed by its field-`Value`s.
Record = Value "(" *Value ws ")"
Record = "<" Value *Value ws ">"
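A minimal sketch of reading the angle-bracket `Record` syntax, covering only bare symbols, decimal integers, and double-quoted strings without escape processing. The names `TOKEN`, `parse`, and `_read`, and the tuple shapes used for results, are hypothetical and not part of the specification; input the token pattern does not recognise is silently skipped.

```python
import re

# Hypothetical tokenizer: "<", ">", a double-quoted string, or a bare run of
# characters that are neither whitespace, angle brackets, nor double quotes.
TOKEN = re.compile(r'<|>|"(?:[^"\\]|\\.)*"|[^\s<>"]+')


def parse(text):
    """Parse exactly one value from text."""
    tokens = TOKEN.findall(text)
    value, nxt = _read(tokens, 0)
    if nxt != len(tokens):
        raise ValueError("unexpected trailing input")
    return value


def _read(tokens, i):
    """Read one value starting at tokens[i]; return (value, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "<":
        # Record = "<" Value *Value ws ">": the first item is the label.
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != ">":
            item, i = _read(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise ValueError("missing '>'")
        if not items:
            raise ValueError("a record needs a label")
        return ("record", items[0], items[1:]), i + 1
    if tok == ">":
        raise ValueError("unexpected '>'")
    if tok.startswith('"'):
        # Strings are kept raw; escapes are not interpreted in this sketch.
        return ("string", tok[1:-1]), i + 1
    try:
        return int(tok), i + 1
    except ValueError:
        return tok, i + 1  # anything else is treated as a symbol


print(parse("<date 1821 2 3>"))  # ('record', 'date', [1821, 2, 3])
```

Because the closing `>` delimits the record, the reader never has to peek past a complete term to learn whether fields follow.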
`Sequence`s are enclosed in square brackets. `Dictionary` values are
curly-brace-enclosed colon-separated pairs of values. `Set`s are
@ -236,12 +235,6 @@ or more values enclosed by the tokens `#set{` and
commas separating, and commas terminating elements or key/value
pairs within a collection.
The special cases of records with a single field, which is in turn a
sequence or dictionary, may be written omitting the parentheses.
Record =/ Value Sequence
Record =/ Value Dictionary
`Boolean`s are the simple literal strings `#true` and `#false`.
Boolean = %s"#true" / %s"#false"
@ -356,7 +349,7 @@ double quote mark.
symstart = ALPHA / sympunct / symunicode
symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
"?" / "_" / "=" / "+" / "<" / ">" / "/" / "."
"?" / "_" / "=" / "+" / "/" / "."
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
symunicode = <any code point greater than 127 whose Unicode
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
@ -541,14 +534,14 @@ zero-length chunks.
Format B (known length):
[[ L(F_1...F_m) ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
[[ <L F_1...F_m> ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
For `m` fields, `m+1` is supplied to `header`, to account for the
encoding of the record label.
Format C (streaming):
[[ L(F_1...F_m) ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
[[ <L F_1...F_m> ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
Applications *SHOULD* prefer the known-length format for encoding
`Record`s.
@ -569,7 +562,7 @@ be tersely encoded as
number 4 to the symbol `void`, making
[[void]] = header(0,1,4) = [0x14]
[[void()]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
[[<void>]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
or it may map symbol `person` to placeholder number 102, making
@ -577,7 +570,7 @@ or it may map symbol `person` to placeholder number 102, making
and so
[[person("Dr", "Elizabeth", "Blackwell")]]
[[<person "Dr" "Elizabeth" "Blackwell">]]
= header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
= [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
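The Format B record encoding can be sketched as follows. The bit layout of `header` is inferred from the byte values shown in these examples (`0x81`, `0x14`, `0x84`, `0x1F 0x66`), and the multi-byte `varint` case (little-endian base 128 with a continuation bit) is an assumption that these examples do not exercise. The helper names `placeholder` and `record` are hypothetical.

```python
def varint(m: int) -> bytes:
    # Assumption: little-endian base-128, high bit set on all but the last byte.
    out = bytearray()
    while m >= 128:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def header(t: int, n: int, m: int) -> bytes:
    # Inferred layout: two bits of t, two bits of n, four bits of m.
    # A low nibble of 15 means the real m follows as a varint.
    if m < 15:
        return bytes([(t << 6) | (n << 4) | m])
    return bytes([(t << 6) | (n << 4) | 15]) + varint(m)


def placeholder(number: int) -> bytes:
    # A symbol that has been assigned a placeholder number.
    return header(0, 1, number)


def record(label: bytes, *fields: bytes) -> bytes:
    # Format B: m fields are announced as m+1 to account for the label.
    return header(2, 0, len(fields) + 1) + label + b"".join(fields)


# <void> with `void` mapped to placeholder number 4
print(record(placeholder(4)).hex(" "))  # 81 14

# Prefix of <person "Dr" "Elizabeth" "Blackwell"> with `person` mapped to 102
print((header(2, 0, 4) + placeholder(102)).hex(" "))  # 84 1f 66
```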
@ -714,7 +707,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
`[0x05] ++ [[v]]`.
For example, the `Repr` corresponding to textual syntax `@a @b []`,
For example, the `Repr` corresponding to textual syntax `@a@b[]`,
i.e. an empty sequence annotated with two symbols, `a` and `b`, is
[[ @a @b [] ]]
@ -734,8 +727,8 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
| Value | Encoded byte sequence |
|---------------------------------------------------|-------------------------------------------------------------------------------------|
| `capture(discard())` | 82 11 81 10 |
| `observe(speak(discard(), capture(discard())))` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
| `<capture <discard>>` | 82 11 81 10 |
| `<observe <speak <discard> <capture <discard>>>>` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
| `[1 2 3 4]` (format B) | 94 31 32 33 34 |
| `[1 2 3 4]` (format C) | 29 31 32 33 34 04 |
| `[-2 -1 0 1]` | 94 3E 3F 30 31 |
@ -754,7 +747,7 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
[titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
<[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">
encodes to
@ -982,16 +975,16 @@ such media types following the general rules for ordering of
| Value | Encoded hexadecimal byte sequence |
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `mime(application/octet-stream #"abcde")` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
| `mime(text/plain #"ABC")` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
| `mime(application/xml #"<xhtml/>")` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
| `mime(text/csv #"123,234,345")` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
| `<mime text/plain #"ABC">` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
| `<mime application/xml #"<xhtml/>">` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
| `<mime text/csv #"123,234,345">` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
Applications making heavy use of `mime` records may choose to use a
placeholder number for the symbol `mime` as well as the symbols for
individual media types. For example, if placeholder number 1 were
chosen for `mime`, and placeholder number 7 for `text/plain`, the
second example above, `mime(text/plain #"ABC")`, would be encoded as
second example above, `<mime text/plain #"ABC">`, would be encoded as
`83 11 17 63 41 42 43`.
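As a rough check of the byte sequences above, the following sketch encodes `mime` records with and without placeholders. The `header` layout and the `(1,3)` and `(1,2)` type codes for symbols and byte strings are inferred from the table bytes (`74`, `7A`, `7F 18`, `63`), not quoted from the specification. The function names and the `placeholders` dictionary are hypothetical.

```python
def header(t: int, n: int, m: int) -> bytes:
    # Inferred layout: two bits of t, two bits of n, four bits of m;
    # a low nibble of 15 means the length follows as a base-128 varint.
    if m < 15:
        return bytes([(t << 6) | (n << 4) | m])
    out = bytearray([(t << 6) | (n << 4) | 15])
    while m >= 128:
        out.append((m & 0x7F) | 0x80)
        m >>= 7
    out.append(m)
    return bytes(out)


def symbol(name: str) -> bytes:
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data


def bytestring(data: bytes) -> bytes:
    return header(1, 2, len(data)) + data


def mime_record(media_type: str, body: bytes, placeholders=None) -> bytes:
    # `placeholders` maps symbol names to application-chosen placeholder numbers.
    placeholders = placeholders or {}

    def sym(name: str) -> bytes:
        if name in placeholders:
            return header(0, 1, placeholders[name])
        return symbol(name)

    # Label plus two fields, so header is given 3.
    return header(2, 0, 3) + sym("mime") + sym(media_type) + bytestring(body)


print(mime_record("text/plain", b"ABC").hex(" "))
print(mime_record("text/plain", b"ABC", {"mime": 1, "text/plain": 7}).hex(" "))
# second line prints: 83 11 17 63 41 42 43
```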
### Unicode normalization forms.
@ -1023,9 +1016,9 @@ A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
*n*-bit-wide signed and unsigned range restrictions, respectively.
Records with these labels *MUST* have one field, a `SignedInteger`,
which *MUST* fall within the appropriate range. That is, to be valid,
- in `i8(`*x*`)`, -128 <= *x* <= 127.
- in `u8(`*x*`)`, 0 <= *x* <= 255.
- in `i16(`*x*`)`, -32768 <= *x* <= 32767.
- in `<i8 `*x*`>`, -128 <= *x* <= 127.
- in `<u8 `*x*`>`, 0 <= *x* <= 255.
- in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
- etc.
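The range restrictions listed above can be checked mechanically. This is a small sketch, not part of the specification; `bounds` and `is_valid` are hypothetical names, and only the field value is checked, not the surrounding record structure.

```python
VALID_LABELS = {f"{kind}{bits}" for kind in "iu" for bits in (8, 16, 32, 64)}


def bounds(label: str) -> tuple[int, int]:
    """Return the inclusive (low, high) range for a label such as 'i8' or 'u32'."""
    if label not in VALID_LABELS:
        raise ValueError(f"not a fixed-width integer label: {label!r}")
    bits = int(label[1:])
    if label[0] == "i":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def is_valid(label: str, x: int) -> bool:
    """True if x may appear as the single field of a record with this label."""
    low, high = bounds(label)
    return low <= x <= high


print(bounds("i8"))          # (-128, 127)
print(is_valid("u8", 256))   # False
```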
### Anonymous Tuples and Unit.
@ -1033,15 +1026,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
A `Tuple` is a `Record` with label `tuple` and zero or more fields,
denoting an anonymous tuple of values.
The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
"unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
### Null and Undefined.
Tony Hoare's
"[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
can be represented with the 0-ary `Record` `null()`. An "undefined"
value can be represented as `undefined()`.
can be represented with the 0-ary `Record` `<null>`. An "undefined"
value can be represented as `<undefined>`.
### Dates and Times.
@ -1429,14 +1422,6 @@ byte count acting as a kind of exponent underneath the sign bit.
- Canonicalization and early-bailout-equivalence-checking are in
tension with support for streaming values.
Q. The postfix fields in the textual syntax come unannounced: "oh, and
another thing, what you just read is a label, and here are some
fields." This is a problem for interactive reading of textual syntax,
because after a complete term, it needs to see the next character to
tell whether it is an open-parenthesis or not! For this reason, I've
disallowed whitespace between a label `Value` and the open-parenthesis
of the fields. Is this reasonable??
Q. To remain compatible with JSON, portions of the text syntax have to
remain case-insensitive (`%i"..."`). However, non-JSON extensions do
not. There's only one (?) at the moment, the `%i"f"` in `Float`;