Angle bracket S-exprs for Records!
parent 74f9093c5e
commit 0f5f0630d2
preserves.md (63 lines changed)
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language
 
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>
-June 2019. Version 0.0.5.
+August 2019. Version 0.0.6.
 
 [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
 [spki]: http://world.std.com/~cme/html/spki.html
@@ -212,11 +212,10 @@ Any `Value` may be preceded by whitespace.
 Atom = Boolean / Float / Double / SignedInteger /
 String / ByteString / Symbol
 
-Each `Record` is its label-`Value` followed by a parenthesised
-grouping of its field-`Value`s. Whitespace is not permitted between
-the label and the open-parenthesis.
+Each `Record` is an angle-bracket enclosed grouping of its
+label-`Value` followed by its field-`Value`s.
 
-Record = Value "(" *Value ws ")"
+Record = "<" Value *Value ws ">"
 
 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
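The new `Record` production in the hunk above puts the label inside the brackets, so a reader knows it is looking at a record as soon as it sees `<`. The following is a minimal sketch of a reader for a small subset of the text syntax (records, sequences, unescaped double-quoted strings, integers, and bare symbols). The names `Symbol`, `Record`, `tokenize`, and `parse` are hypothetical and chosen for illustration; byte strings, dictionaries, sets, escapes, floats, and annotations are deliberately not handled.

```python
import re
from collections import namedtuple

# Illustrative value types; these names are assumptions, not part of the spec.
Symbol = namedtuple("Symbol", "name")
Record = namedtuple("Record", "label fields")

# Tokens for a deliberately small subset: brackets, double-quoted strings
# without escapes, and bare runs of other characters (symbols or integers).
TOKEN = re.compile(r'\s*(?:([<>\[\]])|"([^"\\]*)"|([^\s<>\[\]"]+))')


def tokenize(text):
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            yield ("punct", m.group(1))
        elif m.group(2) is not None:
            yield ("string", m.group(2))
        else:
            yield ("bare", m.group(3))


def _until(tokens, i, closer):
    # Read values until the closing bracket; return them and the next index.
    items = []
    while True:
        if i >= len(tokens):
            raise ValueError(f"missing {closer!r}")
        if tokens[i] == ("punct", closer):
            return items, i + 1
        value, i = _value(tokens, i)
        items.append(value)


def _value(tokens, i):
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    kind, tok = tokens[i]
    if kind == "string":
        return tok, i + 1
    if kind == "bare":
        if re.fullmatch(r"-?\d+", tok):
            return int(tok), i + 1
        return Symbol(tok), i + 1
    if tok == "<":
        # Record = "<" Value *Value ws ">": the first value is the label.
        items, i = _until(tokens, i + 1, ">")
        if not items:
            raise ValueError("a record needs a label")
        return Record(items[0], tuple(items[1:])), i
    if tok == "[":
        return _until(tokens, i + 1, "]")
    raise ValueError(f"unexpected {tok!r}")


def parse(text):
    tokens = list(tokenize(text))
    value, end = _value(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after value")
    return value
```

Because the label is an ordinary `Value`, a compound label such as `[titled person 2 thing 1]` needs no special handling in this sketch: it is simply the first item read after `<`.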
@@ -236,12 +235,6 @@ or more values enclosed by the tokens `#set{` and
 commas separating, and commas terminating elements or key/value
 pairs within a collection.
 
-The special cases of records with a single field, which is in turn a
-sequence or dictionary, may be written omitting the parentheses.
-
-Record =/ Value Sequence
-Record =/ Value Dictionary
-
 `Boolean`s are the simple literal strings `#true` and `#false`.
 
 Boolean = %s"#true" / %s"#false"
@@ -356,7 +349,7 @@ double quote mark.
 symstart = ALPHA / sympunct / symunicode
 symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
 sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
-"?" / "_" / "=" / "+" / "<" / ">" / "/" / "."
+"?" / "_" / "=" / "+" / "/" / "."
 symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
 symunicode = <any code point greater than 127 whose Unicode
 category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
@@ -541,14 +534,14 @@ zero-length chunks.
 
 Format B (known length):
 
-[[ L(F_1...F_m) ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
+[[ <L F_1...F_m> ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
 
 For `m` fields, `m+1` is supplied to `header`, to account for the
 encoding of the record label.
 
 Format C (streaming):
 
-[[ L(F_1...F_m) ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
+[[ <L F_1...F_m> ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
 
 Applications *SHOULD* prefer the known-length format for encoding
 `Record`s.
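The two record formats in the hunk above differ only in how the contents are framed. A rough sketch follows, with byte layouts inferred from worked examples elsewhere in this diff rather than taken from the definitions of `header`, `open`, and `close`, which are not part of it: `header(2,0,1)` is `0x81`, and `[1 2 3 4]` (major 2, minor 1) encodes as `94 31 32 33 34` in format B and `29 31 32 33 34 04` in format C. The function names are hypothetical, and the bit layout assumed for `open` rests on that single `0x29` example.

```python
def header(major, minor, arg):
    # Lead byte for a known-length value: two bits of major type, two bits of
    # minor type, four bits of count. Inferred from header(2,0,1) = 0x81.
    if not 0 <= arg < 15:
        raise ValueError("this sketch only handles counts below 15")
    return bytes([(major << 6) | (minor << 4) | arg])


def open_stream(major, minor):
    # Assumed layout 0010ttnn, chosen to match the 0x29 that starts a
    # streamed sequence (major 2, minor 1) in this document's examples.
    return bytes([0x20 | (major << 2) | minor])


CLOSE = bytes([0x04])  # matches the trailing 04 of the streamed sequence


def known_length(major, minor, items):
    # Format B: the item count is stated up front in the header.
    return header(major, minor, len(items)) + b"".join(items)


def streamed(major, minor, items):
    # Format C: no count; the items are bracketed by open and close.
    return open_stream(major, minor) + b"".join(items) + CLOSE


def record_items(label, fields):
    # A record's items are its label followed by its fields, which is why
    # format B passes m+1 to header for a record with m fields.
    return [label, *fields]
```

For instance, `known_length(2, 0, record_items(b"\x14", []))` reproduces the `[0x81, 0x14]` given for `<void>` later in this diff, where `0x14` is the placeholder encoding of the symbol `void`.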
@@ -569,7 +562,7 @@ be tersely encoded as
 number 4 to the symbol `void`, making
 
 [[void]] = header(0,1,4) = [0x14]
-[[void()]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
+[[<void>]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
 
 or it may map symbol `person` to placeholder number 102, making
 
@@ -577,7 +570,7 @@ or it may map symbol `person` to placeholder number 102, making
 
 and so
 
-[[person("Dr", "Elizabeth", "Blackwell")]]
+[[<person "Dr" "Elizabeth" "Blackwell">]]
 = header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
 = [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
 
@@ -714,7 +707,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
 `[0x05] ++ [[v]]`.
 
-For example, the `Repr` corresponding to textual syntax `@a @b []`,
+For example, the `Repr` corresponding to textual syntax `@a@b[]`,
 i.e. an empty sequence annotated with two symbols, `a` and `b`, is
 
 [[ @a @b [] ]]
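The annotation rule quoted in the hunk above (prepend `[0x05] ++ [[v]]` to the `Repr`) can be sketched as a one-line function. The helper name `annotate` is hypothetical. The operand bytes used in the usage note are assumptions derived from the header layout seen in other examples in this diff: `71 61` and `71 62` for the one-character symbols `a` and `b`, and `90` for an empty known-length sequence.

```python
ANNOTATION_TAG = bytes([0x05])


def annotate(repr_bytes, value_bytes):
    # Prepend the annotation marker and the encoded annotation value to an
    # already-encoded Repr, as the rule above describes.
    return ANNOTATION_TAG + value_bytes + repr_bytes
```

Applying the rule twice, innermost annotation first, as in `annotate(annotate(empty_seq, sym_b), sym_a)`, places the encoding of `a` before that of `b`, which follows the reading order of `@a @b []`.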
@@ -734,8 +727,8 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
 
 | Value | Encoded byte sequence |
 |---------------------------------------------------|-------------------------------------------------------------------------------------|
-| `capture(discard())` | 82 11 81 10 |
-| `observe(speak(discard(), capture(discard())))` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
+| `<capture <discard>>` | 82 11 81 10 |
+| `<observe <speak <discard> <capture <discard>>>>` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
 | `[1 2 3 4]` (format B) | 94 31 32 33 34 |
 | `[1 2 3 4]` (format C) | 29 31 32 33 34 04 |
 | `[-2 -1 0 1]` | 94 3E 3F 30 31 |
@@ -754,7 +747,7 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
 
 The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
 
-[titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
+<[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">
 
 encodes to
 
@@ -982,16 +975,16 @@ such media types following the general rules for ordering of
 
 | Value | Encoded hexadecimal byte sequence |
 |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `mime(application/octet-stream #"abcde")` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
-| `mime(text/plain #"ABC")` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
-| `mime(application/xml #"<xhtml/>")` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
-| `mime(text/csv #"123,234,345")` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
+| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
+| `<mime text/plain #"ABC">` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
+| `<mime application/xml #"<xhtml/>">` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
+| `<mime text/csv #"123,234,345">` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
 
 Applications making heavy use of `mime` records may choose to use a
 placeholder number for the symbol `mime` as well as the symbols for
 individual media types. For example, if placeholder number 1 were
 chosen for `mime`, and placeholder number 7 for `text/plain`, the
-second example above, `mime(text/plain #"ABC")`, would be encoded as
+second example above, `<mime text/plain #"ABC">`, would be encoded as
 `83 11 17 63 41 42 43`.
 
 ### Unicode normalization forms.
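The byte sequences in the `mime` table above, and the placeholder variant `83 11 17 63 41 42 43`, can be reproduced with a short sketch. The helper names are hypothetical, and the layouts are inferred from the examples in this diff rather than from the spec's definitions: a lead byte holds major type, minor type, and a 4-bit length; a length of 15 or more is written as 15 followed by a varint (compare `7F 18` and `7F 0F` in the table, and `1F 66` for placeholder 102). The multi-byte varint form used for values of 128 and above is an assumption that none of these examples exercises.

```python
def varint(n):
    # Base-128 groups, least significant first, high bit set on all but the
    # last byte. Assumed form; the examples here only use single-byte values.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def header(major, minor, arg):
    lead = (major << 6) | (minor << 4)
    if arg < 15:
        return bytes([lead | arg])
    return bytes([lead | 15]) + varint(arg)


def symbol(name):
    data = name.encode("utf-8")
    return header(1, 3, len(data)) + data


def bytestring(data):
    return header(1, 2, len(data)) + data


def placeholder(number):
    return header(0, 1, number)


def record(label, *fields):
    # Known-length record: the count covers the label plus the fields.
    return header(2, 0, 1 + len(fields)) + label + b"".join(fields)


plain = record(symbol("mime"), symbol("text/plain"), bytestring(b"ABC"))
terse = record(placeholder(1), placeholder(7), bytestring(b"ABC"))
```

Here `plain` is the 21-byte encoding from the second table row and `terse` is the 7-byte placeholder form, which shows where the saving comes from: each symbol shrinks to a single byte.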
@@ -1023,9 +1016,9 @@ A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
 *n*-bit-wide signed and unsigned range restrictions, respectively.
 Records with these labels *MUST* have one field, a `SignedInteger`,
 which *MUST* fall within the appropriate range. That is, to be valid,
-- in `i8(`*x*`)`, -128 <= *x* <= 127.
-- in `u8(`*x*`)`, 0 <= *x* <= 255.
-- in `i16(`*x*`)`, -32768 <= *x* <= 32767.
+- in `<i8 `*x*`>`, -128 <= *x* <= 127.
+- in `<u8 `*x*`>`, 0 <= *x* <= 255.
+- in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
 - etc.
 
 ### Anonymous Tuples and Unit.
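The range restrictions listed in the hunk above follow one pattern for every width. A minimal sketch of the check, assuming the field has already been decoded to a Python `int`; the function name `in_range` is hypothetical:

```python
import re


def in_range(label, x):
    # label is one of i8, i16, i32, i64, u8, u16, u32, u64.
    m = re.fullmatch(r"([iu])(8|16|32|64)", label)
    if m is None:
        raise ValueError(f"not a range-restriction label: {label!r}")
    bits = int(m.group(2))
    if m.group(1) == "i":
        # Signed: -2^(n-1) <= x <= 2^(n-1) - 1
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        # Unsigned: 0 <= x <= 2^n - 1
        low, high = 0, (1 << bits) - 1
    return low <= x <= high
```

For example, `in_range("i8", -128)` holds while `in_range("i8", 128)` does not, matching the first bullet.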
@@ -1033,15 +1026,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
 A `Tuple` is a `Record` with label `tuple` and zero or more fields,
 denoting an anonymous tuple of values.
 
-The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
+The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
 "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
 
 ### Null and Undefined.
 
 Tony Hoare's
 "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
-can be represented with the 0-ary `Record` `null()`. An "undefined"
-value can be represented as `undefined()`.
+can be represented with the 0-ary `Record` `<null>`. An "undefined"
+value can be represented as `<undefined>`.
 
 ### Dates and Times.
 
@@ -1429,14 +1422,6 @@ byte count acting as a kind of exponent underneath the sign bit.
 - Canonicalization and early-bailout-equivalence-checking are in
 tension with support for streaming values.
 
-Q. The postfix fields in the textual syntax come unannounced: "oh, and
-another thing, what you just read is a label, and here are some
-fields." This is a problem for interactive reading of textual syntax,
-because after a complete term, it needs to see the next character to
-tell whether it is an open-parenthesis or not! For this reason, I've
-disallowed whitespace between a label `Value` and the open-parenthesis
-of the fields. Is this reasonable??
-
 Q. To remain compatible with JSON, portions of the text syntax have to
 remain case-insensitive (`%i"..."`). However, non-JSON extensions do
 not. There's only one (?) at the moment, the `%i"f"` in `Float`;