From 0f5f0630d2a3be8bb5b276ac24a3ba6c31126890 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Sun, 11 Aug 2019 23:54:57 +0100
Subject: [PATCH] Angle bracket S-exprs for Records!

---
 preserves.md | 63 ++++++++++++++++++++--------------------------------
 1 file changed, 24 insertions(+), 39 deletions(-)

diff --git a/preserves.md b/preserves.md
index 0bc6375..8502f28 100644
--- a/preserves.md
+++ b/preserves.md
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language
 
 Tony Garnock-Jones
-June 2019. Version 0.0.5.
+August 2019. Version 0.0.6.
 
   [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
   [spki]: http://world.std.com/~cme/html/spki.html
@@ -212,11 +212,10 @@ Any `Value` may be preceded by whitespace.
     Atom = Boolean / Float / Double / SignedInteger /
            String / ByteString / Symbol
 
-Each `Record` is its label-`Value` followed by a parenthesised
-grouping of its field-`Value`s. Whitespace is not permitted between
-the label and the open-parenthesis.
+Each `Record` is an angle-bracket enclosed grouping of its
+label-`Value` followed by its field-`Value`s.
 
-    Record = Value "(" *Value ws ")"
+    Record = "<" Value *Value ws ">"
 
 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
@@ -236,12 +235,6 @@ or more values enclosed by the tokens `#set{` and
 commas separating, and commas terminating elements or key/value pairs
 within a collection.
 
-The special cases of records with a single field, which is in turn a
-sequence or dictionary, may be written omitting the parentheses.
-
-    Record =/ Value Sequence
-    Record =/ Value Dictionary
-
 `Boolean`s are the simple literal strings `#true` and `#false`.
 
     Boolean = %s"#true" / %s"#false"
@@ -356,7 +349,7 @@ double quote mark.
     symstart = ALPHA / sympunct / symunicode
     symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
     sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
-               "?" / "_" / "=" / "+" / "<" / ">" / "/" / "."
+               "?" / "_" / "=" / "+" / "/" / "."
     symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
     symunicode = ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
 
 For `m` fields, `m+1` is supplied to `header`, to account for the
 encoding of the record label.
 
 Format C (streaming):
 
-    [[ L(F_1...F_m) ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
+    [[ <L F_1...F_m> ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
 
 Applications *SHOULD* prefer the known-length format for encoding
 `Record`s.
@@ -569,7 +562,7 @@ be tersely encoded as
 number 4 to the symbol `void`, making
 
     [[void]] = header(0,1,4) = [0x14]
-    [[void()]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
+    [[<void>]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
 
 or it may map symbol `person` to placeholder number 102, making
 
@@ -577,7 +570,7 @@ or it may map symbol `person` to placeholder number 102, making
 
 and so
 
-    [[person("Dr", "Elizabeth", "Blackwell")]]
+    [[<person "Dr" "Elizabeth" "Blackwell">]]
       = header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
       = [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
 
@@ -714,7 +707,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
 `[0x05] ++ [[v]]`.
 
-For example, the `Repr` corresponding to textual syntax `@a @b []`,
+For example, the `Repr` corresponding to textual syntax `@a@b[]`,
 i.e. an empty sequence annotated with two symbols, `a` and `b`, is
 
     [[ @a @b [] ]]
@@ -734,8 +727,8 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
 
 | Value | Encoded byte sequence |
 |---------------------------------------------------|-------------------------------------------------------------------------------------|
-| `capture(discard())` | 82 11 81 10 |
-| `observe(speak(discard(), capture(discard())))` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
+| `<capture <discard>>` | 82 11 81 10 |
+| `<observe <speak <discard> <capture <discard>>>>` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11 |
 | `[1 2 3 4]` (format B) | 94 31 32 33 34 |
 | `[1 2 3 4]` (format C) | 29 31 32 33 34 04 |
 | `[-2 -1 0 1]` | 94 3E 3F 30 31 |
@@ -754,7 +747,7 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to
 The next example uses a non-`Symbol` label for a record.[^extensibility2] The
 `Record`
 
-    [titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
+    <[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">
 
 encodes to
 
@@ -982,16 +975,16 @@ such media types following the general rules for ordering of
 
 | Value | Encoded hexadecimal byte sequence |
 |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `mime(application/octet-stream #"abcde")` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
-| `mime(text/plain #"ABC")` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
-| `mime(application/xml #"<xhtml/>")` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
-| `mime(text/csv #"123,234,345")` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
+| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
+| `<mime text/plain #"ABC">` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
+| `<mime application/xml #"<xhtml/>">` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
+| `<mime text/csv #"123,234,345">` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
 
 Applications making heavy use of `mime` records may choose to use a
 placeholder number for the symbol `mime` as well as the symbols for
 individual media types. For example, if placeholder number 1 were
 chosen for `mime`, and placeholder number 7 for `text/plain`, the
-second example above, `mime(text/plain #"ABC")`, would be encoded as
+second example above, `<mime text/plain #"ABC">`, would be encoded as
 `83 11 17 63 41 42 43`.
 
 ### Unicode normalization forms.
@@ -1023,9 +1016,9 @@ A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
 *n*-bit-wide signed and unsigned range restrictions, respectively.
 Records with these labels *MUST* have one field, a `SignedInteger`,
 which *MUST* fall within the appropriate range. That is, to be valid,
- - in `i8(`*x*`)`, -128 <= *x* <= 127.
- - in `u8(`*x*`)`, 0 <= *x* <= 255.
- - in `i16(`*x*`)`, -32768 <= *x* <= 32767.
+ - in `<i8 `*x*`>`, -128 <= *x* <= 127.
+ - in `<u8 `*x*`>`, 0 <= *x* <= 255.
+ - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
  - etc.
 
 ### Anonymous Tuples and Unit.
@@ -1033,15 +1026,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
 A `Tuple` is a `Record` with label `tuple` and zero or more fields,
 denoting an anonymous tuple of values.
 
-The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
+The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
 "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
 
 ### Null and Undefined.
 
 Tony Hoare's "[billion-dollar
 mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
-can be represented with the 0-ary `Record` `null()`. An "undefined"
-value can be represented as `undefined()`.
+can be represented with the 0-ary `Record` `<null>`. An "undefined"
+value can be represented as `<undefined>`.
 
 ### Dates and Times.
 
@@ -1429,14 +1422,6 @@ byte count acting as a kind of exponent underneath the sign bit.
  - Canonicalization and early-bailout-equivalence-checking are in
    tension with support for streaming values.
 
-Q. The postfix fields in the textual syntax come unannounced: "oh, and
-another thing, what you just read is a label, and here are some
-fields." This is a problem for interactive reading of textual syntax,
-because after a complete term, it needs to see the next character to
-tell whether it is an open-parenthesis or not! For this reason, I've
-disallowed whitespace between a label `Value` and the open-parenthesis
-of the fields. Is this reasonable??
-
 Q. To remain compatible with JSON, portions of the text syntax have to
 remain case-insensitive (`%i"..."`). However, non-JSON extensions do
 not. There's only one (?) at the moment, the `%i"f"` in `Float`;