Angle bracket S-exprs for Records!

2019-08-11 23:54:57 +01:00 · 0f5f0630d2
parent 74f9093c5e
commit 0f5f0630d2
1 changed file with 24 additions and 39 deletions
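
The diff below changes only the textual notation for records, from `L(F_1...F_m)` to `<L F_1...F_m>`; the byte sequences in the example tables are unchanged. As a rough check of that, here is a minimal Python sketch of Format B (known length) for records whose label and fields are symbols, byte strings, or nested records. The names `Symbol`, `Record`, `varint`, and `encode` are hypothetical helpers, not part of any Preserves implementation. The lead-byte layout and the minor type numbers for symbols (3) and byte strings (2) are inferred from the example bytes in the diff, and the multi-byte `varint` form for lengths of 128 or more is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple = ()


def varint(n):
    # Assumption: base-128, least significant group first, high bit set on
    # every byte except the last. The diff only shows single-byte cases.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def header(major, minor, arg):
    # Inferred layout: 2-bit major, 2-bit minor, 4-bit argument; an argument
    # of 15 or more is written as 15 followed by a varint.
    lead = (major << 6) | (minor << 4)
    if arg < 15:
        return bytes([lead | arg])
    return bytes([lead | 15]) + varint(arg)


def encode(value, placeholders=None):
    placeholders = placeholders or {}
    if isinstance(value, Record):
        # m fields plus the label gives m+1 encoded values.
        parts = (value.label, *value.fields)
        body = b"".join(encode(v, placeholders) for v in parts)
        return header(2, 0, len(value.fields) + 1) + body
    if isinstance(value, Symbol):
        if value in placeholders:
            return header(0, 1, placeholders[value])
        raw = value.name.encode("utf-8")
        return header(1, 3, len(raw)) + raw
    if isinstance(value, bytes):
        return header(1, 2, len(value)) + value
    raise TypeError(f"unsupported value in this sketch: {value!r}")


# <mime text/plain #"ABC">, first without and then with placeholders.
mime, text_plain = Symbol("mime"), Symbol("text/plain")
rec = Record(mime, (text_plain, b"ABC"))
assert encode(rec) == bytes.fromhex(
    "83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43")
assert encode(rec, {mime: 1, text_plain: 7}) == bytes.fromhex("83 11 17 63 41 42 43")
```

Both assertions reproduce rows from the `mime` examples in the diff, which suggests the sketch agrees with the spec at least for these small cases; integers, strings, sequences, and Format C are deliberately left out.
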
--- a/preserves.md
+++ b/preserves.md
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language

 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-June 2019. Version 0.0.5.
+August 2019. Version 0.0.6.

  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
  [spki]: http://world.std.com/~cme/html/spki.html
@@ -212,11 +212,10 @@ Any `Value` may be preceded by whitespace.
              Atom = Boolean / Float / Double / SignedInteger /
                     String / ByteString / Symbol

-Each `Record` is its label-`Value` followed by a parenthesised
-grouping of its field-`Value`s. Whitespace is not permitted between
-the label and the open-parenthesis.
+Each `Record` is an angle-bracket enclosed grouping of its
+label-`Value` followed by its field-`Value`s.

-            Record = Value "(" *Value ws ")"
+            Record = "<" Value *Value ws ">"

 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
@@ -236,12 +235,6 @@ or more values enclosed by the tokens `#set{` and
    commas separating, and commas terminating elements or key/value
    pairs within a collection.

-The special cases of records with a single field, which is in turn a
-sequence or dictionary, may be written omitting the parentheses.
-
-           Record =/ Value Sequence
-           Record =/ Value Dictionary
-
 `Boolean`s are the simple literal strings `#true` and `#false`.

           Boolean = %s"#true" / %s"#false"
@@ -356,7 +349,7 @@ double quote mark.
          symstart = ALPHA / sympunct / symunicode
           symcont = ALPHA / sympunct / symunicode / DIGIT / "-"
          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
-                     "?" / "_" / "=" / "+" / "<" / ">" / "/" / "."
+                     "?" / "_" / "=" / "+" / "/" / "."
           symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
        symunicode = <any code point greater than 127 whose Unicode
                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
@@ -541,14 +534,14 @@ zero-length chunks.

 Format B (known length):

-    [[ L(F_1...F_m) ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]
+    [[ <L F_1...F_m> ]] = header(2,0,m+1) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]]

 For `m` fields, `m+1` is supplied to `header`, to account for the
 encoding of the record label.

 Format C (streaming):

-    [[ L(F_1...F_m) ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()
+    [[ <L F_1...F_m> ]] = open(2,0) ++ [[L]] ++ [[F_1]] ++...++ [[F_m]] ++ close()

 Applications *SHOULD* prefer the known-length format for encoding
 `Record`s.
@@ -569,7 +562,7 @@ be tersely encoded as
 number 4 to the symbol `void`, making

    [[void]] = header(0,1,4) = [0x14]
-    [[void()]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]
+    [[<void>]] = header(2,0,1) ++ [[void]] = [0x81, 0x14]

 or it may map symbol `person` to placeholder number 102, making

@@ -577,7 +570,7 @@ or it may map symbol `person` to placeholder number 102, making

 and so

-    [[person("Dr", "Elizabeth", "Blackwell")]]
+    [[<person "Dr" "Elizabeth" "Blackwell">]]
      = header(2,0,4) ++ [[person]] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]
      =          [0x84, 0x1F, 0x66] ++ [["Dr"]] ++ [["Elizabeth"]] ++ [["Blackwell"]]

@@ -714,7 +707,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 To annotate a `Repr` `r` with some `Value` `v`, prepend `r` with
 `[0x05] ++ [[v]]`.

-For example, the `Repr` corresponding to textual syntax `@a @b []`,
+For example, the `Repr` corresponding to textual syntax `@a@b[]`,
 i.e. an empty sequence annotated with two symbols, `a` and `b`, is

    [[ @a @b [] ]]
@@ -734,8 +727,8 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to

 | Value                                             | Encoded byte sequence                                                               |
 |---------------------------------------------------|-------------------------------------------------------------------------------------|
-| `capture(discard())`                              | 82 11 81 10                                                                         |
-| `observe(speak(discard(), capture(discard())))`   | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11                                   |
+| `<capture <discard>>`                             | 82 11 81 10                                                                         |
+| `<observe <speak <discard> <capture <discard>>>>` | 82 12 83 75 's' 'p' 'e' 'a' 'k' 81 10 82 11 81 11                                   |
 | `[1 2 3 4]` (format B)                            | 94 31 32 33 34                                                                      |
 | `[1 2 3 4]` (format C)                            | 29 31 32 33 34 04                                                                   |
 | `[-2 -1 0 1]`                                     | 94 3E 3F 30 31                                                                      |
@@ -754,7 +747,7 @@ placeholder number 0 to symbol `discard`, 1 to `capture`, and 2 to

 The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`

-    [titled person 2 thing 1](101, "Blackwell", date(1821 2 3), "Dr")
+    <[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">

 encodes to

@@ -982,16 +975,16 @@ such media types following the general rules for ordering of

 | Value                                      | Encoded hexadecimal byte sequence                                                                                 |
 |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `mime(application/octet-stream #"abcde")`  | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
-| `mime(text/plain #"ABC")`                  | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
-| `mime(application/xml #"<xhtml/>")`        | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
-| `mime(text/csv #"123,234,345")`            | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |
+| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
+| `<mime text/plain #"ABC">`                 | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
+| `<mime application/xml #"<xhtml/>">`       | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
+| `<mime text/csv #"123,234,345">`           | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |

 Applications making heavy use of `mime` records may choose to use a
 placeholder number for the symbol `mime` as well as the symbols for
 individual media types. For example, if placeholder number 1 were
 chosen for `mime`, and placeholder number 7 for `text/plain`, the
-second example above, `mime(text/plain #"ABC")`, would be encoded as
+second example above, `<mime text/plain #"ABC">`, would be encoded as
 `83 11 17 63 41 42 43`.

 ### Unicode normalization forms.
@@ -1023,9 +1016,9 @@ A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
 *n*-bit-wide signed and unsigned range restrictions, respectively.
 Records with these labels *MUST* have one field, a `SignedInteger`,
 which *MUST* fall within the appropriate range. That is, to be valid,
-- in `i8(`*x*`)`, -128 <= *x* <= 127.
-- in `u8(`*x*`)`, 0 <= *x* <= 255.
-- in `i16(`*x*`)`, -32768 <= *x* <= 32767.
+- in `<i8 `*x*`>`, -128 <= *x* <= 127.
+- in `<u8 `*x*`>`, 0 <= *x* <= 255.
+- in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
 - etc.

 ### Anonymous Tuples and Unit.
@@ -1033,15 +1026,15 @@ which *MUST* fall within the appropriate range. That is, to be valid,
 A `Tuple` is a `Record` with label `tuple` and zero or more fields,
 denoting an anonymous tuple of values.

-The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
+The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
 "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).

 ### Null and Undefined.

 Tony Hoare's
 "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
-can be represented with the 0-ary `Record` `null()`. An "undefined"
-value can be represented as `undefined()`.
+can be represented with the 0-ary `Record` `<null>`. An "undefined"
+value can be represented as `<undefined>`.

 ### Dates and Times.

@@ -1429,14 +1422,6 @@ byte count acting as a kind of exponent underneath the sign bit.
 - Canonicalization and early-bailout-equivalence-checking are in
   tension with support for streaming values.

-Q. The postfix fields in the textual syntax come unannounced: "oh, and
-another thing, what you just read is a label, and here are some
-fields." This is a problem for interactive reading of textual syntax,
-because after a complete term, it needs to see the next character to
-tell whether it is an open-parenthesis or not! For this reason, I've
-disallowed whitespace between a label `Value` and the open-parenthesis
-of the fields. Is this reasonable??
-
 Q. To remain compatible with JSON, portions of the text syntax have to
 remain case-insensitive (`%i"..."`). However, non-JSON extensions do
 not. There's only one (?) at the moment, the `%i"f"` in `Float`;