Restrict usage of commas. Bump spec version to 0.992

2023-11-01 14:13:36 +01:00 · 2023-11-01 14:13:36 +01:00 · c89147dd6a
parent e0736e03c5
commit c89147dd6a
8 changed files with 82 additions and 87 deletions
--- a/.gitignore
+++ b/.gitignore
@ -1,4 +1,5 @@
 _site/
+cheatsheet.pdf
 preserves-expressions.pdf
 preserves-binary.pdf
 preserves-schema.pdf
--- a/3
+++ b/3
@ -5,7 +5,8 @@ PDFS=\
 	preserves-text.pdf \
 	preserves-binary.pdf \
 	preserves-schema.pdf \
-	preserves-expressions.pdf
+	preserves-expressions.pdf \
+	cheatsheet.pdf

 all: $(PDFS)

--- a/_config.yml
+++ b/_config.yml
@ -14,4 +14,4 @@ defaults:

 title: "Preserves"
 version_date: "October 2023"
-version: "0.991.0"
+version: "0.992.0"
--- a/_includes/cheatsheet-text-plaintext.md
+++ b/_includes/cheatsheet-text-plaintext.md
@ -3,14 +3,15 @@ Document      :=  Value ws
 Value         :=  ws (Record | Collection | Atom | Embedded | Annotated)
 Collection    :=  Sequence | Dictionary | Set
 Atom          :=  Boolean | ByteString | String | QuotedSymbol | Symbol | Number
-ws            :=  (space | tab | cr | lf | `,`)*
+ws            :=  (space | tab | cr | lf)*
+commas        :=  (ws `,`)* ws
 delimiter     :=  ws | `<` | `>` | `[` | `]` | `{` | `}`
-                     | `#` | `:` | `"` | `|` | `@` | `;`
+                     | `#` | `:` | `"` | `|` | `@` | `;` | `,`

 Record        :=  `<` Value+ ws `>`
-Sequence      :=  `[` Value* ws `]`
-Dictionary    :=  `{` (Value ws `:` Value)* ws `}`
-Set           :=  `#{` Value* ws `}`
+Sequence      :=  `[` (commas Value)* commas `]`
+Set           :=  `#{` (commas Value)* commas `}`
+Dictionary    :=  `{` (commas Value ws `:` Value)* commas `}`

 Boolean       :=  `#t` | `#f`
 ByteString    :=  `#"` binchar* `"`
--- a/_includes/cheatsheet-text.md
+++ b/_includes/cheatsheet-text.md
@ -3,14 +3,15 @@
 | *Value* | := | **ws** (*Record* &#124; *Collection* &#124; *Atom* &#124; *Embedded* &#124; *Annotated*) |
 | *Collection* | := | *Sequence* &#124; *Dictionary* &#124; *Set* |
 | *Atom* | := | *Boolean* &#124; *ByteString* &#124; *String* &#124; *QuotedSymbol* &#124; *Symbol* &#124; *Number* |
-| **ws** | := | (**space** &#124; **tab** &#124; **cr** &#124; **lf** &#124;`,`)<sup>⋆</sup> |
-| **delimiter** | := | **ws** &#124; `<` &#124; `>` &#124; `[` &#124; `]` &#124; `{` &#124; `}` &#124; `#` &#124; `:` &#124; `"` &#124; `|` &#124; `@` &#124; `;` |
+| **ws** | := | (**space** &#124; **tab** &#124; **cr** &#124; **lf**)<sup>⋆</sup> |
+| **commas** | := | (**ws** `,`)<sup>⋆</sup> **ws** |
+| **delimiter** | := | **ws** &#124; `<` &#124; `>` &#124; `[` &#124; `]` &#124; `{` &#124; `}` &#124; `#` &#124; `:` &#124; `"` &#124; `|` &#124; `@` &#124; `;` &#124; `,` |

 {:.postcard-grammar.textsyntax}
 | *Record* | := | `<`*Value*<sup>+</sup> **ws**`>` |
-| *Sequence* | := | `[`*Value*<sup>⋆</sup> **ws**`]` |
-| *Dictionary* | := | `{` (*Value* **ws**`:`*Value*)<sup>⋆</sup> **ws**`}` |
-| *Set* | := | `#{`*Value*<sup>⋆</sup> **ws**`}` |
+| *Sequence* | := | `[`(**commas** *Value*)<sup>⋆</sup> **commas**`]` |
+| *Set* | := | `#{`(**commas** *Value*)<sup>⋆</sup> **commas**`}` |
+| *Dictionary* | := | `{` (**commas** *Value* **ws**`:`*Value*)<sup>⋆</sup> **commas**`}` |

 {:.postcard-grammar.textsyntax}
 | *Boolean* | := | `#t`&#124;`#f` |
--- a/implementations/rust/preserves/doc/cheatsheet-text-plaintext.md
+++ b/implementations/rust/preserves/doc/cheatsheet-text-plaintext.md
@ -3,14 +3,15 @@ Document      :=  Value ws
 Value         :=  ws (Record | Collection | Atom | Embedded | Annotated)
 Collection    :=  Sequence | Dictionary | Set
 Atom          :=  Boolean | ByteString | String | QuotedSymbol | Symbol | Number
-ws            :=  (space | tab | cr | lf | `,`)*
+ws            :=  (space | tab | cr | lf)*
+commas        :=  (ws `,`)* ws
 delimiter     :=  ws | `<` | `>` | `[` | `]` | `{` | `}`
-                     | `#` | `:` | `"` | `|` | `@` | `;`
+                     | `#` | `:` | `"` | `|` | `@` | `;` | `,`

 Record        :=  `<` Value+ ws `>`
-Sequence      :=  `[` Value* ws `]`
-Dictionary    :=  `{` (Value ws `:` Value)* ws `}`
-Set           :=  `#{` Value* ws `}`
+Sequence      :=  `[` (commas Value)* commas `]`
+Set           :=  `#{` (commas Value)* commas `}`
+Dictionary    :=  `{` (commas Value ws `:` Value)* commas `}`

 Boolean       :=  `#t` | `#f`
 ByteString    :=  `#"` binchar* `"`
--- a/preserves-expressions.md
+++ b/preserves-expressions.md
@ -25,16 +25,11 @@ which (ab)use Preserves text syntax as a kind of programming notation.
 The P-expression grammar includes by reference the definition of `Atom` from the
 [text syntax][], as well as the definitions that `Atom` depends on.

-P-expressions take their own approach to inter-token whitespace,
-however.
-
 <a id="whitespace">
-**Whitespace.** Whitespace `sp` is defined as any number of spaces,
-tabs, carriage returns, or line feeds. Commas are *not* considered
-whitespace in P-expressions, and so class `sp` is different to class
-`ws` from the text syntax.
+**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
+any number of spaces, tabs, carriage returns, or line feeds.

-                sp = *(%x20 / %x09 / CR / LF)
+                ws = *(%x20 / %x09 / CR / LF)

 No changes to [the Preserves semantic model](preserves.html) are made.
 Every Preserves text-syntax term can be parsed as a valid P-expression,
@ -47,20 +42,22 @@ below](#reading-preserves)).
 Standalone documents containing P-expressions are sequences of
 individual `Expr`s, followed by trailing whitespace.

-          Document = *Expr sp
+          Document = *Expr ws

 A single P-expression `Expr` can be an `Atom` from the [text syntax][],
 a compound expression, special punctuation, an `Embedded` expression, or
-an `Annotated` expression.
+an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
+except special punctuation.

-              Expr = sp (Atom | Compound | Punct | Embedded | Annotated)
+              Expr = ws (SimpleExpr | Punct)
+        SimpleExpr = Atom | Compound | Embedded | Annotated

 Embedded and annotated values are as in the text syntax, differing only
-in that uses of `Value` are replaced with `Expr`.
+in that uses of `Value` are replaced with `SimpleExpr`.

-           Embedded = "#!" Expr
-          Annotated = Annotation Expr
-         Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
+           Embedded = "#!" SimpleExpr
+          Annotated = Annotation SimpleExpr
+         Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)

 P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.

@ -70,11 +67,11 @@ Compound expressions are sequences of `Expr`s with optional trailing
 `Annotation`s, surrounded by various kinds of parentheses.

          Compound = Sequence / Record / Block / Group / Set
-          Sequence =  "[" *Expr Trailer sp "]"
-            Record =  "<" *Expr Trailer sp ">"
-             Block =  "{" *Expr Trailer sp "}"
-             Group =  "(" *Expr Trailer sp ")"
-               Set = "#{" *Expr Trailer sp "}"
+          Sequence =  "[" *Expr Trailer ws "]"
+            Record =  "<" *Expr Trailer ws ">"
+             Block =  "{" *Expr Trailer ws "}"
+             Group =  "(" *Expr Trailer ws ")"
+               Set = "#{" *Expr Trailer ws "}"

 In an `Annotated` P-expression, annotations and comments attach to the
 term following them, just as in the ordinary text syntax. However, it is
@ -117,9 +114,8 @@ sequences of Preserves values.
 The [previous section](#encoding-pexprs) discussed ways of representing
 P-expressions using Preserves. Here, we discuss *interpreting*
 P-expressions *as* Preserves so that (1) a Preserves datum (2) written
-using Preserves text syntax[^careful-use-of-commas] and then (3) read as
-a P-expression can be (4) interpreted from that P-expression to yield
-the original datum.
+using Preserves text syntax and then (3) read as a P-expression can be
+(4) interpreted from that P-expression to yield the original datum.

 1. Every `(`..`)` or `;` that appears is an error.
 2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
@ -127,17 +123,13 @@ the original datum.
 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
 5. Every `Record` with no values in it is an error.
 6. Every `Block` must contain zero or more repeating triplets of
-    `Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
-    error. Each `Block` following the pattern is translated to a
-    `Dictionary` containing a key/value pair for each triplet. Any
-    `Block` with duplicate keys (under interpretation) is an error.
- 7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
-
-[^careful-use-of-commas]: Every Preserves datum can be read via a
-    P-expression reader and then interpreted successfully as Preserves
-    *if commas are omitted entirely in the text*. If commas are present,
-    however, they must not appear in certain positions, namely: either
-    before or after *p* in `@`*p* *q*; or before *p* in `#!`*p*.
+    `SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
+    pattern is an error. Each `Block` following the pattern is
+    translated to a `Dictionary` containing a key/value pair for each
+    triplet. Any `Block` with duplicate keys (under interpretation) is
+    an error.
+ 7. Every `Set` containing any duplicate expressions (under
+    interpretation) is an error.

 [^discard-trailers-instead-of-error]: **Implementation note.** When
    implementing parsing of P-expressions into Preserves, consider
@ -283,35 +275,31 @@ P-expression `Expr`s.

 ## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves

-The partial function **uncomma**(*p*) removes all occurrences of `,`
-from a P-expression *p*.
+The function **uncomma**(*p*) removes all occurrences of `,` from a
+P-expression *p* ∈ `Expr` − {`,`}.

 {:.pseudocode.equations}
-| **uncomma** : **Expr**      | ⇀ | **Expr**                             |                                       |
-| **uncomma**(`[`*p* ...`]`)  | = | `[`**uncomma**(*p*) ...`]`           | omitting any *p* = `,`                |
-| **uncomma**(`<`*p* ...`>`)  | = | `<`**uncomma**(*p*) ...`>`           | omitting any *p* = `,`                |
-| **uncomma**(`{`*p* ...`}`)  | = | `{`**uncomma**(*p*) ...`}`           | omitting any *p* = `,`                |
-| **uncomma**(`(`*p* ...`)`)  | = | `(`**uncomma**(*p*) ...`)`           | omitting any *p* = `,`                |
-| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}`          | omitting any *p* = `,`                |
-| **uncomma**(`#!`*p*)        | = | `#!`**uncomma**(*p*) ...`}`          |                                       |
-| **uncomma**(`@`*p* *q*)     | = | `@`**uncomma**(*p*) **uncomma**(*q*) |                                       |
-| **uncomma**(*p*)            | = | *p*                                  | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
-
-{:.pseudocode.equations}
-| **uncomma** : **Document** | ⇀ | **Document**         |
-| **uncomma**(*p* ...)       | = | **uncomma**(*p*) ... |
+| **uncomma** : **Expr** − {`,`} | ⟶ | **Expr**                             |                                       |
+| **uncomma**(`[`*p* ...`]`)     | = | `[`**uncomma**(*p*) ...`]`           | omitting any *p* = `,`                |
+| **uncomma**(`<`*p* ...`>`)     | = | `<`**uncomma**(*p*) ...`>`           | omitting any *p* = `,`                |
+| **uncomma**(`{`*p* ...`}`)     | = | `{`**uncomma**(*p*) ...`}`           | omitting any *p* = `,`                |
+| **uncomma**(`(`*p* ...`)`)     | = | `(`**uncomma**(*p*) ...`)`           | omitting any *p* = `,`                |
+| **uncomma**(`#{`*p* ...`}`)    | = | `#{`**uncomma**(*p*) ...`}`          | omitting any *p* = `,`                |
+| **uncomma**(`#!`*p*)           | = | `#!`**uncomma**(*p*)                 |                                       |
+| **uncomma**(`@`*p* *q*)        | = | `@`**uncomma**(*p*) **uncomma**(*q*) |                                       |
+| **uncomma**(*p*)               | = | *p*                                  | if *p* ∈ **Atom** ∪ **Punct** − {`,`} |

 We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
-P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`.
+P-expression *p* ∈ `Expr` − {`,`} to a corresponding Preserves `Value`.

 {:.pseudocode.equations}
-| ⌞·⌟ : **Expr**        | ⇀ | **Value**               |                               |
-| ⌞`[`*p* ...`]`⌟       | = | `[`⌞*p*⌟ ...`]`         |                               |
-| ⌞`<`ℓ *p* ...`>`⌟     | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>`     |                               |
-| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
-| ⌞`#{`*p* ...`}`⌟      | = | `#{`⌞*p*⌟ ...`}`        | if all ⌞*p*⌟ ... are distinct |
-| ⌞`#!`*p*⌟             | = | `#!`⌞*p*⌟               |                               |
-| ⌞`@`*p* *q*⌟          | = | `@`⌞*p*⌟ ⌞*q*⌟          |                               |
-| ⌞*p*⌟                 | = | *p*                     | when *p* ∈ **Atom**           |
+| ⌞·⌟ : **Expr** − {`,`} | ⇀ | **Value**               |                               |
+| ⌞`[`*p* ...`]`⌟        | = | `[`⌞*p*⌟ ...`]`         |                               |
+| ⌞`<`ℓ *p* ...`>`⌟      | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>`     |                               |
+| ⌞`{`*k*`:`*v* ...`}`⌟  | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
+| ⌞`#{`*p* ...`}`⌟       | = | `#{`⌞*p*⌟ ...`}`        | if all ⌞*p*⌟ ... are distinct |
+| ⌞`#!`*p*⌟              | = | `#!`⌞*p*⌟               |                               |
+| ⌞`@`*p* *q*⌟           | = | `@`⌞*p*⌟ ⌞*q*⌟          |                               |
+| ⌞*p*⌟                  | = | *p*                     | when *p* ∈ **Atom**           |

 ## Notes
--- a/preserves-text.md
+++ b/preserves-text.md
@ -28,10 +28,15 @@ a grammar for recognising sequences of Unicode scalar values.
 UTF-8 where possible.

 <a id="whitespace"></a>
-**Whitespace.** Whitespace is defined as any number of spaces, tabs,
-carriage returns, line feeds, or commas.
+**Whitespace.** Whitespace `ws` is defined as any number of spaces, tabs,
+carriage returns, or line feeds.

-                ws = *(%x20 / %x09 / CR / LF / ",")
+                ws = *(%x20 / %x09 / CR / LF)
+
+<a id="commas"></a>
+**Commas.** In some positions inside compound terms, commas are permitted and ignored.
+
+            commas = *(ws ",") ws

 <a id="delimiters"></a>
 **Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
@ -39,7 +44,7 @@ followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]

         delimiter = ws
                   / "<" / ">" / "[" / "]" / "{" / "}"
-                   / "#" / ":" / DQUOTE / "|" / "@" / ";"
+                   / "#" / ":" / DQUOTE / "|" / "@" / ";" / ","

 [^delimiters-lookahead]: The addition of this constraint means that
    implementations must now use some kind of lookahead to make sure a
@ -73,9 +78,9 @@ printing sets and dictionaries, implementations *SHOULD* order elements
 resp. keys with respect to the [total order over
 `Value`s](preserves.html#total-order).[^rationale-print-ordering]

-          Sequence =  "["  *Value               ws "]"
-               Set = "#{"  *Value               ws "}"
-        Dictionary =  "{" *(Value ws ":" Value) ws "}"
+          Sequence =  "[" *(commas Value)              commas "]"
+               Set = "#{" *(commas Value)              commas "}"
+        Dictionary =  "{" *(commas Value ws ":" Value) commas "}"

  [^printing-collections]: **Implementation note.** When implementing
    printing of `Value`s using the textual syntax, consider supporting
@ -147,8 +152,8 @@ following the usual rules for double quote and backslash.
                   / %s"\x" 2HEXDIG
                   / "\" DQUOTE

-The second is a sequence of pairs of hexadecimal digits interleaved
-with whitespace and surrounded by `#x"` and `"`.
+The second is pairs of hexadecimal digits interleaved with whitespace
+and surrounded by `#x"` and `"`.

       ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE

@ -320,9 +325,6 @@ give an illusion of progress.

 ## Acknowledgements

-The treatment of commas as whitespace in the text syntax is inspired
-by the same feature of [EDN](https://github.com/edn-format/edn).
-
 The text syntax for `Boolean`s, `Symbol`s, and `ByteString`s is
 directly inspired by [Racket](https://racket-lang.org/)'s lexical
 syntax.