This is way better

2023-11-01 13:06:15 +01:00 · 2023-11-01 13:06:15 +01:00 · d7b983e140
parent 1a0772d39f
commit d7b983e140
2 changed files with 145 additions and 108 deletions
--- a/_includes/cheatsheet-pexprs-plaintext.md
+++ b/_includes/cheatsheet-pexprs-plaintext.md
@ -0,0 +1,22 @@
+The definition of `Atom` is as given in the Preserves text syntax.
+
+```text
+Document      :=  Expr* sp
+Expr          :=  sp (Atom | Compound | Punct | Embedded | Annotated)
+Compound      :=  Sequence | Record | Block | Group | Set
+Punct         :=  `,` | `;` | `:`+
+
+sp            :=  (space | tab | cr | lf)*
+
+Sequence      :=   `[` Expr* Trailer sp `]`
+Record        :=   `<` Expr* Trailer sp `>`
+Block         :=   `{` Expr* Trailer sp `}`
+Group         :=   `(` Expr* Trailer sp `)`
+Set           :=  `#{` Expr* Trailer sp `}`
+
+Trailer       :=  Annotation*
+
+Embedded      :=  `#!` Expr
+Annotated     :=  Annotation Expr
+Annotation    :=  `@` Expr | `#` ((space | tab) linecomment)? (cr | lf)
+```
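
The cheatsheet grammar above can be exercised with a small recursive-descent reader. The following is a minimal sketch and not part of the commit. It assumes a simplified stand-in for `Atom` (any run of non-delimiter characters, kept as a Python string) and omits `Embedded`, `Annotated`, and `Trailer`. The names `skip_sp`, `read_expr`, and `read_document`, and the tuple representation, are hypothetical.

```python
import re

# Opening bracket -> (kind of Compound, matching closing bracket).
OPEN = {"[": ("seq", "]"), "<": ("rec", ">"), "{": ("block", "}"),
        "(": ("group", ")"), "#{": ("set", "}")}

# ASSUMPTION: simplified stand-in for Atom. The real definition comes from
# the Preserves text syntax and also covers strings, byte strings, etc.
ATOM = re.compile(r'[^\s,;:\[\]<>{}()#@"|]+')


def skip_sp(text, i):
    """Skip `sp`: spaces, tabs, CR, LF. Commas are not whitespace here."""
    while i < len(text) and text[i] in " \t\r\n":
        i += 1
    return i


def read_expr(text, i):
    """Read one Expr starting at offset i; return (expr, next_offset)."""
    i = skip_sp(text, i)
    opener = "#{" if text.startswith("#{", i) else text[i:i + 1]
    if opener in OPEN:
        kind, closer = OPEN[opener]
        i += len(opener)
        items = []
        while True:
            i = skip_sp(text, i)
            if i >= len(text):
                raise SyntaxError(f"unterminated {kind}")
            if text[i] == closer:
                return (kind, items), i + 1
            item, i = read_expr(text, i)
            items.append(item)
    if text[i:i + 1] in (",", ";"):
        return ("punct", text[i]), i + 1
    if text[i:i + 1] == ":":
        j = i
        while j < len(text) and text[j] == ":":
            j += 1  # Punct includes runs of one or more colons
        return ("punct", text[i:j]), j
    m = ATOM.match(text, i)
    if not m:
        raise SyntaxError(f"unexpected input at offset {i}")
    return m.group(), m.end()


def read_document(text):
    """Document := Expr* sp"""
    exprs = []
    i = skip_sp(text, 0)
    while i < len(text):
        expr, i = read_expr(text, i)
        exprs.append(expr)
        i = skip_sp(text, i)
    return exprs
```

For instance, `read_document("f(x, y); g::h")` yields the atom `f`, a group holding `x`, a comma, and `y`, then a semicolon, `g`, a double colon, and `h`. The comma survives as punctuation because it is not part of `sp`.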
--- a/preserves-expressions.md
+++ b/preserves-expressions.md
@ -3,128 +3,105 @@ title: "P-expressions"
 ---

 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-October 2023. Version 0.2.0.
+October 2023. Version 0.3.0.
+
+[text syntax]: preserves-text.html

 This document defines a grammar called *Preserves Expressions*
 (*P-expressions*, *pexprs*) that includes [ordinary Preserves text
-syntax](preserves-text.html) but offers extensions sufficient to support
-a Lisp- or Haskell-like programming notation.
+syntax][text syntax] but offers extensions sufficient to support a Lisp-
+or Haskell-like programming notation.

-**Motivation.** The [text syntax](preserves-text.html) for Preserves
-works well for writing `Value`s, i.e. data. However, in some contexts,
-Preserves applications need a broader grammar that allows interleaving
-of *expressions* with data. Two examples are the [Preserves Schema
+**Motivation.** The [text syntax][] for Preserves works well for writing
+`Value`s, i.e. data. However, in some contexts, Preserves applications
+need a broader grammar that allows interleaving of *expressions* with
+data. Two examples are the [Preserves Schema
 language](preserves-schema.html) and the [Synit configuration scripting
 language](https://synit.org/book/operation/scripting.html), both of
 which (ab)use Preserves text syntax as a kind of programming notation.

 ## Preliminaries

-The P-expression grammar takes the text syntax grammar as its base and
-modifies it.
+The P-expression grammar includes by reference the definition of `Atom` from the
+[text syntax][], as well as the definitions that `Atom` depends on.
+
+P-expressions take their own approach to inter-token whitespace,
+however.

 <a id="whitespace">
-**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
-carriage returns, or line feeds. Commas are *not* considered whitespace
-in P-expressions.
+**Whitespace.** Whitespace `sp` is defined as any number of spaces,
+tabs, carriage returns, or line feeds. Commas are *not* considered
+whitespace in P-expressions, and so class `sp` is different to class
+`ws` from the text syntax.

-                ws = *(%x20 / %x09 / CR / LF)
+                sp = *(%x20 / %x09 / CR / LF)

-<a id="delimiters"></a>
-**Delimiters.** Because commas are no longer included in class `ws`,
-class `delimiter` is widened to include them explicitly.
-
-         delimiter = ws / ","
-                   / "<" / ">" / "[" / "]" / "{" / "}"
-                   / "#" / ":" / DQUOTE / "|" / "@" / ";"
+No changes to [the Preserves semantic model](preserves.html) are made.
+Every Preserves text-syntax term can be parsed as a valid P-expression,
+but in general P-expressions must be rewritten or otherwise interpreted
+before a meaningful Preserves value can be arrived at ([see
+below](#reading-preserves)).

 ## Grammar

-P-expressions add comma, semicolon, and sequences of one or more colons
-to the syntax class `Value`.
+Standalone documents containing P-expressions are sequences of
+individual `Expr`s, followed by trailing whitespace.

-            Value =/ Comma / Semicolon / Colons
-             Comma = ","
-         Semicolon = ";"
-            Colons = 1*":"
+          Document = *Expr sp

-Now that colon is in `Value`, the syntax for `Dictionary` is replaced
-with `Block` everywhere it is mentioned.
+A single P-expression `Expr` can be an `Atom` from the [text syntax][],
+a compound expression, special punctuation, an `Embedded` expression, or
+an `Annotated` expression.

-             Block = "{" *Value ws "}"
+              Expr = sp (Atom | Compound | Punct | Embedded | Annotated)

-Syntax for `Record` is loosened to allow empty angle brackets.
+Embedded and annotated values are as in the text syntax, differing only
+in that uses of `Value` are replaced with `Expr`.

-            Record = "<" *Value ws ">"
+           Embedded = "#!" Expr
+          Annotated = Annotation Expr
+         Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)

-New syntax for explicit uninterpreted grouping of sequences of values is
-introduced, and added to class `Value`.
+P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.

-            Value =/ ws Group
-             Group = "(" *Value ws ")"
+             Punct = "," / ";" / 1*":"

-Finally, class `Document` is replaced in order to allow standalone
-documents to directly comprise a sequence of multiple values.
+Compound expressions are sequences of `Expr`s with optional trailing
+`Annotation`s, surrounded by various kinds of parentheses.

-          Document = *Value ws
+          Compound = Sequence / Record / Block / Group / Set
+          Sequence =  "[" *Expr Trailer sp "]"
+            Record =  "<" *Expr Trailer sp ">"
+             Block =  "{" *Expr Trailer sp "}"
+             Group =  "(" *Expr Trailer sp ")"
+               Set = "#{" *Expr Trailer sp "}"

-No changes to [the Preserves semantic model](preserves.html) are made.
-Every Preserves text-syntax term is a valid P-expression, but in general
-P-expressions must be rewritten or otherwise interpreted before a
-meaningful Preserves value can be arrived at ([see
-below](#reading-preserves)).
+In an `Annotated` P-expression, annotations and comments attach to the
+term following them, just as in the ordinary text syntax. However, it is
+common in programming notations to allow comments at the end of a file
+or other sequential construct. The ordinary text syntax forbids comments
+in these positions, but P-expressions allow them.

-## <a id="annotations"></a>Annotations and Comments
-
-Annotations and comments attach to the term following them, just as in
-the ordinary text syntax. However, it is common in programming notations
-to allow comments at the end of a file or other sequential construct:
-
-    {
-        key: value
-        # example of a comment at the end of a dictionary
-    }
-    # example of a comment at the end of the input file
-
-While the ordinary text syntax forbids comments in these positions,
-P-expressions allow them:
-
-         Document =/ *Value Trailer ws
-           Record =/  "<" *Value Trailer ws ">"
-         Sequence =/  "[" *Value Trailer ws "]"
-              Set =/ "#{" *Value Trailer ws "}"
-            Block =/  "{" *Value Trailer ws "}"
-            Group =/  "(" *Value Trailer ws ")"
-
-           Trailer = 1*Annotation
+           Trailer = *Annotation

 ## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves

 We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.

 {:.pseudocode.equations}
-| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |
-
-Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`,
-and `Record`, P-expressions are encoded directly as Preserves data.
-
-{:.pseudocode.equations}
-| ⌜`[`*p* ...`]`⌝  | = | `[`⌜*p*⌝ ...`]`             |
-| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}`            |
-| ⌜`#!`*p*⌝        | = | `#!`⌜*p*⌝                   |
-| ⌜`@`*p* *q*⌝     | = | `@`⌜*p*⌝ ⌜*q*⌝              |
-| ⌜*p*⌝            | = | *p* when *p* ∈ **Atom** |
-
-Everything else is encoded as Preserves records.
-
-{:.pseudocode.equations}
-| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
-| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
-| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
-| ⌜`,`⌝           | = | `<s |,|>`         |
-| ⌜`;`⌝           | = | `<s |;|>`         |
-| ⌜`:` ...⌝       | = | `<s |:` ...`|>`   |
-| ⌜*t*⌝           | = | ⌜*a*⌝ ... `<a>`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
+| ⌜·⌝ : **Expr**   | ⟶ | **Value** |
+| ⌜`[`*p* ...`]`⌝  | = | `[`⌜*p*⌝ ...`]`         |
+| ⌜`<`*p* ...`>`⌝  | = | `<r` ⌜*p*⌝ ...`>`       |
+| ⌜`{`*p* ...`}`⌝  | = | `<b` ⌜*p*⌝ ...`>`       |
+| ⌜`(`*p* ...`)`⌝  | = | `<g` ⌜*p*⌝ ...`>`       |
+| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>`       |
+| ⌜`#!`*p*⌝        | = | `#!`⌜*p*⌝               |
+| ⌜`@`*p* *q*⌝     | = | `@`⌜*p*⌝ ⌜*q*⌝          |
+| ⌜*p*⌝            | = | *p* | when *p* ∈ **Atom** |
+| ⌜`,`⌝            | = | `<p |,|>`               |
+| ⌜`;`⌝            | = | `<p |;|>`               |
+| ⌜`:` ...⌝        | = | `<p |:` ...`|>`         |
+| ⌜*t*⌝            | = | ⌜*a*⌝ ... `<a>` | where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |

 The record `<a>` acts as an anchor for the annotations in a `Trailer`.
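
The encoding equations can be sketched as a short function that emits Preserves text. This is an illustration under assumptions rather than the author's implementation: compounds are hypothetical `(kind, items)` tuples, punctuation is `("punct", text)`, atoms are strings already written in Preserves text syntax, and annotations, embedded values, and trailers are left out.

```python
# Record labels used by the encoding for each non-sequence compound.
LABELS = {"rec": "r", "block": "b", "group": "g", "set": "s"}


def encode(p):
    """Return the Preserves text encoding of P-expression p."""
    if isinstance(p, tuple):
        kind, body = p
        if kind == "punct":
            # `,` `;` and runs of `:` become <p |...|> records
            return f"<p |{body}|>"
        parts = [encode(q) for q in body]
        if kind == "seq":
            # Sequences are the only compound encoded directly
            return "[" + " ".join(parts) + "]"
        # Records, blocks, groups, and sets become labelled records
        return "<" + " ".join([LABELS[kind]] + parts) + ">"
    return p  # ASSUMPTION: atoms are already Preserves text


example = ("set", [("group", ["read"]), ("group", ["read"]), ("group", ["read"])])
print(encode(example))
```

The printed result matches the `#{(read) (read) (read)}` example given in the examples section of the document.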

@ -145,18 +122,19 @@ using Preserves text syntax and then (3) read as a P-expression can be

 A reader for P-expressions can be adapted to yield a reader for
 Preserves terms by processing (subterms of) each P-expression that the
-reader produces. The only subterms that need processing are the special
-classes mentioned above.
+reader produces.

- 1. Every `Group` or `Semicolon` that appears is an error.
- 2. Every `Colons` with two or more colons in it is an error.
- 3. Every `Comma` that appears is discarded.
+ 1. Every `(`..`)` or `;` that appears is an error.
+ 2. Every `:`, `::`, `:::`, ... is an error, except in the context of `Block`s as described below.
+ 3. Every `,` that appears is discarded.
 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
 5. Every `Record` with no values in it is an error.
- 6. Every `Block` must contain triplets of `Value`, `Colons` (with a
-    single colon), `Value`. Any `Block` not following this pattern is an
+ 6. Every `Block` must contain zero or more repeating triplets of
+    `Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
    error. Each `Block` following the pattern is translated to a
-    `Dictionary` containing a key/value pair for each triplet.
+    `Dictionary` containing a key/value pair for each triplet. Any
+    `Block` with duplicate keys (under interpretation) is an error.
+ 7. Every `Set` containing any duplicate expressions (under interpretation) is an error.

 [^discard-trailers-instead-of-error]: **Implementation note.** When
    implementing parsing of P-expressions into Preserves, consider
@ -168,7 +146,7 @@ classes mentioned above.
 Examples are given as pairs of P-expressions and their Preserves
 text-syntax encodings.

-### Individual P-expression `Value`s
+### Individual P-expression `Expr`s

 ```preserves
 ⌜<date 1821 (lookup-month "February") 3>⌝
@ -203,19 +181,27 @@ text-syntax encodings.
      tearDown();
  }⌝
 = <b
-      setUp <g> <s |;|>
+      setUp <g> <p |;|>
      # Now enter the loop
-      loop <s |:|> <b
-          greet <g "World"> <s |;|>
+      loop <p |:|> <b
+          greet <g "World"> <p |;|>
      >
-      tearDown <g> <s |;|>
+      tearDown <g> <p |;|>
  >
 ```

 ```preserves
 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
-= [1 + 2.0 <s |,|> print "Hello" <s |,|> predicate <s |:|> #t <s |,|>
-   foo <s |,|> #!remote <s |,|> bar]
+= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
+   foo <p |,|> #!remote <p |,|> bar]
+```
+
+```preserves
+ ⌜#{1 2 3}⌝
+= <s 1 2 3>
+
+ ⌜#{(read) (read) (read)}⌝
+= <s <g read> <g read> <g read>>
 ```

 ```preserves
@ -224,8 +210,8 @@ text-syntax encodings.
      address: Address,
  }⌝
 = <b
-      optional name <s |:|> string <s |,|>
-      address <s |:|> Address <s |,|>
+      optional name <p |:|> string <p |,|>
+      address <p |:|> Address <p |,|>
  >
 ```

@ -238,7 +224,7 @@ text-syntax encodings.
  }
  # example of a comment at the end of the input file⌝
 = [ <b
-        key <s |:|> value
+        key <p |:|> value
        @"example of a comment at the end of a dictionary" <a>
    >
    @"example of a comment at the end of the input file"
@ -273,7 +259,7 @@ generic P-expression reader can then feed into special-purpose
 program, and the parser refines this.

 Often, a parser will wish to extract structure from sequences of
-P-expression `Value`s.
+P-expression `Expr`s.

 - A simple technique is repeated splitting of sequences; first by
   `Semicolon`, then by `Comma`, then by increasingly high binding-power
@ -286,10 +272,39 @@ P-expression `Value`s.
   to build a parse tree using an extensible specification of the pre-,
   in-, and postfix operators involved.

- - Finally, if you treat sequences of `Value`s as pre-lexed token
+ - Finally, if you treat sequences of `Expr`s as pre-lexed token
   streams, almost any parsing formalism (such as [PEG
   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
   [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
   extract further syntactic structure.
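
The first technique in the list above, repeated splitting, can be sketched as follows. The representation is an assumption made for illustration: a sequence of `Expr`s is a Python list, punctuation is a `("punct", text)` tuple, and the helper names `split_on` and `statements` are hypothetical.

```python
def split_on(items, mark):
    """Split a list of Exprs at each occurrence of the punctuation `mark`."""
    groups, current = [], []
    for item in items:
        if item == ("punct", mark):
            groups.append(current)
            current = []
        else:
            current.append(item)
    groups.append(current)
    return groups


def statements(items):
    """Split by `;` first, then split each non-empty statement by `,`."""
    return [split_on(stmt, ",") for stmt in split_on(items, ";") if stmt]


COMMA, SEMI = ("punct", ","), ("punct", ";")
print(statements(["a", COMMA, "b", SEMI, "c", SEMI]))
```

Dropping empty statements is a design choice of this sketch; it makes a trailing `;` harmless. A parser that treats empty statements as significant would keep them.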

+## Appendix: Equations for interpreting P-expressions as Preserves
+
+The partial function **uncomma**(*p*) removes all occurrences of `,`
+from a P-expression *p*.
+
+{:.pseudocode.equations}
+| **uncomma** : **Expr**      | ⇀ | **Expr**                             |                                       |
+| **uncomma**(`[`*p* ...`]`)  | = | `[`**uncomma**(*p*) ...`]`           | omitting any *p* = `,`                |
+| **uncomma**(`<`*p* ...`>`)  | = | `<`**uncomma**(*p*) ...`>`           | omitting any *p* = `,`                |
+| **uncomma**(`{`*p* ...`}`)  | = | `{`**uncomma**(*p*) ...`}`           | omitting any *p* = `,`                |
+| **uncomma**(`(`*p* ...`)`)  | = | `(`**uncomma**(*p*) ...`)`           | omitting any *p* = `,`                |
+| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}`          | omitting any *p* = `,`                |
+| **uncomma**(`#!`*p*)        | = | `#!`**uncomma**(*p*)                 |                                       |
+| **uncomma**(`@`*p* *q*)     | = | `@`**uncomma**(*p*) **uncomma**(*q*) |                                       |
+| **uncomma**(*p*)            | = | *p*                                  | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
+
+We write ⌞*p*⌟ for the partial function mapping a comma-free
+P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`. An
+arbitrary P-expression *p* is interpreted as ⌞**uncomma**(*p*)⌟.
+
+{:.pseudocode.equations}
+| ⌞·⌟ : **Expr**        | ⇀ | **Value**               |                               |
+| ⌞`[`*p* ...`]`⌟       | = | `[`⌞*p*⌟ ...`]`         |                               |
+| ⌞`<`ℓ *p* ...`>`⌟     | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>`     |                               |
+| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
+| ⌞`#{`*p* ...`}`⌟      | = | `#{`⌞*p*⌟ ...`}`        | if all ⌞*p*⌟ ... are distinct |
+| ⌞`#!`*p*⌟             | = | `#!`⌞*p*⌟               |                               |
+| ⌞`@`*p* *q*⌟          | = | `@`⌞*p*⌟ ⌞*q*⌟          |                               |
+| ⌞*p*⌟                 | = | *p*                     | when *p* ∈ **Atom**           |
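
The two sets of equations can be read together as a pair of functions. This is a minimal sketch under stated assumptions and not a reference implementation: compounds are hypothetical `(kind, items)` tuples, punctuation is `("punct", text)`, atoms are opaque strings compared textually (a real reader would compare parsed values), and embedded and annotated expressions are omitted. `Record`, `Dictionary`, and `PexprError` are illustrative names.

```python
from dataclasses import dataclass


class PexprError(ValueError):
    """Raised where the interpretation function is undefined."""


@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple


@dataclass(frozen=True)
class Dictionary:
    entries: frozenset  # of (key, value) pairs


COMMA, COLON = ("punct", ","), ("punct", ":")
COMPOUNDS = {"seq", "rec", "block", "group", "set"}


def uncomma(p):
    """Remove every `,` found inside a compound, recursively."""
    if isinstance(p, tuple) and p[0] in COMPOUNDS:
        kind, items = p
        return (kind, [uncomma(q) for q in items if q != COMMA])
    return p  # atoms and other punctuation are unchanged


def interpret(p):
    """Map a comma-free P-expression to a (hashable) Python value."""
    if isinstance(p, str):
        return p  # Atom
    kind, body = p
    if kind == "seq":
        return tuple(interpret(q) for q in body)
    if kind == "rec":
        if not body:
            raise PexprError("record with no values")
        return Record(interpret(body[0]), tuple(interpret(q) for q in body[1:]))
    if kind == "block":
        if len(body) % 3 != 0 or any(c != COLON for c in body[1::3]):
            raise PexprError("block is not a series of key : value triplets")
        keys = [interpret(k) for k in body[0::3]]
        values = [interpret(v) for v in body[2::3]]
        if len(set(keys)) != len(keys):
            raise PexprError("duplicate key in block")
        return Dictionary(frozenset(zip(keys, values)))
    if kind == "set":
        members = [interpret(q) for q in body]
        if len(set(members)) != len(members):
            raise PexprError("duplicate expression in set")
        return frozenset(members)
    # Groups and leftover punctuation have no interpretation.
    raise PexprError(f"{kind} {body!r} cannot be read as a Preserves value")
```

Calling `interpret(uncomma(p))` corresponds to ⌞**uncomma**(*p*)⌟. Every case the equations leave undefined surfaces as a `PexprError`, which mirrors the error rules in the reading section.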
+
 ## Notes