This is way better

2023-11-01 13:06:15 +01:00 · 2023-11-01 13:06:15 +01:00 · d7b983e140
parent 1a0772d39f
commit d7b983e140
2 changed files with 145 additions and 108 deletions
--- a/_includes/cheatsheet-pexprs-plaintext.md
+++ b/_includes/cheatsheet-pexprs-plaintext.md
@@ -0,0 +1,22 @@
 The definition of `Atom` is as given in the Preserves text syntax.
 ```text
 Document      :=  Expr* sp
 Expr          :=  sp (Atom | Compound | Punct | Embedded | Annotated)
 Compound      :=  Sequence | Record | Block | Group | Set
 Punct         :=  `,` | `;` | `:`+
 sp            :=  (space | tab | cr | lf)*
 Sequence      :=   `[` Expr* Trailer sp `]`
 Record        :=   `<` Expr* Trailer sp `>`
 Block         :=   `{` Expr* Trailer sp `}`
 Group         :=   `(` Expr* Trailer sp `)`
 Set           :=  `#{` Expr* Trailer sp `}`
 Trailer       :=  Annotation*
 Embedded      :=  `#!` Expr
 Annotated     :=  Annotation Expr
 Annotation    :=  `@` Expr | `#` ((space | tab) linecomment) (cr | lf)
 ```
--- a/preserves-expressions.md
+++ b/preserves-expressions.md
@@ -3,128 +3,105 @@ title: "P-expressions"
 ---
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-October 2023. Version 0.2.0.
+October 2023. Version 0.3.0.
 [text syntax]: preserves-text.html
 This document defines a grammar called *Preserves Expressions*
 (*P-expressions*, *pexprs*) that includes [ordinary Preserves text
-syntax](preserves-text.html) but offers extensions sufficient to support
+syntax][text syntax] but offers extensions sufficient to support a Lisp-
-a Lisp- or Haskell-like programming notation.
+or Haskell-like programming notation.
-**Motivation.** The [text syntax](preserves-text.html) for Preserves
+**Motivation.** The [text syntax][] for Preserves works well for writing
-works well for writing `Value`s, i.e. data. However, in some contexts,
+`Value`s, i.e. data. However, in some contexts, Preserves applications
-Preserves applications need a broader grammar that allows interleaving
+need a broader grammar that allows interleaving of *expressions* with
-of *expressions* with data. Two examples are the [Preserves Schema
+data. Two examples are the [Preserves Schema
 language](preserves-schema.html) and the [Synit configuration scripting
 language](https://synit.org/book/operation/scripting.html), both of
 which (ab)use Preserves text syntax as a kind of programming notation.
 ## Preliminaries
-The P-expression grammar takes the text syntax grammar as its base and
+The P-expression grammar includes by reference the definition of `Atom` from the
-modifies it.
+[text syntax][], as well as the definitions that `Atom` depends on.
 P-expressions take their own approach to inter-token whitespace,
 however.
 <a id="whitespace">
-**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
+**Whitespace.** Whitespace `sp` is defined as any number of spaces,
-carriage returns, or line feeds. Commas are *not* considered whitespace
+tabs, carriage returns, or line feeds. Commas are *not* considered
-in P-expressions.
+whitespace in P-expressions, and so class `sp` is different to class
 `ws` from the text syntax.
-                ws = *(%x20 / %x09 / CR / LF)
+                sp = *(%x20 / %x09 / CR / LF)
-<a id="delimiters"></a>
+No changes to [the Preserves semantic model](preserves.html) are made.
-**Delimiters.** Because commas are no longer included in class `ws`,
+Every Preserves text-syntax term can be parsed as a valid P-expression,
-class `delimiter` is widened to include them explicitly.
+but in general P-expressions must be rewritten or otherwise interpreted
-
+before a meaningful Preserves value can be arrived at ([see
-         delimiter = ws / ","
+below](#reading-preserves)).
                   / "<" / ">" / "[" / "]" / "{" / "}"
                   / "#" / ":" / DQUOTE / "|" / "@" / ";"
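As a rough illustration of the `sp` class, here is a minimal Python sketch of a whitespace skipper. The names `SP_CHARS` and `skip_sp` are hypothetical and do not come from any Preserves implementation. The point is only that a comma stops the scan, unlike with `ws` in the text syntax.

```python
# Hypothetical helper, not part of any Preserves library.
# sp = *(%x20 / %x09 / CR / LF); the comma is deliberately absent.
SP_CHARS = " \t\r\n"


def skip_sp(text: str, pos: int = 0) -> int:
    """Return the index of the first character at or after pos that is not in sp."""
    while pos < len(text) and text[pos] in SP_CHARS:
        pos += 1
    return pos
```

For example, `skip_sp("  \t\n, x")` stops at the comma, which a P-expression reader would then read as a `Punct` token rather than skip.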
 ## Grammar
-P-expressions add comma, semicolon, and sequences of one or more colons
+Standalone documents containing P-expressions are sequences of
-to the syntax class `Value`.
+individual `Expr`s, followed by trailing whitespace.
-            Value =/ Comma / Semicolon / Colons
+          Document = *Expr sp
             Comma = ","
         Semicolon = ";"
            Colons = 1*":"
-Now that colon is in `Value`, the syntax for `Dictionary` is replaced
+A single P-expression `Expr` can be an `Atom` from the [text syntax][],
-with `Block` everywhere it is mentioned.
+a compound expression, special punctuation, an `Embedded` expression, or
 an `Annotated` expression.
-             Block = "{" *Value ws "}"
+              Expr = sp (Atom | Compound | Punct | Embedded | Annotated)
-Syntax for `Record` is loosened to allow empty angle brackets.
+Embedded and annotated values are as in the text syntax, differing only
 in that uses of `Value` are replaced with `Expr`.
-            Record = "<" *Value ws ">"
+           Embedded = "#!" Expr
          Annotated = Annotation Expr
         Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
-New syntax for explicit uninterpreted grouping of sequences of values is
+P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
 introduced, and added to class `Value`.
-            Value =/ ws Group
+             Punct = "," / ";" / 1*":"
             Group = "(" *Value ws ")"
-Finally, class `Document` is replaced in order to allow standalone
+Compound expressions are sequences of `Expr`s with optional trailing
-documents to directly comprise a sequence of multiple values.
+`Annotation`s, surrounded by various kinds of parentheses.
-          Document = *Value ws
+          Compound = Sequence / Record / Block / Group / Set
          Sequence =  "[" *Expr Trailer sp "]"
            Record =  "<" *Expr Trailer sp ">"
             Block =  "{" *Expr Trailer sp "}"
             Group =  "(" *Expr Trailer sp ")"
               Set = "#{" *Expr Trailer sp "}"
-No changes to [the Preserves semantic model](preserves.html) are made.
+In an `Annotated` P-expression, annotations and comments attach to the
-Every Preserves text-syntax term is a valid P-expression, but in general
+term following them, just as in the ordinary text syntax. However, it is
-P-expressions must be rewritten or otherwise interpreted before a
+common in programming notations to allow comments at the end of a file
-meaningful Preserves value can be arrived at ([see
+or other sequential construct. The ordinary text syntax forbids comments
-below](#reading-preserves)).
+in these positions, but P-expressions allow them.
-## <a id="annotations"></a>Annotations and Comments
+           Trailer = *Annotation
 Annotations and comments attach to the term following them, just as in
 the ordinary text syntax. However, it is common in programming notations
 to allow comments at the end of a file or other sequential construct:
    {
        key: value
        # example of a comment at the end of a dictionary
    }
    # example of a comment at the end of the input file
 While the ordinary text syntax forbids comments in these positions,
 P-expressions allow them:
         Document =/ *Value Trailer ws
           Record =/  "<" *Value Trailer ws ">"
         Sequence =/  "[" *Value Trailer ws "]"
              Set =/ "#{" *Value Trailer ws "}"
            Block =/  "{" *Value Trailer ws "}"
            Group =/  "(" *Value Trailer ws ")"
           Trailer = 1*Annotation
 ## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
 We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
 {:.pseudocode.equations}
-| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |
+| ⌜·⌝ : **Expr**   | ⟶ | **Value** |
-
+| ⌜`[`*p* ...`]`⌝  | = | `[`⌜*p*⌝ ...`]`         |
-Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`,
+| ⌜`<`*p* ...`>`⌝  | = | `<r` ⌜*p*⌝ ...`>`       |
-and `Record`, P-expressions are encoded directly as Preserves data.
+| ⌜`{`*p* ...`}`⌝  | = | `<b` ⌜*p*⌝ ...`>`       |
-
+| ⌜`(`*p* ...`)`⌝  | = | `<g` ⌜*p*⌝ ...`>`       |
-{:.pseudocode.equations}
+| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>`       |
-| ⌜`[`*p* ...`]`⌝  | = | `[`⌜*p*⌝ ...`]`             |
+| ⌜`#!`*p*⌝        | = | `#!`⌜*p*⌝               |
-| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}`            |
+| ⌜`@`*p* *q*⌝     | = | `@`⌜*p*⌝ ⌜*q*⌝          |
-| ⌜`#!`*p*⌝        | = | `#!`⌜*p*⌝                   |
+| ⌜*p*⌝            | = | *p* | when *p* ∈ **Atom** |
-| ⌜`@`*p* *q*⌝     | = | `@`⌜*p*⌝ ⌜*q*⌝              |
+| ⌜`,`⌝            | = | `<p |,|>`               |
-| ⌜*p*⌝            | = | *p* when *p* ∈ **Atom** |
+| ⌜`;`⌝            | = | `<p |;|>`               |
-
+| ⌜`:` ...⌝        | = | `<p |:` ...`|>`         |
-Everything else is encoded as Preserves records.
+| ⌜*t*⌝            | = | ⌜*a*⌝ ... `<a>` | where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
 {:.pseudocode.equations}
 | ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
 | ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
 | ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
 | ⌜`,`⌝           | = | `<s |,|>`         |
 | ⌜`;`⌝           | = | `<s |;|>`         |
 | ⌜`:` ...⌝       | = | `<s |:` ...`|>`   |
 | ⌜*t*⌝           | = | ⌜*a*⌝ ... `<a>`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
 The record `<a>` acts as an anchor for the annotations in a `Trailer`.
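The encoding equations can be sketched as a small Python function that renders an in-memory P-expression as Preserves text. The representation is an assumption made for illustration only: atoms are kept as their source text (`str`), and `Punct`, `Compound` and `encode` are hypothetical names. The sketch follows the newer `<p ...>` encoding for punctuation and `<s ...>` for sets, and leaves out embedded values, annotations and trailers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


@dataclass(frozen=True)
class Compound:
    kind: str     # opening bracket: "[", "<", "{", "(", or "#{"
    items: tuple  # contained expressions, in order


# Record labels used by the encoding for each non-sequence bracket kind.
LABELS = {"<": "r", "{": "b", "(": "g", "#{": "s"}


def encode(expr) -> str:
    """Render the Preserves encoding of expr as text (illustrative only)."""
    if isinstance(expr, Punct):
        return f"<p |{expr.text}|>"
    if isinstance(expr, Compound):
        inner = " ".join(encode(item) for item in expr.items)
        if expr.kind == "[":
            return f"[{inner}]"
        label = LABELS[expr.kind]
        return f"<{label} {inner}>" if inner else f"<{label}>"
    return expr  # assumption: an Atom, already in text-syntax form
```

With this representation, `encode(Compound("#{", ("1", "2", "3")))` yields `<s 1 2 3>`, and an empty group encodes as `<g>`.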
@@ -145,18 +122,19 @@ using Preserves text syntax and then (3) read as a P-expression can be
 A reader for P-expressions can be adapted to yield a reader for
 Preserves terms by processing (subterms of) each P-expression that the
-reader produces. The only subterms that need processing are the special
+reader produces.
 classes mentioned above.
- 1. Every `Group` or `Semicolon` that appears is an error.
+ 1. Every `(`..`)` or `;` that appears is an error.
- 2. Every `Colons` with two or more colons in it is an error.
+ 2. Every `:`, `::`, `:::`, ... is an error, except in the context of `Block`s as described below.
- 3. Every `Comma` that appears is discarded.
+ 3. Every `,` that appears is discarded.
 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
 5. Every `Record` with no values in it is an error.
- 6. Every `Block` must contain triplets of `Value`, `Colons` (with a
+ 6. Every `Block` must contain zero or more repeating triplets of
-    single colon), `Value`. Any `Block` not following this pattern is an
+    `Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
    error. Each `Block` following the pattern is translated to a
-    `Dictionary` containing a key/value pair for each triplet.
+    `Dictionary` containing a key/value pair for each triplet. Any
    `Block` with duplicate keys (under interpretation) is an error.
 7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
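Rules 3 and 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference reader: `Punct` and `block_to_dict` are hypothetical names, the keys and values in the input list are assumed to have been interpreted already, and keys are assumed to be hashable Python values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


def block_to_dict(items):
    """Turn the contents of a Block into a dict, or raise ValueError."""
    # Rule 3: every comma is discarded.
    items = [item for item in items if item != Punct(",")]
    # Rule 6: what remains must be zero or more (key, ":", value) triplets.
    if len(items) % 3 != 0:
        raise ValueError("Block contents are not key/colon/value triplets")
    result = {}
    for start in range(0, len(items), 3):
        key, colon, value = items[start:start + 3]
        if colon != Punct(":"):
            raise ValueError("expected a single colon between key and value")
        if isinstance(key, Punct) or isinstance(value, Punct):
            raise ValueError("punctuation cannot be a key or a value")
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result
```

A stray `;` or a `::` ends up either in the colon position or as a key or value, so both are rejected by the checks above.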
 [^discard-trailers-instead-of-error]: **Implementation note.** When
    implementing parsing of P-expressions into Preserves, consider
@@ -168,7 +146,7 @@ classes mentioned above.
 Examples are given as pairs of P-expressions and their Preserves
 text-syntax encodings.
-### Individual P-expression `Value`s
+### Individual P-expression `Expr`s
 ```preserves
 ⌜<date 1821 (lookup-month "February") 3>⌝
@@ -203,19 +181,27 @@ text-syntax encodings.
      tearDown();
  }⌝
 = <b
-      setUp <g> <s |;|>
+      setUp <g> <p |;|>
      # Now enter the loop
-      loop <s |:|> <b
+      loop <p |:|> <b
-          greet <g "World"> <s |;|>
+          greet <g "World"> <p |;|>
      >
-      tearDown <g> <s |;|>
+      tearDown <g> <p |;|>
  >
 ```
 ```preserves
 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
-= [1 + 2.0 <s |,|> print "Hello" <s |,|> predicate <s |:|> #t <s |,|>
+= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
-   foo <s |,|> #!remote <s |,|> bar]
+   foo <p |,|> #!remote <p |,|> bar]
 ```
 ```preserves
 ⌜#{1 2 3}⌝
 = <s 1 2 3>
 ⌜#{(read) (read) (read)}⌝
 = <s <g read> <g read> <g read>>
 ```
 ```preserves
@@ -224,8 +210,8 @@ text-syntax encodings.
      address: Address,
  }⌝
 = <b
-      optional name <s |:|> string <s |,|>
+      optional name <p |:|> string <p |,|>
-      address <s |:|> Address <s |,|>
+      address <p |:|> Address <p |,|>
  >
 ```
@@ -238,7 +224,7 @@ text-syntax encodings.
  }
  # example of a comment at the end of the input file⌝
 = [ <b
-        key <s |:|> value
+        key <p |:|> value
        @"example of a comment at the end of a dictionary" <a>
    >
    @"example of a comment at the end of the input file"
@@ -273,7 +259,7 @@ generic P-expression reader can then feed into special-purpose
 program, and the parser refines this.
 Often, a parser will wish to extract structure from sequences of
-P-expression `Value`s.
+P-expression `Expr`s.
 - A simple technique is repeated splitting of sequences; first by
   `Semicolon`, then by `Comma`, then by increasingly high binding-power
@@ -286,10 +272,39 @@ P-expression `Value`s.
   to build a parse tree using an extensible specification of the pre-,
   in-, and postfix operators involved.
- - Finally, if you treat sequences of `Value`s as pre-lexed token
+ - Finally, if you treat sequences of `Expr`s as pre-lexed token
   streams, almost any parsing formalism (such as [PEG
   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
   [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
   extract further syntactic structure.
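The first technique in the list above, repeated splitting, can be sketched in Python. The `Punct` class and the `split_on` helper are hypothetical names chosen for illustration, and the items of a sequence are assumed to be held in a plain Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


def split_on(items, separator):
    """Split a list of expressions into sublists at each separator."""
    parts, current = [], []
    for item in items:
        if item == separator:
            parts.append(current)
            current = []
        else:
            current.append(item)
    parts.append(current)  # the run after the last separator, possibly empty
    return parts


# Split first by semicolon, then split each statement by comma.
items = ["a", Punct(","), "b", Punct(";"), "c"]
statements = split_on(items, Punct(";"))
clauses = [split_on(statement, Punct(",")) for statement in statements]
```

Note that a trailing separator produces an empty final part; a parser built this way has to decide whether to drop it or treat it as an error.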
 ## Appendix: Equations for interpreting P-expressions as Preserves
 The partial function **uncomma**(*p*) removes all occurrences of `,`
 from a P-expression *p*.
 {:.pseudocode.equations}
 | **uncomma** : **Expr**      | ⇀ | **Expr**                             |                                       |
 | **uncomma**(`[`*p* ...`]`)  | = | `[`**uncomma**(*p*) ...`]`           | omitting any *p* = `,`                |
 | **uncomma**(`<`*p* ...`>`)  | = | `<`**uncomma**(*p*) ...`>`           | omitting any *p* = `,`                |
 | **uncomma**(`{`*p* ...`}`)  | = | `{`**uncomma**(*p*) ...`}`           | omitting any *p* = `,`                |
 | **uncomma**(`(`*p* ...`)`)  | = | `(`**uncomma**(*p*) ...`)`           | omitting any *p* = `,`                |
 | **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}`          | omitting any *p* = `,`                |
 | **uncomma**(`#!`*p*)        | = | `#!`**uncomma**(*p*)                 |                                       |
 | **uncomma**(`@`*p* *q*)     | = | `@`**uncomma**(*p*) **uncomma**(*q*) |                                       |
 | **uncomma**(*p*)            | = | *p*                                  | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
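The **uncomma** equations translate fairly directly into a recursive Python function. The tree representation below (`Punct`, `Compound`, `Embedded`, `Annotated`) is a hypothetical one made up for this sketch, with any other Python value standing in for an `Atom`. Partiality is modelled by raising `ValueError` where the equations leave **uncomma** undefined, namely on a bare `,`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


@dataclass(frozen=True)
class Compound:
    kind: str     # opening bracket: "[", "<", "{", "(", or "#{"
    items: tuple


@dataclass(frozen=True)
class Embedded:
    expr: object


@dataclass(frozen=True)
class Annotated:
    annotation: object
    expr: object


COMMA = Punct(",")


def uncomma(expr):
    """Remove every comma found directly inside a compound, recursively."""
    if isinstance(expr, Compound):
        kept = tuple(uncomma(item) for item in expr.items if item != COMMA)
        return Compound(expr.kind, kept)
    if isinstance(expr, Embedded):
        return Embedded(uncomma(expr.expr))
    if isinstance(expr, Annotated):
        return Annotated(uncomma(expr.annotation), uncomma(expr.expr))
    if expr == COMMA:
        # No equation covers a comma outside a compound.
        raise ValueError("uncomma is undefined for a bare comma")
    return expr  # an Atom, or punctuation other than ","
```

Semicolons and colons pass through unchanged, so any errors they cause are left to the later interpretation step.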
 We write ⌞·⌟ for the partial function mapping a P-expression to a
 corresponding Preserves `Value`. A P-expression *p* ∈ `Expr` is
 interpreted as ⌞**uncomma**(*p*)⌟.
 {:.pseudocode.equations}
 | ⌞·⌟ : **Expr**        | ⇀ | **Value**               |                               |
 | ⌞`[`*p* ...`]`⌟       | = | `[`⌞*p*⌟ ...`]`         |                               |
 | ⌞`<`ℓ *p* ...`>`⌟     | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>`     |                               |
 | ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
 | ⌞`#{`*p* ...`}`⌟      | = | `#{`⌞*p*⌟ ...`}`        | if all ⌞*p*⌟ ... are distinct |
 | ⌞`#!`*p*⌟             | = | `#!`⌞*p*⌟               |                               |
 | ⌞`@`*p* *q*⌟          | = | `@`⌞*p*⌟ ⌞*q*⌟          |                               |
 | ⌞*p*⌟                 | = | *p*                     | when *p* ∈ **Atom**           |
 ## Notes