diff --git a/_includes/cheatsheet-pexprs-plaintext.md b/_includes/cheatsheet-pexprs-plaintext.md
new file mode 100644
index 0000000..13c7108
--- /dev/null
+++ b/_includes/cheatsheet-pexprs-plaintext.md
@@ -0,0 +1,22 @@
+The definition of `Atom` is as given in the Preserves text syntax.
+
+```text
+Document := Expr* sp
+Expr := sp (Atom | Compound | Punct | Embedded | Annotated)
+Compound := Sequence | Record | Block | Group | Set
+Punct := `,` | `;` | `:`+
+
+sp := (space | tab | cr | lf)*
+
+Sequence := `[` Expr* Trailer sp `]`
+Record := `<` Expr* Trailer sp `>`
+Block := `{` Expr* Trailer sp `}`
+Group := `(` Expr* Trailer sp `)`
+Set := `#{` Expr* Trailer sp `}`
+
+Trailer := Annotation*
+
+Embedded := `#!` Expr
+Annotated := Annotation Expr
+Annotation := `@` Expr | `#` ((space | tab) linecomment) (cr | lf)
+```
diff --git a/preserves-expressions.md b/preserves-expressions.md
index dccc68c..ffefe97 100644
--- a/preserves-expressions.md
+++ b/preserves-expressions.md
@@ -3,128 +3,105 @@ title: "P-expressions"
 ---
 
 Tony Garnock-Jones
-October 2023. Version 0.2.0.
+October 2023. Version 0.3.0.
+
+[text syntax]: preserves-text.html
 
 This document defines a grammar called *Preserves Expressions*
 (*P-expressions*, *pexprs*) that includes [ordinary Preserves text
-syntax](preserves-text.html) but offers extensions sufficient to support
-a Lisp- or Haskell-like programming notation.
+syntax][text syntax] but offers extensions sufficient to support a Lisp-
+or Haskell-like programming notation.
 
-**Motivation.** The [text syntax](preserves-text.html) for Preserves
-works well for writing `Value`s, i.e. data. However, in some contexts,
-Preserves applications need a broader grammar that allows interleaving
-of *expressions* with data. Two examples are the [Preserves Schema
+**Motivation.** The [text syntax][] for Preserves works well for writing
+`Value`s, i.e. data. However, in some contexts, Preserves applications
+need a broader grammar that allows interleaving of *expressions* with
+data. Two examples are the [Preserves Schema
 language](preserves-schema.html) and the [Synit configuration scripting
 language](https://synit.org/book/operation/scripting.html), both of
 which (ab)use Preserves text syntax as a kind of programming notation.
 
 ## Preliminaries
 
-The P-expression grammar takes the text syntax grammar as its base and
-modifies it.
+The P-expression grammar includes by reference the definition of `Atom` from the
+[text syntax][], as well as the definitions that `Atom` depends on.
+
+P-expressions take their own approach to inter-token whitespace,
+however.
 
-**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
-carriage returns, or line feeds. Commas are *not* considered whitespace
-in P-expressions.
+**Whitespace.** Whitespace `sp` is defined as any number of spaces,
+tabs, carriage returns, or line feeds. Commas are *not* considered
+whitespace in P-expressions, and so class `sp` is different to class
+`ws` from the text syntax.
 
-    ws = *(%x20 / %x09 / CR / LF)
+    sp = *(%x20 / %x09 / CR / LF)
 
-
-**Delimiters.** Because commas are no longer included in class `ws`,
-class `delimiter` is widened to include them explicitly.
-
-    delimiter = ws / ","
-              / "<" / ">" / "[" / "]" / "{" / "}"
-              / "#" / ":" / DQUOTE / "|" / "@" / ";"
+No changes to [the Preserves semantic model](preserves.html) are made.
+Every Preserves text-syntax term can be parsed as a valid P-expression,
+but in general P-expressions must be rewritten or otherwise interpreted
+before a meaningful Preserves value can be arrived at ([see
+below](#reading-preserves)).
 
 ## Grammar
 
-P-expressions add comma, semicolon, and sequences of one or more colons
-to the syntax class `Value`.
+Standalone documents containing P-expressions are sequences of
+individual `Expr`s, followed by trailing whitespace.
 
-    Value =/ Comma / Semicolon / Colons
-    Comma = ","
-    Semicolon = ";"
-    Colons = 1*":"
+    Document = *Expr sp
 
-Now that colon is in `Value`, the syntax for `Dictionary` is replaced
-with `Block` everywhere it is mentioned.
+A single P-expression `Expr` can be an `Atom` from the [text syntax][],
+a compound expression, special punctuation, an `Embedded` expression, or
+an `Annotated` expression.
 
-    Block = "{" *Value ws "}"
+    Expr = sp (Atom / Compound / Punct / Embedded / Annotated)
 
-Syntax for `Record` is loosened to allow empty angle brackets.
+Embedded and annotated values are as in the text syntax, differing only
+in that uses of `Value` are replaced with `Expr`.
 
-    Record = "<" *Value ws ">"
+    Embedded = "#!" Expr
+    Annotated = Annotation Expr
+    Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
 
-New syntax for explicit uninterpreted grouping of sequences of values is
-introduced, and added to class `Value`.
+P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
 
-    Value =/ ws Group
-    Group = "(" *Value ws ")"
+    Punct = "," / ";" / 1*":"
 
-Finally, class `Document` is replaced in order to allow standalone
-documents to directly comprise a sequence of multiple values.
+Compound expressions are sequences of `Expr`s with optional trailing
+`Annotation`s, surrounded by various kinds of parentheses.
 
-    Document = *Value ws
+    Compound = Sequence / Record / Block / Group / Set
+    Sequence = "[" *Expr Trailer sp "]"
+    Record = "<" *Expr Trailer sp ">"
+    Block = "{" *Expr Trailer sp "}"
+    Group = "(" *Expr Trailer sp ")"
+    Set = "#{" *Expr Trailer sp "}"
 
-No changes to [the Preserves semantic model](preserves.html) are made.
-Every Preserves text-syntax term is a valid P-expression, but in general
-P-expressions must be rewritten or otherwise interpreted before a
-meaningful Preserves value can be arrived at ([see
-below](#reading-preserves)).
+In an `Annotated` P-expression, annotations and comments attach to the
+term following them, just as in the ordinary text syntax. However, it is
+common in programming notations to allow comments at the end of a file
+or other sequential construct. The ordinary text syntax forbids comments
+in these positions, but P-expressions allow them.
 
-## Annotations and Comments
-
-Annotations and comments attach to the term following them, just as in
-the ordinary text syntax. However, it is common in programming notations
-to allow comments at the end of a file or other sequential construct:
-
-    {
-        key: value
-        # example of a comment at the end of a dictionary
-    }
-    # example of a comment at the end of the input file
-
-While the ordinary text syntax forbids comments in these positions,
-P-expressions allow them:
-
-    Document =/ *Value Trailer ws
-    Record =/ "<" *Value Trailer ws ">"
-    Sequence =/ "[" *Value Trailer ws "]"
-    Set =/ "#{" *Value Trailer ws "}"
-    Block =/ "{" *Value Trailer ws "}"
-    Group =/ "(" *Value Trailer ws ")"
-
-    Trailer = 1*Annotation
+    Trailer = *Annotation
 
 ## Encoding P-expressions as Preserves
 
 We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
 
 {:.pseudocode.equations}
-| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |
-
-Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`,
-and `Record`, P-expressions are encoded directly as Preserves data.
-
-{:.pseudocode.equations}
-| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
-| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` |
-| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
-| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
-| ⌜*p*⌝ | = | *p* when *p* ∈ **Atom** |
-
-Everything else is encoded as Preserves records.
-
-{:.pseudocode.equations}
-| ⌜`<`*p* ...`>`⌝ | = | `` |
-| ⌜`(`*p* ...`)`⌝ | = | `` |
-| ⌜`{`*p* ...`}`⌝ | = | `` |
-| ⌜`,`⌝ | = | `` |
-| ⌜`;`⌝ | = | `` |
-| ⌜`:` ...⌝ | = | `` |
-| ⌜*t*⌝ | = | ⌜*a*⌝ ... ``, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
+| ⌜·⌝ : **Expr** | ⟶ | **Value** |
+| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
+| ⌜`<`*p* ...`>`⌝ | = | `` |
+| ⌜`{`*p* ...`}`⌝ | = | `` |
+| ⌜`(`*p* ...`)`⌝ | = | `` |
+| ⌜`#{`*p* ...`}`⌝ | = | `` |
+| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
+| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
+| ⌜*p*⌝ | = | *p* | when *p* ∈ **Atom** |
+| ⌜`,`⌝ | = | `` |
+| ⌜`;`⌝ | = | `` |
+| ⌜`:` ...⌝ | = | `` |
+| ⌜*t*⌝ | = | ⌜*a*⌝ ... `` | where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
 
 The record `` acts as an anchor for the annotations in a `Trailer`.
 
@@ -145,18 +122,19 @@ using Preserves text syntax and then (3) read as a P-expression can be
 
 A reader for P-expressions can be adapted to yield a reader for
 Preserves terms by processing (subterms of) each P-expression that the
-reader produces. The only subterms that need processing are the special
-classes mentioned above.
+reader produces.
 
- 1. Every `Group` or `Semicolon` that appears is an error.
- 2. Every `Colons` with two or more colons in it is an error.
- 3. Every `Comma` that appears is discarded.
+ 1. Every `(`..`)` or `;` that appears is an error.
+ 2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
+ 3. Every `,` that appears is discarded.
  4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
  5. Every `Record` with no values in it is an error.
- 6. Every `Block` must contain triplets of `Value`, `Colons` (with a
-    single colon), `Value`. Any `Block` not following this pattern is an
+ 6. Every `Block` must contain zero or more repeating triplets of
+    `Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
     error. Each `Block` following the pattern is translated to a
-    `Dictionary` containing a key/value pair for each triplet.
+    `Dictionary` containing a key/value pair for each triplet. Any
+    `Block` with duplicate keys (under interpretation) is an error.
+ 7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
 
 [^discard-trailers-instead-of-error]: **Implementation note.** When
     implementing parsing of P-expressions into Preserves, consider
@@ -168,7 +146,7 @@ classes mentioned above.
 Examples are given as pairs of P-expressions and their Preserves
 text-syntax encodings.
 
-### Individual P-expression `Value`s +### Individual P-expression `Expr`s ```preserves ⌜⌝ @@ -203,19 +181,27 @@ text-syntax encodings. tearDown(); }⌝ = + setUp

# Now enter the loop - loop + loop

> - tearDown + tearDown

> ``` ```preserves ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝ -= [1 + 2.0 print "Hello" predicate #t - foo #!remote bar] += [1 + 2.0

print "Hello"

predicate

#t

+ foo

#!remote

bar] +``` + +```preserves + ⌜#{1 2 3}⌝ += + + ⌜#{(read) (read) (read)}⌝ += > ``` ```preserves @@ -224,8 +210,8 @@ text-syntax encodings. address: Address, }⌝ = string - address Address + optional name

string

+ address

Address

> ``` @@ -238,7 +224,7 @@ text-syntax encodings. } # example of a comment at the end of the input file⌝ = [ value + key

value @"example of a comment at the end of a dictionary" > @"example of a comment at the end of the input file" @@ -273,7 +259,7 @@ generic P-expression reader can then feed into special-purpose program, and the parser refines this. Often, a parser will wish to extract structure from sequences of -P-expression `Value`s. +P-expression `Expr`s. - A simple technique is repeated splitting of sequences; first by `Semicolon`, then by `Comma`, then by increasingly high binding-power @@ -286,10 +272,39 @@ P-expression `Value`s. to build a parse tree using an extensible specification of the pre-, in-, and postfix operators involved. - - Finally, if you treat sequences of `Value`s as pre-lexed token + - Finally, if you treat sequences of `Expr`s as pre-lexed token streams, almost any parsing formalism (such as [PEG parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar), [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to extract further syntactic structure. +## Appendix: Equations for interpreting P-expressions as Preserves + +The partial function **uncomma**(*p*) removes all occurrences of `,` +from a P-expression *p*. 
+
+{:.pseudocode.equations}
+| **uncomma** : **Expr** | ⇀ | **Expr** | |
+| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
+| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
+| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
+| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
+| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
+| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
+| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
+| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
+
+We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
+P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`.
+
+{:.pseudocode.equations}
+| ⌞·⌟ : **Expr** | ⇀ | **Value** | |
+| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
+| ⌞`<`ℓ *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
+| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
+| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
+| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
+| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
+| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
+
 ## Notes
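
As an illustration of the **uncomma** equations in the appendix, here is a minimal Python sketch. It is not part of the patch and not taken from any Preserves implementation. The tree representation is an assumption made for this example: the class names `Punct`, `Compound`, `Embedded` and `Annotated` are hypothetical, atoms are stood in for by plain Python scalars, and `Trailer`s are not modelled. Because the equations define **uncomma** as a partial function, the sketch raises an error wherever no equation applies, for example on a bare `,`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


@dataclass(frozen=True)
class Compound:
    opener: str  # one of "[", "<", "{", "(", "#{"
    items: Tuple["Expr", ...]


@dataclass(frozen=True)
class Embedded:
    expr: "Expr"


@dataclass(frozen=True)
class Annotated:
    annotation: "Expr"
    expr: "Expr"


# Assumption: atoms are approximated by Python scalars for this sketch.
Atom = Union[str, int, float, bool]
Expr = Union[Atom, Punct, Compound, Embedded, Annotated]

COMMA = Punct(",")


def uncomma(p: Expr) -> Expr:
    """Remove every `,` nested inside p, following the appendix equations."""
    if p == COMMA:
        # No equation covers a bare comma, so the function is undefined here.
        raise ValueError("uncomma is undefined on a bare comma")
    if isinstance(p, Compound):
        # Drop commas among the direct children, then recurse into the rest.
        kept = tuple(uncomma(q) for q in p.items if q != COMMA)
        return Compound(p.opener, kept)
    if isinstance(p, Embedded):
        return Embedded(uncomma(p.expr))
    if isinstance(p, Annotated):
        return Annotated(uncomma(p.annotation), uncomma(p.expr))
    # Atoms and the remaining punctuation (";" and colons) pass through.
    return p


if __name__ == "__main__":
    group = Compound("(", ("f", COMMA, "x"))
    expr = Compound("[", (1, COMMA, group, COMMA, Punct(";")))
    print(uncomma(expr))
```

Applying the ⌞·⌟ equations to the result would be a separate pass over the same tree. It is omitted here because it additionally needs the duplicate-key and duplicate-element checks stated in the side conditions.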