--- title: "P-expressions" --- Tony Garnock-Jones February 2024. Version 0.3.1. [text syntax]: preserves-text.html **Experimental.** This document defines a grammar called *Preserves Expressions* (*P-expressions*, *pexprs*) that includes [ordinary Preserves text syntax][text syntax] but offers extensions sufficient to support a Lisp- or Haskell-like programming notation. **Motivation.** The [text syntax][] for Preserves works well for writing `Value`s, i.e. data. However, in some contexts, Preserves applications need a broader grammar that allows interleaving of *expressions* with data. Two examples are the [Preserves Schema language](preserves-schema.html) and the [Synit configuration scripting language](https://synit.org/book/operation/scripting.html), both of which (ab)use Preserves text syntax as a kind of programming notation. ## Preliminaries The P-expression grammar includes by reference the definition of `Atom` from the [text syntax][], as well as the definitions that `Atom` depends on. **Whitespace.** Whitespace `ws` is, as in the text syntax, defined as any number of spaces, tabs, carriage returns, or line feeds. ws = *(%x20 / %x09 / CR / LF) No changes to [the Preserves semantic model](preserves.html) are made. Every Preserves text-syntax term can be parsed as a valid P-expression, but in general P-expressions must be rewritten or otherwise interpreted before a meaningful Preserves value can be arrived at ([see below](#reading-preserves)). ## Grammar Standalone documents containing P-expressions are sequences of individual `Expr`s, followed by annotations, comments, and/or whitespace. Document = *Expr Trailer ws A single P-expression `Expr` can be an `Atom` from the [text syntax][], a compound expression, special punctuation, an `Embedded` expression, or an `Annotated` expression. The class `SimpleExpr` includes all of `Expr` except special punctuation. Expr = ws (SimpleExpr / Punct) SimpleExpr = Atom / Compound / Embedded / Annotated Embedded and annotated values are as in the text syntax, differing only in that uses of `Value` are replaced with `SimpleExpr`. Embedded = "#:" SimpleExpr Annotated = Annotation SimpleExpr Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF) linecomment = * P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.[^greedy-colons] Punct = "," / ";" / 1*":" [^greedy-colons]: Colon matching is greedy: when reading, all adjacent colons are always taken into a single token, and when writing, adjacent colon-sequence punctuation marks must be written with whitespace separating them. Compound expressions are sequences of `Expr`s with optional trailing `Annotation`s, surrounded by various kinds of parentheses. Compound = Sequence / Record / Block / Group / Set Sequence = "[" *Expr Trailer ws "]" Record = "<" *Expr Trailer ws ">" Block = "{" *Expr Trailer ws "}" Group = "(" *Expr Trailer ws ")" Set = "#{" *Expr Trailer ws "}" In an `Annotated` P-expression, annotations and comments attach to the term following them, just as in the ordinary text syntax. However, it is common in programming notations to allow comments at the end of a file or other sequential construct. The ordinary text syntax forbids comments in these positions, but P-expressions allow them. Trailer = *(ws Annotation) ## Encoding P-expressions as Preserves We write ⌜*p*⌝ for the encoding into Preserves of a P-expression *p*. {:.pseudocode.equations} | ⌜·⌝ : **Expr** | ⟶ | **Value** | | ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` | | ⌜`<`*p* ...`>`⌝ | = | `` | | ⌜`{`*p* ...`}`⌝ | = | `` | | ⌜`(`*p* ...`)`⌝ | = | `` | | ⌜`#{`*p* ...`}`⌝ | = | `` | | ⌜`#:`*p*⌝ | = | `#:`⌜*p*⌝ | | ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ | | ⌜*p*⌝ | = | *p* | when *p* ∈ **Atom** | | ⌜`,`⌝ | = | `

` | | ⌜`;`⌝ | = | `

` | | ⌜`:` ...⌝ | = | `

` | | ⌜*t*⌝ | = | `@`⌜*a*⌝ ... `` | where `@`*a* ... are the annotations in *t* and *t* ∈ **Trailer** | The record `` acts as an anchor for the annotations in a `Trailer`. We overload the ⌜·⌝ notation for encoding whole `Document`s into sequences of Preserves values. {:.pseudocode.equations} | ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence** | | ⌜*p* ...⌝ | = | `[`⌜*p*⌝ ...`]` | | ⌜*p* ... `@`*a* ...⌝ | = | `[`⌜*p*⌝ ... `@`⌜*a*⌝ ... `]` | | | | where `@`*a* ... are trailing annotations | ## Interpreting P-expressions as Preserves The [previous section](#encoding-pexprs) discussed ways of representing P-expressions using Preserves. Here, we discuss *interpreting* P-expressions *as* Preserves so that (1) a Preserves datum (2) written using Preserves text syntax and then (3) read as a P-expression can be (4) interpreted from that P-expression to yield the original datum. 1. Every `(`..`)` or `;` that appears is an error. 2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below. 3. Every `,` that appears is discarded. 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error] 5. Every `Record` with no values in it is an error. 6. Every `Block` must contain zero or more repeating triplets of `SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this pattern is an error. Each `Block` following the pattern is translated to a `Dictionary` containing a key/value pair for each triplet. Any `Block` with duplicate keys (under interpretation) is an error. 7. Every `Set` containing any duplicate expressions (under interpretation) is an error. [^discard-trailers-instead-of-error]: **Implementation note.** When implementing parsing of P-expressions into Preserves, consider offering an optional mode where trailing annotations `Trailer` are *discarded* instead of causing an error to be signalled. ## Appendix: Examples Examples are given as pairs of P-expressions and their Preserves text-syntax encodings. ### Individual P-expression `Expr`s ```preserves ⌜⌝ = 3> ``` ```preserves ⌜<>⌝ = ``` ```preserves ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝ = > > ``` ```preserves ⌜()⌝ = ⌜[() () ()]⌝ = [, , ] ``` ```preserves ⌜{ setUp(); # Now enter the loop loop: { greet("World"); } tearDown(); }⌝ =

# Now enter the loop loop

> tearDown

> ``` ```preserves ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #:remote, bar]⌝ = [1 + 2.0

print "Hello"

predicate

#t

foo

#:remote

bar] ``` ```preserves ⌜#{1 2 3}⌝ = ⌜#{(read) (read) (read)}⌝ = > ``` ```preserves ⌜{ optional name: string, address: Address, }⌝ = string

address

Address

> ``` ### Whole `Document`s ```preserves ⌜{ key: value # example of a comment at the end of a dictionary } # example of a comment at the end of the input file⌝ = [ value @"example of a comment at the end of a dictionary" > @"example of a comment at the end of the input file" ] ``` ## Appendix: Reading vs. Parsing Lisp systems first *read* streams of bytes into S-expressions and then *parse* those S-expressions into more abstract structures denoting various kinds of program syntax. [Separation of reading from parsing is what gives Lisp its syntactic flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/) Similarly, the Apple programming language [Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language)) included a reader-parser split, with the Dylan reader producing *D-expressions* that are somewhat similar to P-expressions. Finally, the Racket dialects [Honu](https://docs.racket-lang.org/honu/index.html) and [Something](https://github.com/tonyg/racket-something) use a reader-parser-macro setup, where the reader produces Racket data, the parser produces "syntax" and is user-extensible, and Racket's own modular macro system rewrites this "syntax" down to core forms to be compiled to machine code. Similarly, when using P-expressions as the foundation for a language, a generic P-expression reader can then feed into special-purpose *parsers*. The reader captures the coarse syntactic structure of a program, and the parser refines this. Often, a parser will wish to extract structure from sequences of P-expression `Expr`s. - A simple technique is repeated splitting of sequences; first by `Semicolon`, then by `Comma`, then by increasingly high binding-power operators. - More refined is to use a Pratt parser or similar ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser), [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html), [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt)) to build a parse tree using an extensible specification of the pre-, in-, and postfix operators involved. - Finally, if you treat sequences of `Expr`s as pre-lexed token streams, almost any parsing formalism (such as [PEG parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar), [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to extract further syntactic structure. ## Appendix: Equations for interpreting P-expressions as Preserves The function **uncomma**(*p*) removes all occurrences of `,` from a P-expression *p* ∈ `Expr` − {`,`}. {:.pseudocode.equations} | **uncomma** : **Expr** − {`,`} | ⟶ | **Expr** | | | **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` | | **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` | | **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` | | **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` | | **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` | | **uncomma**(`#:`*p*) | = | `#:`**uncomma**(*p*) | | | **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | | | **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** − {`,`} | We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a P-expression *p* ∈ `Expr` − {`,`} to a corresponding Preserves `Value`. {:.pseudocode.equations} | ⌞·⌟ : **Expr** − {`,`} | ⇀ | **Value** | | | ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | | | ⌞`<`ℓ *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | | | ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct | | ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct | | ⌞`#:`*p*⌟ | = | `#:`⌞*p*⌟ | | | ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | | | ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** | ## Notes