---
title: "P-expressions"
---

Tony Garnock-Jones

October 2023. Version 0.1.0.

This document defines a grammar called *Preserves Expressions* (*P-expressions*, *pexprs*) that includes [ordinary Preserves text syntax](preserves-text.html) but offers extensions sufficient to support a Lisp- or Haskell-like programming notation.

**Motivation.** The [text syntax](preserves-text.html) for Preserves works well for writing `Value`s, i.e. data. However, in some contexts, Preserves applications need a broader grammar that allows interleaving of *expressions* with data. Two examples are the [Preserves Schema language](preserves-schema.html) and the [Synit configuration scripting language](https://synit.org/book/operation/scripting.html), both of which (ab)use Preserves text syntax as a kind of programming notation.

## Preliminaries

The P-expression grammar takes the text syntax grammar as its base and modifies it.

**Whitespace.** Whitespace is redefined as any number of spaces, tabs, carriage returns, or line feeds. Commas are *not* considered whitespace in P-expressions.

            ws = *(%x20 / %x09 / CR / LF)

**Delimiters.** Because commas are no longer included in class `ws`, class `delimiter` is widened to include them explicitly.

     delimiter = ws
               / "," / "<" / ">" / "[" / "]" / "{" / "}"
               / "#" / ":" / DQUOTE / "|" / "@" / ";"

## Grammar

P-expressions add comma, semicolon, and sequences of one or more colons to the syntax class `Value`.

         Value =/ Comma / Semicolon / Colons
         Comma = ","
     Semicolon = ";"
        Colons = 1*":"

Now that colon is in `Value`, the syntax for `Dictionary` is replaced with `Block` everywhere it is mentioned.

         Block = "{" *Value ws "}"

New syntax for explicit uninterpreted grouping of sequences of values is introduced, and added to class `Value`.

         Value =/ ws Group
         Group = "(" *Value ws ")"

Finally, class `Document` is replaced in order to allow standalone documents to directly comprise a sequence of multiple values.

      Document = *Value ws

No changes to [the Preserves semantic model](preserves.html) are made. Every Preserves text-syntax term is a valid P-expression, but in general P-expressions must be rewritten or otherwise interpreted before a meaningful Preserves value can be arrived at.

## Encoding P-expressions as Preserves

Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon`, and `Colons`, P-expressions are directly encodable as Preserves data. All members of the special classes are encoded as Preserves text `Dictionary`[^encoding-rationale] values:

[^encoding-rationale]: In principle, it would be nice to use *records* for this purpose, but if we did so we would have to also encode usages of records!

{:.pseudocode}
> ⌜`(`*p* ...`)`⌝ ⟶ `{g:[`⌜*p*⌝ ...`]}`
>
> ⌜`{`*p* ...`}`⌝ ⟶ `{b:[`⌜*p*⌝ ...`]}`
>
> ⌜`,`⌝ ⟶ `{s:|,|}`
>
> ⌜`;`⌝ ⟶ `{s:|;|}`
>
> ⌜`:` ...⌝ ⟶ `{s:|:` ...`|}`

## Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves text-syntax encodings.

```preserves
⌜⌝ =
```

```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
```

```preserves
⌜()⌝
= {g:[]}

⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]
```

```preserves
⌜{
  setUp();
  # Now enter the loop
  loop: {
    greet("World");
  }
  tearDown();
}⌝
= {b:[
    setUp {g:[]} {s:|;|}
    # Now enter the loop
    loop {s:|:|} {b:[
      greet {g:["World"]} {s:|;|}
    ]}
    tearDown {g:[]} {s:|;|}
  ]}
```

```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
   foo {s:|,|} #!remote {s:|,|} bar]
```

```preserves
⌜{
  optional name: string,
  address: Address,
}⌝
= {b:[
    optional name {s:|:|} string {s:|,|}
    address {s:|:|} Address {s:|,|}
  ]}
```

## Appendix: Using a P-expression reader to read Preserves

A reader for P-expressions can be adapted to yield a reader for Preserves terms by processing (subterms of) each P-expression that the reader produces.
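As a rough illustration, the following Python sketch implements the four rules listed below. It is not an existing library API: the node classes `Symbol`, `Group`, `Block`, `Comma`, `Semicolon`, and `Colons`, the exception `PexprError`, and the functions `convert`, `convert_block`, and `read_preserves` are all hypothetical, and it assumes a reader that yields these objects, with plain Python lists for Preserves sequences. For brevity, records and sets are omitted (they would be handled like sequences). A lone single colon outside a `Block` is treated as an error, since it has no Preserves meaning; this is this sketch's interpretation. Dictionary keys are assumed to be hashable Python values.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical node types that a P-expression reader might produce.
@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass
class Group:
    items: list


@dataclass
class Block:
    items: list


@dataclass(frozen=True)
class Comma:
    pass


@dataclass(frozen=True)
class Semicolon:
    pass


@dataclass(frozen=True)
class Colons:
    count: int  # number of consecutive ":" characters


class PexprError(ValueError):
    """Raised when a P-expression has no plain Preserves reading."""


def convert_seq(items: list) -> list:
    # Rule 3: drop commas from the container, convert everything else.
    return [convert(v) for v in items if not isinstance(v, Comma)]


def convert(v: Any) -> Any:
    if isinstance(v, (Group, Semicolon)):
        # Rule 1: groups and semicolons are errors.
        raise PexprError(f"{type(v).__name__} not allowed here")
    if isinstance(v, Colons):
        # Rule 2 covers two or more colons; this sketch also rejects a
        # single colon appearing outside a Block triplet.
        raise PexprError("unexpected " + ":" * v.count)
    if isinstance(v, Block):
        return convert_block(v)
    if isinstance(v, list):  # a Preserves sequence
        return convert_seq(v)
    return v  # atoms (numbers, strings, symbols, ...) pass through


def convert_block(b: Block) -> dict:
    # Rule 3 applies inside Blocks too, before looking for triplets.
    items = [v for v in b.items if not isinstance(v, Comma)]
    if len(items) % 3 != 0:
        raise PexprError("Block is not a sequence of key : value triplets")
    result: dict = {}
    for i in range(0, len(items), 3):
        key, sep, value = items[i:i + 3]
        # Rule 4: the middle element must be exactly one colon.
        if not (isinstance(sep, Colons) and sep.count == 1):
            raise PexprError("expected a single ':' between key and value")
        k = convert(key)  # assumed to be hashable in this sketch
        if k in result:
            raise PexprError(f"duplicate key {k!r}")
        result[k] = convert(value)
    return result


def read_preserves(document: list) -> list:
    """Convert a P-expression Document (a list of Values) to Preserves values."""
    return convert_seq(document)
```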
The only subterms that need processing are the special classes mentioned above.

1. Every `Group` or `Semicolon` that appears is an error.
2. Every `Colons` with two or more colons in it is an error.
3. Every `Comma` that appears is removed from its container.
4. Every `Block` must contain triplets of `Value`, `Colons` (with a single colon), `Value`. Any `Block` not following this pattern is an error. Each `Block` following the pattern is translated to a `Dictionary` containing a key/value pair for each triplet.

## Appendix: Reading vs. Parsing

Lisp systems first *read* streams of bytes into S-expressions and then *parse* those S-expressions into more abstract structures denoting various kinds of program syntax. [Separation of reading from parsing is what gives Lisp its syntactic flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)

Similarly, the Apple programming language [Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language)) included a reader-parser split, with the Dylan reader producing *D-expressions* that are somewhat similar to P-expressions.

Finally, the Racket dialects [Honu](https://docs.racket-lang.org/honu/index.html) and [Something](https://github.com/tonyg/racket-something) use a reader-parser-macro setup, where the reader produces Racket data, the parser produces "syntax" and is user-extensible, and Racket's own modular macro system rewrites this "syntax" down to core forms to be compiled to machine code.

In the same way, when using P-expressions as the foundation for a language, a generic P-expression reader can feed into special-purpose *parsers*. The reader captures the coarse syntactic structure of a program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of P-expression `Value`s.

- A simple technique is repeated splitting of sequences: first by `Semicolon`, then by `Comma`, then by increasingly high binding-power operators.
- A more refined approach is to use a Pratt parser or similar ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser), [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html), [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt)) to build a parse tree using an extensible specification of the pre-, in-, and postfix operators involved.
- Finally, if you treat sequences of `Value`s as pre-lexed token streams, almost any parsing formalism (such as [PEG parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar), [OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to extract further syntactic structure.

## Notes