---
title: "P-expressions"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
October 2023. Version 0.1.0.

This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support
a Lisp- or Haskell-like programming notation.

**Motivation.** The [text syntax](preserves-text.html) for Preserves
works well for writing `Value`s, i.e. data. However, in some contexts,
Preserves applications need a broader grammar that allows interleaving
of *expressions* with data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.

## Preliminaries

The P-expression grammar takes the text syntax grammar as its base and
modifies it.

<a id="whitespace"></a>
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
carriage returns, or line feeds. Commas are *not* considered whitespace
in P-expressions.

                ws = *(%x20 / %x09 / CR / LF)

<a id="delimiters"></a>
**Delimiters.** Because commas are no longer included in class `ws`,
class `delimiter` is widened to include them explicitly.

         delimiter = ws / ","
                   / "<" / ">" / "[" / "]" / "{" / "}"
                   / "#" / ":" / DQUOTE / "|" / "@" / ";"

## Grammar

P-expressions add comma, semicolon, and sequences of one or more colons
to the syntax class `Value`.

            Value =/ Comma / Semicolon / Colons
             Comma = ","
         Semicolon = ";"
            Colons = 1*":"

Now that colon is in `Value`, the syntax for `Dictionary` is replaced
with `Block` everywhere it is mentioned.

             Block =  "{" *Value ws "}"

New syntax for explicit uninterpreted grouping of sequences of values is
introduced, and added to class `Value`.

            Value =/ ws Group
             Group = "(" *Value ws ")"

Finally, class `Document` is replaced in order to allow standalone
documents to directly comprise a sequence of multiple values.

          Document = *Value ws

No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term is a valid P-expression, but in general
P-expressions must be rewritten or otherwise interpreted before a
meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
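
As an illustration of the grammar above, the following is a minimal sketch of a reader for a small subset of P-expressions: bare symbols, decimal integers, escape-free strings, sequences, records, blocks, groups, and the three punctuation classes. The names `Sym`, `Punct`, `Node`, and `read_document` are hypothetical and not part of any Preserves implementation. Comments, annotations, sets, embedded values, and the remaining atom syntaxes are left out. The sketch also stops bare tokens at parentheses, which is an assumption: the `delimiter` class above does not list them.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # hypothetical stand-in for a Preserves symbol
    name: str

@dataclass(frozen=True)
class Punct:          # a Comma, Semicolon, or Colons value
    text: str

@dataclass(frozen=True)
class Node:           # kind is "seq", "record", "block", or "group"
    kind: str
    items: tuple

TOKEN = re.compile(r'''
    [ \t\r\n]+                                   # ws (commas are not included)
  | (?P<open>[\[<{(])
  | (?P<close>[\]>})])
  | (?P<punct>,|;|:+)                            # Comma, Semicolon, Colons
  | (?P<string>"[^"]*")                          # assumption: no escapes
  | (?P<bare>[^ \t\r\n,<>\[\]{}()#:"|@;]+)       # run of non-delimiters
''', re.VERBOSE)

CLOSERS = {"[": ("]", "seq"), "<": (">", "record"),
           "{": ("}", "block"), "(": (")", "group")}

def read_document(text):
    """Read `Document = *Value ws` and return the list of top-level values."""
    stack = [[]]      # stack[0] collects the document's top-level values
    expected = []     # (closing delimiter, node kind) for each open container
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unsupported input at offset {pos}")
        pos = m.end()
        kind, tok = m.lastgroup, m.group()
        if kind is None:
            continue                              # whitespace
        if kind == "open":
            stack.append([])
            expected.append(CLOSERS[tok])
        elif kind == "close":
            if not expected or expected[-1][0] != tok:
                raise SyntaxError(f"unexpected {tok!r} at offset {pos - 1}")
            _, node_kind = expected.pop()
            items = stack.pop()
            stack[-1].append(Node(node_kind, tuple(items)))
        elif kind == "punct":
            stack[-1].append(Punct(tok))
        elif kind == "string":
            stack[-1].append(tok[1:-1])
        else:
            is_int = re.fullmatch(r"-?\d+", tok) is not None
            stack[-1].append(int(tok) if is_int else Sym(tok))
    if expected:
        raise SyntaxError(f"missing {expected[-1][0]!r} at end of input")
    return stack[0]
```

Because the reader does no interpretation, `read_document("a::b")` simply yields the symbol `a`, a `Colons` value of length two, and the symbol `b`, leaving it to a later stage to decide what that means.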

## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves

We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.

{:.pseudocode.equations}
| ⌜·⌝ | : | **P-expression** ⟶ **Preserves** |

Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon`, and
`Colons`, P-expressions are encoded directly as Preserves data.

{:.pseudocode.equations}
| ⌜`[`*p* ...`]`⌝  | = | `[`⌜*p*⌝ ...`]`             |
| ⌜`<`*p* ...`>`⌝  | = | `<`⌜*p*⌝ ...`>`             |
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}`            |
| ⌜`#!`*p*⌝        | = | `#!`⌜*p*⌝                   |
| ⌜`@`*p* *q*⌝     | = | `@`⌜*p*⌝ ⌜*q*⌝              |
| ⌜*p*⌝            | = | *p* **when** *p* ∈ **Atom** |

All members of the special classes are encoded as Preserves text
`Dictionary`[^encoding-rationale] values.

[^encoding-rationale]: In principle, it would be nice to use *records*
    for this purpose, but if we did so we would have to also encode
    usages of records!

{:.pseudocode.equations}
| ⌜`(`*p* ...`)`⌝ | = | `{g:[`⌜*p*⌝ ...`]}` |
| ⌜`{`*p* ...`}`⌝ | = | `{b:[`⌜*p*⌝ ...`]}` |
| ⌜`,`⌝           | = | `{s:|,|}`           |
| ⌜`;`⌝           | = | `{s:|;|}`           |
| ⌜`:` ...⌝       | = | `{s:|:` ...`|}`     |
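
The two tables above can be sketched as a single recursive function. This is only an illustration over a hypothetical in-memory representation (`Sym`, `Punct`, and `Node` are assumed names, not part of any Preserves library), and it emits Preserves text rather than building values. Sets, embedded values, and annotations are omitted, and atoms are limited to symbols, integers, and strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # hypothetical stand-in for a Preserves symbol
    name: str

@dataclass(frozen=True)
class Punct:          # a Comma, Semicolon, or Colons value
    text: str

@dataclass(frozen=True)
class Node:           # kind is "seq", "record", "block", or "group"
    kind: str
    items: tuple

def encode(p):
    """Return Preserves text for the encoding of P-expression `p`."""
    if isinstance(p, Node):
        inner = " ".join(encode(q) for q in p.items)
        if p.kind == "seq":
            return "[" + inner + "]"
        if p.kind == "record":
            return "<" + inner + ">"
        if p.kind == "group":
            return "{g:[" + inner + "]}"
        if p.kind == "block":
            return "{b:[" + inner + "]}"
        raise TypeError(f"unknown node kind {p.kind!r}")
    if isinstance(p, Punct):
        return "{s:|" + p.text + "|}"
    if isinstance(p, Sym):
        return p.name
    if isinstance(p, str):
        # Assumption: only backslash and double quote need escaping here.
        return '"' + p.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(p, int) and not isinstance(p, bool):
        return str(p)
    raise TypeError(f"unsupported atom {p!r}")

# <date 1821 (lookup-month "February") 3>
example = Node("record", (
    Sym("date"), 1821,
    Node("group", (Sym("lookup-month"), "February")),
    3,
))
print(encode(example))  # <date 1821 {g:[lookup-month "February"]} 3>
```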

## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves

The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves, so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.

A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special
classes mentioned above.

 1. Every `Group` or `Semicolon` that appears is an error.
 2. Every `Colons` with two or more colons in it is an error.
 3. Every `Comma` that appears is removed from its container.
 4. Every `Block` must contain triplets of `Value`, `Colons` (with a
    single colon), `Value`. Any `Block` not following this pattern is an
    error. Each `Block` following the pattern is translated to a
    `Dictionary` containing a key/value pair for each triplet.
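
The four rules above might be implemented along the following lines. This is a sketch over a hypothetical tree representation (`Sym`, `Punct`, `Node`, and `Record` are assumed names), not a reference implementation. It makes three assumptions the rules do not state: a single colon outside a `Block` is rejected, duplicate dictionary keys are rejected, and dictionary keys must be hashable Python values, which in practice restricts them to atoms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # hypothetical stand-in for a Preserves symbol
    name: str

@dataclass(frozen=True)
class Punct:          # a Comma, Semicolon, or Colons value
    text: str

@dataclass(frozen=True)
class Node:           # kind is "seq", "record", "block", or "group"
    kind: str
    items: tuple

@dataclass(frozen=True)
class Record:         # interpreted record: label followed by fields
    items: tuple

def interpret(p):
    """Turn a P-expression tree into a Preserves-like Python value."""
    if isinstance(p, Punct):
        # Assumption: punctuation not consumed by a container is an error.
        raise ValueError(f"stray {p.text!r}")
    if not isinstance(p, Node):
        return p                                   # atoms pass through
    if p.kind == "group":
        raise ValueError("Group is not Preserves")           # rule 1
    items = []
    for q in p.items:
        if isinstance(q, Punct):
            if q.text == ";":
                raise ValueError("Semicolon is not Preserves")  # rule 1
            if q.text.startswith(":") and len(q.text) >= 2:
                raise ValueError("more than one colon")        # rule 2
            if q.text == ",":
                continue                                        # rule 3
        items.append(q)
    if p.kind == "block":                                       # rule 4
        if len(items) % 3 != 0:
            raise ValueError("Block is not a series of key : value triplets")
        result = {}
        for i in range(0, len(items), 3):
            key, colon, value = items[i:i + 3]
            if colon != Punct(":"):
                raise ValueError("expected a single colon after key")
            k = interpret(key)
            if k in result:
                raise ValueError(f"duplicate key {k!r}")
            result[k] = interpret(value)
        return result
    values = [interpret(q) for q in items]
    return values if p.kind == "seq" else Record(tuple(values))
```

Applied to the tree for `{a: 1, b: [2, 3]}`, the function drops both commas, checks each key/colon/value triplet, and returns a Python `dict` mapping `Sym("a")` to `1` and `Sym("b")` to `[2, 3]`.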

## Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.

```preserves
 ⌜<date 1821 (lookup-month "February") 3>⌝
= <date 1821 {g:[lookup-month "February"]} 3>
```

```preserves
 ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
```

```preserves
 ⌜()⌝
= {g:[]}

 ⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]
```

```preserves
 ⌜{
      setUp();
      # Now enter the loop
      loop: {
          greet("World");
      }
      tearDown();
  }⌝
= {b:[
      setUp {g:[]} {s:|;|}
      # Now enter the loop
      loop {s:|:|} {b:[
          greet {g:["World"]} {s:|;|}
      ]}
      tearDown {g:[]} {s:|;|}
  ]}
```

```preserves
 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
   foo {s:|,|} #!remote {s:|,|} bar]
```

```preserves
 ⌜{
      optional name: string,
      address: Address,
  }⌝
= {b:[
      optional name {s:|:|} string {s:|,|}
      address {s:|:|} Address {s:|,|}
  ]}
```

## Appendix: Reading vs. Parsing

Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)

Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.

Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.

Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of
P-expression `Value`s.

 - A simple technique is repeated splitting of sequences: first by
   `Semicolon`, then by `Comma`, then by operators of increasingly high
   binding power.

 - More refined is to use a Pratt parser or similar
   ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
   [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
   [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
   to build a parse tree using an extensible specification of the pre-,
   in-, and postfix operators involved.

 - Finally, if you treat sequences of `Value`s as pre-lexed token
   streams, almost any parsing formalism (such as [PEG
   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
   [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
   extract further syntactic structure.
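
The first technique, repeated splitting, can be sketched as follows. The example uses plain Python strings as stand-ins for operator symbols and a hypothetical `Punct` class for `Semicolon` and `Comma`. The operator table `LEVELS` and the choice of left associativity are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Punct:          # a Comma or Semicolon value
    text: str

SEMI, COMMA = Punct(";"), Punct(",")

# Hypothetical operator table, lowest binding power first.
LEVELS = [("+", "-"), ("*", "/")]

def split_on(values, separator):
    """Split a sequence of values at each occurrence of `separator`."""
    parts, current = [], []
    for v in values:
        if v == separator:
            parts.append(current)
            current = []
        else:
            current.append(v)
    if current:                     # tolerate a trailing separator
        parts.append(current)
    return parts

def parse_infix(values, levels=LEVELS):
    """Split at the rightmost operator of the lowest binding power present."""
    for ops in levels:
        # Scanning right to left makes each operator left-associative.
        for i in range(len(values) - 1, 0, -1):
            if values[i] in ops:
                left = parse_infix(values[:i], levels)
                right = parse_infix(values[i + 1:], levels)
                return (values[i], left, right)
    if len(values) != 1:
        raise ValueError(f"cannot parse {values!r}")
    return values[0]

# Values of the sequence:  1 + 2 * 3, 4; 5 - 1;
seq = [1, "+", 2, "*", 3, COMMA, 4, SEMI, 5, "-", 1, SEMI]
tree = [[parse_infix(expr) for expr in split_on(stmt, COMMA)]
        for stmt in split_on(seq, SEMI)]
print(tree)  # [[('+', 1, ('*', 2, 3)), 4], [('-', 5, 1)]]
```

This approach rescans each sequence once per precedence level, which is usually acceptable for short sequences; a Pratt parser avoids the repeated scanning.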

## Notes