preserves/preserves-expressions.md

---
title: "P-expressions"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>
February 2024. Version 0.3.1.

[text syntax]: preserves-text.html

**Experimental.**
This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax][text syntax] but offers extensions sufficient to support a Lisp-
or Haskell-like programming notation.

**Motivation.** The [text syntax][] for Preserves works well for writing
`Value`s, i.e. data. However, in some contexts, Preserves applications
need a broader grammar that allows interleaving of *expressions* with
data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.

## Preliminaries

The P-expression grammar includes by reference the definition of `Atom` from the
[text syntax][], as well as the definitions that `Atom` depends on.

<a id="whitespace">
**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
any number of spaces, tabs, carriage returns, or line feeds.

                ws = *(%x20 / %x09 / CR / LF)

No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term can be parsed as a valid P-expression,
but in general P-expressions must be rewritten or otherwise interpreted
before a meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).

## Grammar

Standalone documents containing P-expressions are sequences of
individual `Expr`s, followed by annotations, comments, and/or whitespace.

          Document = *Expr Trailer ws

A single P-expression `Expr` can be an `Atom` from the [text syntax][],
a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
except special punctuation.

              Expr = ws (SimpleExpr / Punct)
        SimpleExpr = Atom / Compound / Embedded / Annotated

Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `SimpleExpr`.

           Embedded = "#:" SimpleExpr
          Annotated = Annotation SimpleExpr
         Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
        linecomment = *<any unicode scalar value except CR or LF>

P-expression special punctuation marks are comma, semicolon, and
sequences of one or more colons.[^greedy-colons]

             Punct = "," / ";" / 1*":"

[^greedy-colons]: Colon matching is greedy: when reading, all adjacent
    colons are always taken into a single token, and when writing,
    adjacent colon-sequence punctuation marks must be written with
    whitespace separating them.

Compound expressions are sequences of `Expr`s with optional trailing
`Annotation`s, surrounded by various kinds of parentheses.

          Compound = Sequence / Record / Block / Group / Set
          Sequence =  "[" *Expr Trailer ws "]"
            Record =  "<" *Expr Trailer ws ">"
             Block =  "{" *Expr Trailer ws "}"
             Group =  "(" *Expr Trailer ws ")"
               Set = "#{" *Expr Trailer ws "}"

In an `Annotated` P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is
common in programming notations to allow comments at the end of a file
or other sequential construct. The ordinary text syntax forbids comments
in these positions, but P-expressions allow them.

           Trailer = *(ws Annotation)

## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves

We write ⌜*p*⌝ for the encoding into Preserves of a P-expression *p*.

{:.pseudocode.equations}
| ⌜·⌝ : **Expr**   | ⟶ | **Value** |
| ⌜`[`*p* ...`]`⌝  | = | `[`⌜*p*⌝ ...`]`         |
| ⌜`<`*p* ...`>`⌝  | = | `<r` ⌜*p*⌝ ...`>`       |
| ⌜`{`*p* ...`}`⌝  | = | `<b` ⌜*p*⌝ ...`>`       |
| ⌜`(`*p* ...`)`⌝  | = | `<g` ⌜*p*⌝ ...`>`       |
| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>`       |
| ⌜`#:`*p*⌝        | = | `#:`⌜*p*⌝               |
| ⌜`@`*p* *q*⌝     | = | `@`⌜*p*⌝ ⌜*q*⌝          |
| ⌜*p*⌝            | = | *p* | when *p* ∈ **Atom** |
| ⌜`,`⌝            | = | `<p |,|>`               |
| ⌜`;`⌝            | = | `<p |;|>`               |
| ⌜`:` ...⌝        | = | `<p |:` ...`|>`         |
| ⌜*t*⌝            | = | `@`⌜*a*⌝ ... `<a>` | where `@`*a* ... are the annotations in *t* and *t* ∈ **Trailer** |

The record `<a>` acts as an anchor for the annotations in a `Trailer`.

We overload the ⌜·⌝ notation for encoding whole `Document`s into
sequences of Preserves values.

{:.pseudocode.equations}
| ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence**                    |
| ⌜*p* ...⌝                       | = | `[`⌜*p*⌝ ...`]`                           |
| ⌜*p* ... `@`*a* ...⌝            | = | `[`⌜*p*⌝ ... `@`⌜*a*⌝ ... `<a>]`          |
|                                 |   | where `@`*a* ... are trailing annotations |

## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves

The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.

 1. Every `(`..`)` or `;` that appears is an error.
 2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
 3. Every `,` that appears is discarded.
 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
 5. Every `Record` with no values in it is an error.
 6. Every `Block` must contain zero or more repeating triplets of
    `SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
    pattern is an error. Each `Block` following the pattern is
    translated to a `Dictionary` containing a key/value pair for each
    triplet. Any `Block` with duplicate keys (under interpretation) is
    an error.
 7. Every `Set` containing any duplicate expressions (under
    interpretation) is an error.

[^discard-trailers-instead-of-error]: **Implementation note.** When
    implementing parsing of P-expressions into Preserves, consider
    offering an optional mode where trailing annotations `Trailer` are
    *discarded* instead of causing an error to be signalled.

## Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.

### Individual P-expression `Expr`s

```preserves
 ⌜<date 1821 (lookup-month "February") 3>⌝
= <r date 1821 <g lookup-month "February"> 3>
```

```preserves
 ⌜<>⌝
= <r>
```

```preserves
 ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= <g begin <g println! <g + 1 2>> <g + 3 4>>
```

```preserves
 ⌜()⌝
= <g>

 ⌜[() () ()]⌝
= [<g>, <g>, <g>]
```

```preserves
 ⌜{
      setUp();
      # Now enter the loop
      loop: {
          greet("World");
      }
      tearDown();
  }⌝
= <b
      setUp <g> <p |;|>
      # Now enter the loop
      loop <p |:|> <b
          greet <g "World"> <p |;|>
      >
      tearDown <g> <p |;|>
  >
```

```preserves
 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #:remote, bar]⌝
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
   foo <p |,|> #:remote <p |,|> bar]
```

```preserves
 ⌜#{1 2 3}⌝
= <s 1 2 3>

 ⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>
```

```preserves
 ⌜{
      optional name: string,
      address: Address,
  }⌝
= <b
      optional name <p |:|> string <p |,|>
      address <p |:|> Address <p |,|>
  >
```

### Whole `Document`s

```preserves
 ⌜{
      key: value
      # example of a comment at the end of a dictionary
  }
  # example of a comment at the end of the input file⌝
= [ <b
        key <p |:|> value
        @"example of a comment at the end of a dictionary" <a>
    >
    @"example of a comment at the end of the input file"
    <a>
  ]
```

## Appendix: Reading vs. Parsing

Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)

Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.

Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.

Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of
P-expression `Expr`s.

 - A simple technique is repeated splitting of sequences; first by
   `Semicolon`, then by `Comma`, then by increasingly high binding-power
   operators.

 - More refined is to use a Pratt parser or similar
   ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
   [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
   [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
   to build a parse tree using an extensible specification of the pre-,
   in-, and postfix operators involved.

 - Finally, if you treat sequences of `Expr`s as pre-lexed token
   streams, almost any parsing formalism (such as [PEG
   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
   [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
   extract further syntactic structure.

## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves

The function **uncomma**(*p*) removes all occurrences of `,` from a
P-expression *p* ∈ `Expr` − {`,`}.

{:.pseudocode.equations}
| **uncomma** : **Expr** − {`,`} | ⟶ | **Expr**                             |                                       |
| **uncomma**(`[`*p* ...`]`)     | = | `[`**uncomma**(*p*) ...`]`           | omitting any *p* = `,`                |
| **uncomma**(`<`*p* ...`>`)     | = | `<`**uncomma**(*p*) ...`>`           | omitting any *p* = `,`                |
| **uncomma**(`{`*p* ...`}`)     | = | `{`**uncomma**(*p*) ...`}`           | omitting any *p* = `,`                |
| **uncomma**(`(`*p* ...`)`)     | = | `(`**uncomma**(*p*) ...`)`           | omitting any *p* = `,`                |
| **uncomma**(`#{`*p* ...`}`)    | = | `#{`**uncomma**(*p*) ...`}`          | omitting any *p* = `,`                |
| **uncomma**(`#:`*p*)           | = | `#:`**uncomma**(*p*)                 |                                       |
| **uncomma**(`@`*p* *q*)        | = | `@`**uncomma**(*p*) **uncomma**(*q*) |                                       |
| **uncomma**(*p*)               | = | *p*                                  | if *p* ∈ **Atom** ∪ **Punct** − {`,`} |

We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p* ∈ `Expr` − {`,`} to a corresponding Preserves `Value`.

{:.pseudocode.equations}
| ⌞·⌟ : **Expr** − {`,`} | ⇀ | **Value**               |                               |
| ⌞`[`*p* ...`]`⌟        | = | `[`⌞*p*⌟ ...`]`         |                               |
| ⌞`<`ℓ *p* ...`>`⌟      | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>`     |                               |
| ⌞`{`*k*`:`*v* ...`}`⌟  | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟       | = | `#{`⌞*p*⌟ ...`}`        | if all ⌞*p*⌟ ... are distinct |
| ⌞`#:`*p*⌟              | = | `#:`⌞*p*⌟               |                               |
| ⌞`@`*p* *q*⌟           | = | `@`⌞*p*⌟ ⌞*q*⌟          |                               |
| ⌞*p*⌟                  | = | *p*                     | when *p* ∈ **Atom**           |

## Notes