310 lines
12 KiB
Markdown
310 lines
12 KiB
Markdown
---
|
||
title: "P-expressions"
|
||
---
|
||
|
||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||
October 2023. Version 0.3.0.
|
||
|
||
[text syntax]: preserves-text.html
|
||
|
||
**Experimental.**
|
||
This document defines a grammar called *Preserves Expressions*
|
||
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
|
||
syntax][text syntax] but offers extensions sufficient to support a Lisp-
|
||
or Haskell-like programming notation.
|
||
|
||
**Motivation.** The [text syntax][] for Preserves works well for writing
|
||
`Value`s, i.e. data. However, in some contexts, Preserves applications
|
||
need a broader grammar that allows interleaving of *expressions* with
|
||
data. Two examples are the [Preserves Schema
|
||
language](preserves-schema.html) and the [Synit configuration scripting
|
||
language](https://synit.org/book/operation/scripting.html), both of
|
||
which (ab)use Preserves text syntax as a kind of programming notation.
|
||
|
||
## Preliminaries
|
||
|
||
The P-expression grammar includes by reference the definition of `Atom` from the
|
||
[text syntax][], as well as the definitions that `Atom` depends on.
|
||
|
||
<a id="whitespace">
|
||
**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
|
||
any number of spaces, tabs, carriage returns, or line feeds.
|
||
|
||
ws = *(%x20 / %x09 / CR / LF)
|
||
|
||
No changes to [the Preserves semantic model](preserves.html) are made.
|
||
Every Preserves text-syntax term can be parsed as a valid P-expression,
|
||
but in general P-expressions must be rewritten or otherwise interpreted
|
||
before a meaningful Preserves value can be arrived at ([see
|
||
below](#reading-preserves)).
|
||
|
||
## Grammar
|
||
|
||
Standalone documents containing P-expressions are sequences of
|
||
individual `Expr`s, followed by annotations, comments, and/or whitespace.
|
||
|
||
Document = *Expr Trailer ws
|
||
|
||
A single P-expression `Expr` can be an `Atom` from the [text syntax][],
|
||
a compound expression, special punctuation, an `Embedded` expression, or
|
||
an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
|
||
except special punctuation.
|
||
|
||
Expr = ws (SimpleExpr / Punct)
|
||
SimpleExpr = Atom / Compound / Embedded / Annotated
|
||
|
||
Embedded and annotated values are as in the text syntax, differing only
|
||
in that uses of `Value` are replaced with `SimpleExpr`.
|
||
|
||
Embedded = "#!" SimpleExpr
|
||
Annotated = Annotation SimpleExpr
|
||
Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
|
||
linecomment = *<any unicode scalar value except CR or LF>
|
||
|
||
P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
|
||
|
||
Punct = "," / ";" / 1*":"
|
||
|
||
Compound expressions are sequences of `Expr`s with optional trailing
|
||
`Annotation`s, surrounded by various kinds of parentheses.
|
||
|
||
Compound = Sequence / Record / Block / Group / Set
|
||
Sequence = "[" *Expr Trailer ws "]"
|
||
Record = "<" *Expr Trailer ws ">"
|
||
Block = "{" *Expr Trailer ws "}"
|
||
Group = "(" *Expr Trailer ws ")"
|
||
Set = "#{" *Expr Trailer ws "}"
|
||
|
||
In an `Annotated` P-expression, annotations and comments attach to the
|
||
term following them, just as in the ordinary text syntax. However, it is
|
||
common in programming notations to allow comments at the end of a file
|
||
or other sequential construct. The ordinary text syntax forbids comments
|
||
in these positions, but P-expressions allow them.
|
||
|
||
Trailer = *(ws Annotation)
|
||
|
||
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
|
||
|
||
We write ⌜*p*⌝ for the encoding into Preserves of a P-expression *p*.
|
||
|
||
{:.pseudocode.equations}
|
||
| ⌜·⌝ : **Expr** | ⟶ | **Value** |
|
||
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
|
||
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
|
||
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
|
||
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
|
||
| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>` |
|
||
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
|
||
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
|
||
| ⌜*p*⌝ | = | *p* | when *p* ∈ **Atom** |
|
||
| ⌜`,`⌝ | = | `<p |,|>` |
|
||
| ⌜`;`⌝ | = | `<p |;|>` |
|
||
| ⌜`:` ...⌝ | = | `<p |:` ...`|>` |
|
||
| ⌜*t*⌝ | = | `@`⌜*a*⌝ ... `<a>` | where `@`*a* ... are the annotations in *t* and *t* ∈ **Trailer** |
|
||
|
||
The record `<a>` acts as an anchor for the annotations in a `Trailer`.
|
||
|
||
We overload the ⌜·⌝ notation for encoding whole `Document`s into
|
||
sequences of Preserves values.
|
||
|
||
{:.pseudocode.equations}
|
||
| ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence** |
|
||
| ⌜*p* ...⌝ | = | `[`⌜*p*⌝ ...`]` |
|
||
| ⌜*p* ... `@`*a* ...⌝ | = | `[`⌜*p*⌝ ... `@`⌜*a*⌝ ... `<a>]` |
|
||
| | | where `@`*a* ... are trailing annotations |
|
||
|
||
## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves
|
||
|
||
The [previous section](#encoding-pexprs) discussed ways of representing
|
||
P-expressions using Preserves. Here, we discuss *interpreting*
|
||
P-expressions *as* Preserves so that (1) a Preserves datum (2) written
|
||
using Preserves text syntax and then (3) read as a P-expression can be
|
||
(4) interpreted from that P-expression to yield the original datum.
|
||
|
||
1. Every `(`..`)` or `;` that appears is an error.
|
||
2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
|
||
3. Every `,` that appears is discarded.
|
||
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
|
||
5. Every `Record` with no values in it is an error.
|
||
6. Every `Block` must contain zero or more repeating triplets of
|
||
`SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
|
||
pattern is an error. Each `Block` following the pattern is
|
||
translated to a `Dictionary` containing a key/value pair for each
|
||
triplet. Any `Block` with duplicate keys (under interpretation) is
|
||
an error.
|
||
7. Every `Set` containing any duplicate expressions (under
|
||
interpretation) is an error.
|
||
|
||
[^discard-trailers-instead-of-error]: **Implementation note.** When
|
||
implementing parsing of P-expressions into Preserves, consider
|
||
offering an optional mode where trailing annotations `Trailer` are
|
||
*discarded* instead of causing an error to be signalled.
|
||
|
||
## Appendix: Examples
|
||
|
||
Examples are given as pairs of P-expressions and their Preserves
|
||
text-syntax encodings.
|
||
|
||
### Individual P-expression `Expr`s
|
||
|
||
```preserves
|
||
⌜<date 1821 (lookup-month "February") 3>⌝
|
||
= <r date 1821 <g lookup-month "February"> 3>
|
||
```
|
||
|
||
```preserves
|
||
⌜<>⌝
|
||
= <r>
|
||
```
|
||
|
||
```preserves
|
||
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
|
||
= <g begin <g println! <g + 1 2>> <g + 3 4>>
|
||
```
|
||
|
||
```preserves
|
||
⌜()⌝
|
||
= <g>
|
||
|
||
⌜[() () ()]⌝
|
||
= [<g>, <g>, <g>]
|
||
```
|
||
|
||
```preserves
|
||
⌜{
|
||
setUp();
|
||
# Now enter the loop
|
||
loop: {
|
||
greet("World");
|
||
}
|
||
tearDown();
|
||
}⌝
|
||
= <b
|
||
setUp <g> <p |;|>
|
||
# Now enter the loop
|
||
loop <p |:|> <b
|
||
greet <g "World"> <p |;|>
|
||
>
|
||
tearDown <g> <p |;|>
|
||
>
|
||
```
|
||
|
||
```preserves
|
||
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
|
||
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
|
||
foo <p |,|> #!remote <p |,|> bar]
|
||
```
|
||
|
||
```preserves
|
||
⌜#{1 2 3}⌝
|
||
= <s 1 2 3>
|
||
|
||
⌜#{(read) (read) (read)}⌝
|
||
= <s <g read> <g read> <g read>>
|
||
```
|
||
|
||
```preserves
|
||
⌜{
|
||
optional name: string,
|
||
address: Address,
|
||
}⌝
|
||
= <b
|
||
optional name <p |:|> string <p |,|>
|
||
address <p |:|> Address <p |,|>
|
||
>
|
||
```
|
||
|
||
### Whole `Document`s
|
||
|
||
```preserves
|
||
⌜{
|
||
key: value
|
||
# example of a comment at the end of a dictionary
|
||
}
|
||
# example of a comment at the end of the input file⌝
|
||
= [ <b
|
||
key <p |:|> value
|
||
@"example of a comment at the end of a dictionary" <a>
|
||
>
|
||
@"example of a comment at the end of the input file"
|
||
<a>
|
||
]
|
||
```
|
||
|
||
## Appendix: Reading vs. Parsing
|
||
|
||
Lisp systems first *read* streams of bytes into S-expressions and then
|
||
*parse* those S-expressions into more abstract structures denoting
|
||
various kinds of program syntax. [Separation of reading from parsing is
|
||
what gives Lisp its syntactic
|
||
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
|
||
|
||
Similarly, the Apple programming language
|
||
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
|
||
included a reader-parser split, with the Dylan reader producing
|
||
*D-expressions* that are somewhat similar to P-expressions.
|
||
|
||
Finally, the Racket dialects
|
||
[Honu](https://docs.racket-lang.org/honu/index.html) and
|
||
[Something](https://github.com/tonyg/racket-something) use a
|
||
reader-parser-macro setup, where the reader produces Racket data, the
|
||
parser produces "syntax" and is user-extensible, and Racket's own
|
||
modular macro system rewrites this "syntax" down to core forms to be
|
||
compiled to machine code.
|
||
|
||
Similarly, when using P-expressions as the foundation for a language, a
|
||
generic P-expression reader can then feed into special-purpose
|
||
*parsers*. The reader captures the coarse syntactic structure of a
|
||
program, and the parser refines this.
|
||
|
||
Often, a parser will wish to extract structure from sequences of
|
||
P-expression `Expr`s.
|
||
|
||
- A simple technique is repeated splitting of sequences; first by
|
||
`Semicolon`, then by `Comma`, then by increasingly high binding-power
|
||
operators.
|
||
|
||
- More refined is to use a Pratt parser or similar
|
||
([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
|
||
[2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
|
||
[3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
|
||
to build a parse tree using an extensible specification of the pre-,
|
||
in-, and postfix operators involved.
|
||
|
||
- Finally, if you treat sequences of `Expr`s as pre-lexed token
|
||
streams, almost any parsing formalism (such as [PEG
|
||
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
|
||
[Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
|
||
extract further syntactic structure.
|
||
|
||
## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves
|
||
|
||
The function **uncomma**(*p*) removes all occurrences of `,` from a
|
||
P-expression *p* ∈ `Expr` − {`,`}.
|
||
|
||
{:.pseudocode.equations}
|
||
| **uncomma** : **Expr** − {`,`} | ⟶ | **Expr** | |
|
||
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
|
||
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
|
||
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
|
||
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
|
||
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
|
||
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
|
||
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
|
||
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** − {`,`} |
|
||
|
||
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
|
||
P-expression *p* ∈ `Expr` − {`,`} to a corresponding Preserves `Value`.
|
||
|
||
{:.pseudocode.equations}
|
||
| ⌞·⌟ : **Expr** − {`,`} | ⇀ | **Value** | |
|
||
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
|
||
| ⌞`<`ℓ *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
|
||
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
|
||
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
|
||
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
|
||
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
|
||
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
|
||
|
||
## Notes
|