preserves/preserves-expressions.md

316 lines
12 KiB
Markdown
Raw Permalink Normal View History

2023-10-31 16:37:09 +00:00
---
title: "P-expressions"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
February 2024. Version 0.3.1.
2023-11-01 12:06:15 +00:00
[text syntax]: preserves-text.html
2023-10-31 16:37:09 +00:00
2023-11-01 14:15:24 +00:00
**Experimental.**
2023-10-31 16:37:09 +00:00
This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
2023-11-01 12:06:15 +00:00
syntax][text syntax] but offers extensions sufficient to support a Lisp-
or Haskell-like programming notation.
2023-10-31 16:37:09 +00:00
2023-11-01 12:06:15 +00:00
**Motivation.** The [text syntax][] for Preserves works well for writing
`Value`s, i.e. data. However, in some contexts, Preserves applications
need a broader grammar that allows interleaving of *expressions* with
data. Two examples are the [Preserves Schema
2023-10-31 16:37:09 +00:00
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.
## Preliminaries
2023-11-01 12:06:15 +00:00
The P-expression grammar includes by reference the definition of `Atom` from the
[text syntax][], as well as the definitions that `Atom` depends on.
2023-10-31 16:37:09 +00:00
2023-11-01 12:06:15 +00:00
<a id="whitespace">
**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
any number of spaces, tabs, carriage returns, or line feeds.
2023-10-31 16:37:09 +00:00
ws = *(%x20 / %x09 / CR / LF)
2023-10-31 16:37:09 +00:00
2023-11-01 12:06:15 +00:00
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term can be parsed as a valid P-expression,
but in general P-expressions must be rewritten or otherwise interpreted
before a meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
2023-10-31 16:37:09 +00:00
## Grammar
2023-11-01 12:06:15 +00:00
Standalone documents containing P-expressions are sequences of
2023-11-01 14:35:59 +00:00
individual `Expr`s, followed by annotations, comments, and/or whitespace.
2023-10-31 16:37:09 +00:00
2023-11-01 14:35:59 +00:00
Document = *Expr Trailer ws
2023-10-31 16:37:09 +00:00
2023-11-01 12:06:15 +00:00
A single P-expression `Expr` can be an `Atom` from the [text syntax][],
a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
except special punctuation.
2023-11-01 10:20:33 +00:00
2023-11-01 14:15:24 +00:00
Expr = ws (SimpleExpr / Punct)
SimpleExpr = Atom / Compound / Embedded / Annotated
2023-11-01 10:20:33 +00:00
2023-11-01 12:06:15 +00:00
Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `SimpleExpr`.
2023-10-31 16:37:09 +00:00
Embedded = "#:" SimpleExpr
Annotated = Annotation SimpleExpr
Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
2023-11-01 14:15:24 +00:00
linecomment = *<any unicode scalar value except CR or LF>
2023-10-31 16:37:09 +00:00
2023-11-05 19:43:23 +00:00
P-expression special punctuation marks are comma, semicolon, and
sequences of one or more colons.[^greedy-colons]
2023-10-31 16:37:09 +00:00
2023-11-01 12:06:15 +00:00
Punct = "," / ";" / 1*":"
2023-10-31 16:37:09 +00:00
2023-11-05 19:43:23 +00:00
[^greedy-colons]: Colon matching is greedy: when reading, all adjacent
colons are always taken into a single token, and when writing,
adjacent colon-sequence punctuation marks must be written with
whitespace separating them.
2023-11-01 12:06:15 +00:00
Compound expressions are sequences of `Expr`s with optional trailing
`Annotation`s, surrounded by various kinds of parentheses.
2023-10-31 18:32:06 +00:00
2023-11-01 12:06:15 +00:00
Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer ws "]"
Record = "<" *Expr Trailer ws ">"
Block = "{" *Expr Trailer ws "}"
Group = "(" *Expr Trailer ws ")"
Set = "#{" *Expr Trailer ws "}"
2023-10-31 18:32:06 +00:00
2023-11-01 12:06:15 +00:00
In an `Annotated` P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is
common in programming notations to allow comments at the end of a file
or other sequential construct. The ordinary text syntax forbids comments
in these positions, but P-expressions allow them.
2023-10-31 18:32:06 +00:00
2023-11-01 14:35:59 +00:00
Trailer = *(ws Annotation)
2023-10-31 18:32:06 +00:00
2023-10-31 17:06:05 +00:00
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
2023-11-01 12:27:26 +00:00
We write ⌜*p*⌝ for the encoding into Preserves of a P-expression *p*.
2023-10-31 17:06:05 +00:00
{:.pseudocode.equations}
2023-11-01 12:06:15 +00:00
| ⌜·⌝ : **Expr** | ⟶ | **Value** |
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>` |
| ⌜`#:`*p*⌝ | = | `#:`⌜*p*⌝ |
2023-11-01 12:06:15 +00:00
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* | when *p***Atom** |
| ⌜`,`⌝ | = | `<p |,|>` |
| ⌜`;`⌝ | = | `<p |;|>` |
| ⌜`:` ...⌝ | = | `<p |:` ...`|>` |
| ⌜*t*⌝ | = | `@`⌜*a*⌝ ... `<a>` | where `@`*a* ... are the annotations in *t* and *t***Trailer** |
2023-10-31 18:32:06 +00:00
2023-11-01 10:57:23 +00:00
The record `<a>` acts as an anchor for the annotations in a `Trailer`.
2023-10-31 18:32:06 +00:00
We overload the ⌜·⌝ notation for encoding whole `Document`s into
sequences of Preserves values.
{:.pseudocode.equations}
| ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence** |
| ⌜*p* ...⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜*p* ... `@`*a* ...⌝ | = | `[`⌜*p*⌝ ... `@`⌜*a*⌝ ... `<a>]` |
| | | where `@`*a* ... are trailing annotations |
2023-10-31 17:06:05 +00:00
## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves
The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
2023-11-01 12:27:26 +00:00
P-expressions *as* Preserves so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.
2023-10-31 17:06:05 +00:00
2023-11-01 12:06:15 +00:00
1. Every `(`..`)` or `;` that appears is an error.
2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
3. Every `,` that appears is discarded.
2023-11-01 10:20:33 +00:00
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error.
2023-11-01 12:06:15 +00:00
6. Every `Block` must contain zero or more repeating triplets of
`SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
pattern is an error. Each `Block` following the pattern is
translated to a `Dictionary` containing a key/value pair for each
triplet. Any `Block` with duplicate keys (under interpretation) is
an error.
7. Every `Set` containing any duplicate expressions (under
interpretation) is an error.
2023-11-01 12:27:26 +00:00
2023-10-31 18:32:06 +00:00
[^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider
offering an optional mode where trailing annotations `Trailer` are
*discarded* instead of causing an error to be signalled.
2023-10-31 16:37:09 +00:00
## Appendix: Examples
Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.
2023-11-01 12:06:15 +00:00
### Individual P-expression `Expr`s
2023-10-31 18:32:06 +00:00
2023-10-31 16:37:09 +00:00
```preserves
<date 1821 (lookup-month "February") 3>
2023-11-01 10:57:23 +00:00
= <r date 1821 <g lookup-month "February"> 3>
2023-10-31 16:37:09 +00:00
```
2023-11-01 10:20:33 +00:00
```preserves
<>⌝
2023-11-01 10:57:23 +00:00
= <r>
2023-11-01 10:20:33 +00:00
```
2023-10-31 16:37:09 +00:00
```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
2023-11-01 10:57:23 +00:00
= <g begin <g println! <g + 1 2>> <g + 3 4>>
2023-10-31 16:37:09 +00:00
```
```preserves
⌜()⌝
2023-11-01 10:57:23 +00:00
= <g>
2023-10-31 16:37:09 +00:00
⌜[() () ()]⌝
2023-11-01 10:57:23 +00:00
= [<g>, <g>, <g>]
2023-10-31 16:37:09 +00:00
```
```preserves
⌜{
setUp();
# Now enter the loop
loop: {
greet("World");
}
tearDown();
}⌝
2023-11-01 10:57:23 +00:00
= <b
2023-11-01 12:06:15 +00:00
setUp <g> <p |;|>
2023-10-31 16:37:09 +00:00
# Now enter the loop
2023-11-01 12:06:15 +00:00
loop <p |:|> <b
greet <g "World"> <p |;|>
2023-11-01 10:57:23 +00:00
>
2023-11-01 12:06:15 +00:00
tearDown <g> <p |;|>
2023-11-01 10:57:23 +00:00
>
2023-10-31 16:37:09 +00:00
```
```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #:remote, bar]⌝
2023-11-01 12:06:15 +00:00
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
foo <p |,|> #:remote <p |,|> bar]
2023-11-01 12:06:15 +00:00
```
```preserves
⌜#{1 2 3}⌝
= <s 1 2 3>
⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>
2023-10-31 16:37:09 +00:00
```
```preserves
⌜{
optional name: string,
address: Address,
}⌝
2023-11-01 10:57:23 +00:00
= <b
2023-11-01 12:06:15 +00:00
optional name <p |:|> string <p |,|>
address <p |:|> Address <p |,|>
2023-11-01 10:57:23 +00:00
>
2023-10-31 16:37:09 +00:00
```
2023-10-31 18:32:06 +00:00
### Whole `Document`s
```preserves
⌜{
key: value
# example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file⌝
2023-11-01 10:57:23 +00:00
= [ <b
2023-11-01 12:06:15 +00:00
key <p |:|> value
2023-11-01 10:57:23 +00:00
@"example of a comment at the end of a dictionary" <a>
>
2023-10-31 18:32:06 +00:00
@"example of a comment at the end of the input file"
2023-11-01 10:57:23 +00:00
<a>
2023-10-31 18:32:06 +00:00
]
```
2023-10-31 16:37:09 +00:00
## Appendix: Reading vs. Parsing
Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.
Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.
Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of
2023-11-01 12:06:15 +00:00
P-expression `Expr`s.
2023-10-31 16:37:09 +00:00
- A simple technique is repeated splitting of sequences; first by
`Semicolon`, then by `Comma`, then by increasingly high binding-power
operators.
- More refined is to use a Pratt parser or similar
([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
[2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
[3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
to build a parse tree using an extensible specification of the pre-,
in-, and postfix operators involved.
2023-11-01 12:06:15 +00:00
- Finally, if you treat sequences of `Expr`s as pre-lexed token
2023-10-31 16:37:09 +00:00
streams, almost any parsing formalism (such as [PEG
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
[Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
extract further syntactic structure.
2023-11-01 12:27:26 +00:00
## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves
2023-11-01 12:06:15 +00:00
The function **uncomma**(*p*) removes all occurrences of `,` from a
P-expression *p*`Expr` {`,`}.
2023-11-01 12:06:15 +00:00
2023-11-01 12:27:26 +00:00
{:.pseudocode.equations}
| **uncomma** : **Expr** {`,`} | ⟶ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#:`*p*) | = | `#:`**uncomma**(*p*) | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p***Atom** **Punct** {`,`} |
2023-11-01 12:27:26 +00:00
2023-11-01 12:06:15 +00:00
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p*`Expr` {`,`} to a corresponding Preserves `Value`.
2023-11-01 12:06:15 +00:00
{:.pseudocode.equations}
| ⌞·⌟ : **Expr** {`,`} | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<` *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#:`*p*⌟ | = | `#:`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p***Atom** |
2023-11-01 12:06:15 +00:00
2023-10-31 16:37:09 +00:00
## Notes