preserves/preserves-expressions.md

316 lines
12 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "P-expressions"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
February 2024. Version 0.3.1.
[text syntax]: preserves-text.html
**Experimental.**
This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax][text syntax] but offers extensions sufficient to support a Lisp-
or Haskell-like programming notation.
**Motivation.** The [text syntax][] for Preserves works well for writing
`Value`s, i.e. data. However, in some contexts, Preserves applications
need a broader grammar that allows interleaving of *expressions* with
data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.
## Preliminaries
The P-expression grammar includes by reference the definition of `Atom` from the
[text syntax][], as well as the definitions that `Atom` depends on.
<a id="whitespace">
**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
any number of spaces, tabs, carriage returns, or line feeds.
ws = *(%x20 / %x09 / CR / LF)
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term can be parsed as a valid P-expression,
but in general P-expressions must be rewritten or otherwise interpreted
before a meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
## Grammar
Standalone documents containing P-expressions are sequences of
individual `Expr`s, followed by annotations, comments, and/or whitespace.
Document = *Expr Trailer ws
A single P-expression `Expr` can be an `Atom` from the [text syntax][],
a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
except special punctuation.
Expr = ws (SimpleExpr / Punct)
SimpleExpr = Atom / Compound / Embedded / Annotated
Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `SimpleExpr`.
Embedded = "#:" SimpleExpr
Annotated = Annotation SimpleExpr
Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
linecomment = *<any unicode scalar value except CR or LF>
P-expression special punctuation marks are comma, semicolon, and
sequences of one or more colons.[^greedy-colons]
Punct = "," / ";" / 1*":"
[^greedy-colons]: Colon matching is greedy: when reading, all adjacent
colons are always taken into a single token, and when writing,
adjacent colon-sequence punctuation marks must be written with
whitespace separating them.
Compound expressions are sequences of `Expr`s with optional trailing
`Annotation`s, surrounded by various kinds of parentheses.
Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer ws "]"
Record = "<" *Expr Trailer ws ">"
Block = "{" *Expr Trailer ws "}"
Group = "(" *Expr Trailer ws ")"
Set = "#{" *Expr Trailer ws "}"
In an `Annotated` P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is
common in programming notations to allow comments at the end of a file
or other sequential construct. The ordinary text syntax forbids comments
in these positions, but P-expressions allow them.
Trailer = *(ws Annotation)
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
We write ⌜*p*⌝ for the encoding into Preserves of a P-expression *p*.
{:.pseudocode.equations}
| ⌜·⌝ : **Expr** | ⟶ | **Value** |
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>` |
| ⌜`#:`*p*⌝ | = | `#:`⌜*p*⌝ |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* | when *p***Atom** |
| ⌜`,`⌝ | = | `<p |,|>` |
| ⌜`;`⌝ | = | `<p |;|>` |
| ⌜`:` ...⌝ | = | `<p |:` ...`|>` |
| ⌜*t*⌝ | = | `@`⌜*a*⌝ ... `<a>` | where `@`*a* ... are the annotations in *t* and *t***Trailer** |
The record `<a>` acts as an anchor for the annotations in a `Trailer`.
We overload the ⌜·⌝ notation for encoding whole `Document`s into
sequences of Preserves values.
{:.pseudocode.equations}
| ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence** |
| ⌜*p* ...⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜*p* ... `@`*a* ...⌝ | = | `[`⌜*p*⌝ ... `@`⌜*a*⌝ ... `<a>]` |
| | | where `@`*a* ... are trailing annotations |
## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves
The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.
1. Every `(`..`)` or `;` that appears is an error.
2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
3. Every `,` that appears is discarded.
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error.
6. Every `Block` must contain zero or more repeating triplets of
`SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
pattern is an error. Each `Block` following the pattern is
translated to a `Dictionary` containing a key/value pair for each
triplet. Any `Block` with duplicate keys (under interpretation) is
an error.
7. Every `Set` containing any duplicate expressions (under
interpretation) is an error.
[^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider
offering an optional mode where trailing annotations `Trailer` are
*discarded* instead of causing an error to be signalled.
## Appendix: Examples
Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.
### Individual P-expression `Expr`s
```preserves
⌜<date 1821 (lookup-month "February") 3>⌝
= <r date 1821 <g lookup-month "February"> 3>
```
```preserves
⌜<>⌝
= <r>
```
```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= <g begin <g println! <g + 1 2>> <g + 3 4>>
```
```preserves
⌜()⌝
= <g>
⌜[() () ()]⌝
= [<g>, <g>, <g>]
```
```preserves
⌜{
setUp();
# Now enter the loop
loop: {
greet("World");
}
tearDown();
}⌝
= <b
setUp <g> <p |;|>
# Now enter the loop
loop <p |:|> <b
greet <g "World"> <p |;|>
>
tearDown <g> <p |;|>
>
```
```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #:remote, bar]⌝
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
foo <p |,|> #:remote <p |,|> bar]
```
```preserves
⌜#{1 2 3}⌝
= <s 1 2 3>
⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>
```
```preserves
⌜{
optional name: string,
address: Address,
}⌝
= <b
optional name <p |:|> string <p |,|>
address <p |:|> Address <p |,|>
>
```
### Whole `Document`s
```preserves
⌜{
key: value
# example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file⌝
= [ <b
key <p |:|> value
@"example of a comment at the end of a dictionary" <a>
>
@"example of a comment at the end of the input file"
<a>
]
```
## Appendix: Reading vs. Parsing
Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.
Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.
Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of
P-expression `Expr`s.
- A simple technique is repeated splitting of sequences; first by
`Semicolon`, then by `Comma`, then by increasingly high binding-power
operators.
- More refined is to use a Pratt parser or similar
([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
[2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
[3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
to build a parse tree using an extensible specification of the pre-,
in-, and postfix operators involved.
- Finally, if you treat sequences of `Expr`s as pre-lexed token
streams, almost any parsing formalism (such as [PEG
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
[Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
extract further syntactic structure.
## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves
The function **uncomma**(*p*) removes all occurrences of `,` from a
P-expression *p*`Expr` {`,`}.
{:.pseudocode.equations}
| **uncomma** : **Expr** {`,`} | ⟶ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#:`*p*) | = | `#:`**uncomma**(*p*) | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p***Atom** **Punct** {`,`} |
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p*`Expr` {`,`} to a corresponding Preserves `Value`.
{:.pseudocode.equations}
| ⌞·⌟ : **Expr** {`,`} | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<` *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#:`*p*⌟ | = | `#:`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p***Atom** |
## Notes