---
title: "P-expressions"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
October 2023. Version 0.1.0.
This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support
a Lisp- or Haskell-like programming notation.
**Motivation.** The [text syntax](preserves-text.html) for Preserves
works well for writing `Value`s, i.e. data. However, in some contexts,
Preserves applications need a broader grammar that allows interleaving
of *expressions* with data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.
## Preliminaries
The P-expression grammar takes the text syntax grammar as its base and
modifies it.
<a id="whitespace"></a>
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
carriage returns, or line feeds. Commas are *not* considered whitespace
in P-expressions.

    ws = *(%x20 / %x09 / CR / LF)

<a id="delimiters"></a>
**Delimiters.** Because commas are no longer included in class `ws`,
class `delimiter` is widened to include them explicitly.

    delimiter = ws / ","
              / "<" / ">" / "[" / "]" / "{" / "}"
              / "#" / ":" / DQUOTE / "|" / "@" / ";"

## Grammar
P-expressions add comma, semicolon, and sequences of one or more colons
to the syntax class `Value`.

    Value     =/ Comma / Semicolon / Colons
    Comma     =  ","
    Semicolon =  ";"
    Colons    =  1*":"

Now that colon is in `Value`, the syntax for `Dictionary` is replaced
with `Block` everywhere it is mentioned.

    Block = "{" *Value ws "}"

New syntax for explicit uninterpreted grouping of sequences of values is
introduced, and added to class `Value`.

    Value =/ ws Group
    Group =  "(" *Value ws ")"

Finally, class `Document` is replaced in order to allow standalone
documents to directly comprise a sequence of multiple values.

    Document = *Value ws

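
The grammar changes above can be illustrated with a minimal reader sketch. It covers only a subset of P-expressions: groups, blocks, sequences, the three punctuation classes, and bare tokens. Strings, records, sets, embedded values, annotations, and comments are deliberately left out, and every other run of non-delimiter characters is kept as raw text. The names `read_document` and `read_values`, and the tagged-tuple tree shape, are assumptions made for this illustration and are not part of the specification.

```python
# Hypothetical sketch: reads only a SUBSET of the P-expression grammar.
WS = " \t\r\n"                      # note: comma is NOT whitespace here
CLOSERS = {"(": ")", "[": "]", "{": "}"}
TAGS = {"(": "group", "[": "sequence", "{": "block"}
DELIMS = set(WS) | set(",;:()[]{}")  # subset of the `delimiter` class


def read_values(text, i, closer):
    """Read `*Value ws` up to `closer`, or to end of input if closer is None."""
    items = []
    while True:
        while i < len(text) and text[i] in WS:
            i += 1
        if i == len(text):
            if closer is not None:
                raise SyntaxError(f"missing {closer!r}")
            return items, i
        c = text[i]
        if c == closer:
            return items, i + 1
        if c in CLOSERS:
            inner, i = read_values(text, i + 1, CLOSERS[c])
            items.append((TAGS[c], inner))
        elif c in ")]}":
            raise SyntaxError(f"unexpected {c!r} at offset {i}")
        elif c == ",":
            items.append(("comma",))
            i += 1
        elif c == ";":
            items.append(("semicolon",))
            i += 1
        elif c == ":":
            j = i
            while j < len(text) and text[j] == ":":
                j += 1
            items.append(("colons", j - i))  # Colons = 1*":"
            i = j
        else:
            j = i
            while j < len(text) and text[j] not in DELIMS:
                j += 1
            items.append(("atom", text[i:j]))  # raw, uninterpreted token
            i = j


def read_document(text):
    """Document = *Value ws: a standalone sequence of values."""
    items, _ = read_values(text, 0, None)
    return items
```

For instance, `read_document("f(x, y);")` yields an atom, a group containing `x`, a comma and `y`, and then a semicolon, with the comma and semicolon surviving as values rather than being discarded as whitespace.
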
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term is a valid P-expression, but in general
P-expressions must be rewritten or otherwise interpreted before a
meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
{:.pseudocode.equations}
| ⌜·⌝ | : | **P-expression** ⟶ **Preserves** |
Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon` or
`Colons`, P-expressions are encoded directly as Preserves data.
{:.pseudocode.equations}
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`<`*p* ...`>`⌝ | = | `<`⌜*p*⌝ ...`>` |
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` |
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* **when** *p* ∈ **Atom** |
All members of the special classes are encoded as Preserves text
`Dictionary`[^encoding-rationale] values.
[^encoding-rationale]: In principle, it would be nice to use *records*
for this purpose, but if we did so we would have to also encode
usages of records!
{:.pseudocode.equations}
| ⌜`(`*p* ...`)`⌝ | = | `{g:[`⌜*p*⌝ ...`]}` |
| ⌜`{`*p* ...`}`⌝ | = | `{b:[`⌜*p*⌝ ...`]}` |
| ⌜`,`⌝ | = | `{s:|,|}` |
| ⌜`;`⌝ | = | `{s:|;|}` |
| ⌜`:` ...⌝ | = | `{s:|:` ...`|}` |
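
The equations above can be read as a single recursive function. The following is a minimal sketch, assuming a hypothetical in-memory tree of tagged tuples (`("atom", text)`, `("group", [...])`, `("colons", n)`, and so on) and producing Preserves text as a string. The tree shape and the name `encode` are illustrative assumptions, not an API defined by this document.

```python
# Hypothetical tree shape: ("atom", text), tagged containers, and punctuation.
# Opening and closing text for every class that wraps a sequence of values.
BRACKETS = {
    "sequence": ("[", "]"),
    "record": ("<", ">"),
    "set": ("#{", "}"),
    "group": ("{g:[", "]}"),   # special class: encoded as a dictionary
    "block": ("{b:[", "]}"),   # special class: encoded as a dictionary
}


def encode(p):
    """Return Preserves text for the encoding of P-expression tree `p`."""
    tag = p[0]
    if tag == "atom":
        return p[1]                      # atoms encode as themselves
    if tag in BRACKETS:
        open_, close = BRACKETS[tag]
        return open_ + " ".join(encode(q) for q in p[1]) + close
    if tag == "embedded":
        return "#!" + encode(p[1])
    if tag == "annotated":
        return "@" + encode(p[1]) + " " + encode(p[2])
    if tag == "comma":
        return "{s:|,|}"
    if tag == "semicolon":
        return "{s:|;|}"
    if tag == "colons":
        return "{s:|" + ":" * p[1] + "|}"
    raise ValueError(f"unknown node tag: {tag!r}")
```

Atoms are carried as already-formatted text in this sketch, so a string atom includes its own quotation marks.
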
## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves
The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves, so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.
A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special
classes mentioned above.
1. Every `Group` or `Semicolon` that appears is an error.
2. Every `Colons` with two or more colons in it is an error.
3. Every `Comma` that appears is removed from its container.
4. Every `Block`, once its commas have been removed (rule 3), must
consist entirely of zero or more triplets of `Value`, `Colons` (with a
single colon), `Value`. Any `Block` not following this pattern is an
error. Each `Block` following the pattern is translated to a
`Dictionary` containing a key/value pair for each triplet.
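
The four rules above can be sketched as a recursive pass over the reader's output. This is an illustration only, assuming a hypothetical tagged-tuple tree (`("atom", text)`, `("sequence", [...])`, `("block", [...])`, `("colons", n)`, and so on). Two points are assumptions of this sketch that the rules do not address: a single colon outside a key/value position is rejected, and duplicate dictionary keys are not checked. Embedded and annotated values are omitted for brevity.

```python
class PExprError(ValueError):
    """Raised when a P-expression has no reading as a Preserves value."""


CONTAINERS = ("sequence", "record", "set")


def strip_commas(items):
    # Rule 3: every Comma is removed from its container.
    return [p for p in items if p[0] != "comma"]


def interpret(p):
    """Map a P-expression tree to a Preserves tree, or raise PExprError."""
    tag = p[0]
    if tag in ("group", "semicolon"):
        raise PExprError(f"{tag} is not allowed")              # rule 1
    if tag == "colons":
        if p[1] >= 2:
            raise PExprError("two or more colons")             # rule 2
        # Assumption of this sketch: a lone colon is only valid in a Block.
        raise PExprError("colon outside key/value position")
    if tag == "comma":
        # Assumption: a comma handed in directly has no container to leave.
        raise PExprError("comma without a container")
    if tag in CONTAINERS:
        return (tag, [interpret(q) for q in strip_commas(p[1])])
    if tag == "block":                                         # rule 4
        items = strip_commas(p[1])
        if len(items) % 3 != 0:
            raise PExprError("block is not a series of key : value triplets")
        pairs = []
        for key, colon, value in zip(items[0::3], items[1::3], items[2::3]):
            if colon != ("colons", 1):
                raise PExprError("expected a single colon between key and value")
            pairs.append((interpret(key), interpret(value)))
        return ("dictionary", pairs)
    if tag == "atom":
        return p
    raise ValueError(f"unknown node tag: {tag!r}")
```

Stripping commas before checking the triplet pattern is what lets a block written as `{a: 1, b: 2}` read back as a two-entry dictionary.
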
## Appendix: Examples
Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.
```preserves
⌜<date 1821 (lookup-month "February") 3>⌝
= <date 1821 {g:[lookup-month "February"]} 3>
```
```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
```
```preserves
⌜()⌝
= {g:[]}
⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]
```
```preserves
⌜{
    setUp();
    # Now enter the loop
    loop: {
        greet("World");
    }
    tearDown();
}⌝
= {b:[
    setUp {g:[]} {s:|;|}
    # Now enter the loop
    loop {s:|:|} {b:[
        greet {g:["World"]} {s:|;|}
    ]}
    tearDown {g:[]} {s:|;|}
]}
```
```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
foo {s:|,|} #!remote {s:|,|} bar]
```
```preserves
⌜{
    optional name: string,
    address: Address,
}⌝
= {b:[
    optional name {s:|:|} string {s:|,|}
    address {s:|:|} Address {s:|,|}
]}
```
## Appendix: Reading vs. Parsing
Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.
Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.
Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of
P-expression `Value`s.
- A simple technique is repeated splitting of sequences; first by
`Semicolon`, then by `Comma`, then by increasingly high binding-power
operators.
- More refined is to use a Pratt parser or similar
([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
[2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
[3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
to build a parse tree using an extensible specification of the pre-,
in-, and postfix operators involved.
- Finally, if you treat sequences of `Value`s as pre-lexed token
streams, almost any parsing formalism (such as [PEG
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
[OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
extract further syntactic structure.
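
The first technique in the list above, repeated splitting, can be sketched in a few lines. The sketch assumes a hypothetical tagged-tuple representation of reader output (`("atom", text)`, `("comma",)`, `("semicolon",)`), and the helper names `split_on` and `statements` are invented for this illustration.

```python
def split_on(items, separator):
    """Split a list of values into runs, breaking at each `separator`.

    A trailing separator does not produce a trailing empty run, but
    consecutive separators do produce an empty run between them.
    """
    runs, current = [], []
    for p in items:
        if p == separator:
            runs.append(current)
            current = []
        else:
            current.append(p)
    if current:
        runs.append(current)
    return runs


def statements(items):
    """Split first by Semicolon, then split each statement by Comma."""
    return [split_on(stmt, ("comma",))
            for stmt in split_on(items, ("semicolon",))]
```

A parser could continue in the same way, splitting each comma-separated run on operator atoms such as `("atom", "+")`, working from the lowest binding power to the highest.
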
## Notes