---
title: "P-expressions"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
October 2023. Version 0.1.2.
This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support
a Lisp- or Haskell-like programming notation.
**Motivation.** The [text syntax](preserves-text.html) for Preserves
works well for writing `Value`s, i.e. data. However, in some contexts,
Preserves applications need a broader grammar that allows interleaving
of *expressions* with data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.
## Preliminaries
The P-expression grammar takes the text syntax grammar as its base and
modifies it.
<a id="whitespace"></a>
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
carriage returns, or line feeds. Commas are *not* considered whitespace
in P-expressions.

    ws = *(%x20 / %x09 / CR / LF)

<a id="delimiters"></a>
**Delimiters.** Because commas are no longer included in class `ws`,
class `delimiter` is widened to include them explicitly.

    delimiter = ws / ","
              / "<" / ">" / "[" / "]" / "{" / "}"
              / "#" / ":" / DQUOTE / "|" / "@" / ";"

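As a rough illustration of the two character classes just defined, the following Python sketch spells them out as sets. The helper names (`is_ws`, `is_delimiter`, `bare_token_end`) are hypothetical and not part of any Preserves implementation. Parentheses are left out because the `delimiter` rule as written above does not list them.

```python
# Character classes taken from the `ws` and `delimiter` rules above.
# All function names here are illustrative assumptions.
WS = set(" \t\r\n")                      # note: no comma
DELIMITERS = WS | set(',<>[]{}#:"|@;')   # comma is listed explicitly


def is_ws(ch):
    """True for the characters the `ws` rule accepts."""
    return ch in WS


def is_delimiter(ch):
    """True for whitespace and for the punctuation in `delimiter`."""
    return ch in DELIMITERS


def bare_token_end(text, start):
    """Return the index just past the run of non-delimiters at `start`."""
    i = start
    while i < len(text) and not is_delimiter(text[i]):
        i += 1
    return i
```

With these definitions a comma ends a token but is not skipped as whitespace, which is the behavioural difference from the ordinary text syntax.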
## Grammar
P-expressions add comma, semicolon, and sequences of one or more colons
to the syntax class `Value`.

    Value =/ Comma / Semicolon / Colons
    Comma = ","
    Semicolon = ";"
    Colons = 1*":"

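A reader might recognise these three punctuation classes along the following lines. This is a minimal Python sketch; `read_punct` and its return shape are assumptions made for illustration.

```python
import re

# `Colons = 1*":"` means a run of one or more colons forms one value,
# so the pattern greedily takes the whole run.
PUNCT = re.compile(r",|;|:+")


def read_punct(text, pos):
    """Return (class name, lexeme, next position), or None if no match."""
    m = PUNCT.match(text, pos)
    if m is None:
        return None
    lexeme = m.group(0)
    kind = {",": "Comma", ";": "Semicolon"}.get(lexeme, "Colons")
    return kind, lexeme, m.end()
```

Because the run of colons is kept intact, `::` reads as a single `Colons` value rather than as two separate colons.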
Now that colon is in `Value`, the syntax for `Dictionary` is replaced
with `Block` everywhere it is mentioned.

    Block = "{" *Value ws "}"

Syntax for `Record` is loosened to allow empty angle brackets.

    Record = "<" *Value ws ">"

New syntax for explicit uninterpreted grouping of sequences of values is
introduced, and added to class `Value`.

    Value =/ ws Group
    Group = "(" *Value ws ")"

Finally, class `Document` is replaced in order to allow standalone
documents to directly comprise a sequence of multiple values.

    Document = *Value ws

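To make the shape of the grammar concrete, here is a deliberately small Python sketch of a reader for `Group`, `Block`, `Record`, `Sequence`, the punctuation classes, and bare tokens. It is not a conforming reader: strings, sets, annotations, and embedded values are omitted, atoms are kept as raw text, and the `(kind, payload)` tuple shape is an assumption of this sketch. It also treats parentheses as token boundaries, which `Group` appears to require.

```python
CLOSERS = {"(": ")", "{": "}", "[": "]", "<": ">"}
KINDS = {"(": "group", "{": "block", "[": "sequence", "<": "record"}
STOP = set(" \t\r\n,;:(){}[]<>")   # characters that end a bare token


def read_values(text, pos=0, closer=None):
    """Read `*Value ws` up to `closer`, or to end of input if None."""
    items = []
    while True:
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            if closer is not None:
                raise SyntaxError("missing " + closer)
            return items, pos
        ch = text[pos]
        if ch == closer:
            return items, pos + 1
        if ch in CLOSERS:
            inner, pos = read_values(text, pos + 1, CLOSERS[ch])
            items.append((KINDS[ch], inner))
        elif ch in ")}]>":
            raise SyntaxError("unexpected " + ch)
        elif ch in ",;":
            items.append(("punct", ch))
            pos += 1
        elif ch == ":":
            end = pos
            while end < len(text) and text[end] == ":":
                end += 1
            items.append(("punct", text[pos:end]))
            pos = end
        else:
            end = pos
            while end < len(text) and text[end] not in STOP:
                end += 1
            items.append(("atom", text[pos:end]))
            pos = end


def read_document(text):
    """A Document is simply a sequence of values."""
    items, _ = read_values(text)
    return items
```

Note that every bracketed form, including `<>` and `()`, may be empty, and that a document may contain any number of top-level values.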
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term is a valid P-expression, but in general
P-expressions must be rewritten or otherwise interpreted before a
meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
## <a id="annotations"></a>Annotations and Comments
Annotations and comments attach to the term following them, just as in
the ordinary text syntax. However, it is common in programming notations
to allow comments at the end of a file or other sequential construct:

    {
        key: value
        # example of a comment at the end of a dictionary
    }
    # example of a comment at the end of the input file

While the ordinary text syntax forbids comments in these positions,
P-expressions allow them:

    Document =/ *Value Trailer ws
    Record =/ "<" *Value Trailer ws ">"
    Sequence =/ "[" *Value Trailer ws "]"
    Set =/ "#{" *Value Trailer ws "}"
    Block =/ "{" *Value Trailer ws "}"
    Trailer = 1*Annotation

## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
{:.pseudocode.equations}
| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |
Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon`,
`Colons`, `Trailer`, or empty `Record`, P-expressions are encoded
directly as Preserves data.
{:.pseudocode.equations}
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`<`*p* ...`>`⌝ | = | `<`⌜*p*⌝ ...`>` |
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` |
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* when *p* ∈ **Atom** |
Everything else is encoded as Preserves
dictionaries[^encoding-rationale].
[^encoding-rationale]: In principle, it would be nice to use *records*
for this purpose, but if we did so we would have to also encode
usages of records!
{:.pseudocode.equations}
| ⌜`<>`⌝ | = | `{r:[]}` |
| ⌜`(`*p* ...`)`⌝ | = | `{g:[`⌜*p*⌝ ...`]}` |
| ⌜`{`*p* ...`}`⌝ | = | `{b:[`⌜*p*⌝ ...`]}` |
| ⌜`,`⌝ | = | `{s:|,|}` |
| ⌜`;`⌝ | = | `{s:|;|}` |
| ⌜`:` ...⌝ | = | `{s:|:` ...`|}` |
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `{}`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
The empty dictionary `{}` acts as an anchor for the annotations in a
`Trailer`.
We overload the ⌜·⌝ notation for encoding whole `Document`s into
sequences of Preserves values.
{:.pseudocode.equations}
| ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence** |
| ⌜*p* ...⌝ | = | `[`⌜*p*⌝ ...`]` |
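The equations in this section can be sketched as a small recursive function. The Python below is illustrative only: it assumes a hypothetical tree of `(kind, payload)` tuples, assumes atoms are already valid Preserves text, and leaves out annotations, trailers, and embedded values. The output is Preserves text rather than an in-memory value.

```python
def encode(node):
    """Encode one P-expression node as Preserves text."""
    kind, payload = node
    if kind == "atom":
        return payload                         # atoms encode as themselves
    if kind == "punct":
        return "{s:|" + payload + "|}"         # Comma, Semicolon, Colons
    body = " ".join(encode(child) for child in payload)
    if kind == "group":
        return "{g:[" + body + "]}"
    if kind == "block":
        return "{b:[" + body + "]}"
    if kind == "sequence":
        return "[" + body + "]"
    if kind == "set":
        return "#{" + body + "}"
    if kind == "record":
        # Only the empty record needs the special dictionary form.
        return "<" + body + ">" if payload else "{r:[]}"
    raise ValueError("unknown node kind: " + kind)


def encode_document(nodes):
    """A whole Document encodes as a Preserves sequence."""
    return "[" + " ".join(encode(node) for node in nodes) + "]"
```

Applied to a tree for `(begin (println! (+ 1 2)) (+ 3 4))`, this yields the nested `{g:[...]}` text shown in the examples appendix.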
## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves
The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves, so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.
A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special
classes mentioned above.
1. Every `Group` or `Semicolon` that appears is an error.
2. Every `Colons` with two or more colons in it is an error.
3. Every `Comma` that appears is discarded.
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error.
6. Every `Block` must contain triplets of `Value`, `Colons` (with a
single colon), `Value`. Any `Block` not following this pattern is an
error. Each `Block` following the pattern is translated to a
`Dictionary` containing a key/value pair for each triplet.
[^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider
offering an optional mode where trailing annotations `Trailer` are
*discarded* instead of causing an error to be signalled.
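The six rules in this section might be implemented roughly as follows. This Python sketch works on a hypothetical tree of `(kind, payload)` tuples and returns plain Python values; sets and annotations are omitted. Two choices here are assumptions rather than part of the rules: a single colon outside a `Block` is rejected, and a repeated key in a `Block` silently overwrites the earlier entry.

```python
class PexprError(ValueError):
    """Raised when a P-expression has no reading as a Preserves value."""


COMMA = ("punct", ",")
COLON = ("punct", ":")


def interpret(node):
    kind, payload = node
    if kind == "atom":
        return payload
    if kind == "group":
        raise PexprError("Group is not allowed")            # rule 1
    if kind == "trailer":
        raise PexprError("Trailer is not allowed")          # rule 4
    if kind == "punct":
        # Semicolon (rule 1) and two or more colons (rule 2) are errors.
        # Assumption: a stray single colon is rejected here as well.
        raise PexprError("unexpected punctuation " + payload)
    items = [child for child in payload if child != COMMA]  # rule 3
    if kind == "block":
        return interpret_block(items)                       # rule 6
    values = [interpret(child) for child in items]
    if kind == "sequence":
        return values
    if kind == "record":
        if not values:
            raise PexprError("empty Record")                # rule 5
        return ("record", values[0], values[1:])
    raise PexprError("unknown node kind " + kind)


def interpret_block(items):
    """Translate key, colon, value triplets into a dictionary."""
    if len(items) % 3 != 0:
        raise PexprError("Block must consist of key : value triplets")
    result = {}
    for i in range(0, len(items), 3):
        key, colon, value = items[i:i + 3]
        if colon != COLON:
            raise PexprError("expected a single colon after the key")
        result[interpret(key)] = interpret(value)   # keys must be hashable
    return result
```

The optional lenient mode suggested in the implementation note would amount to skipping `trailer` nodes alongside commas instead of raising.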
## Appendix: Examples
Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.
### Individual P-expression `Value`s
```preserves
⌜<date 1821 (lookup-month "February") 3>⌝
= <date 1821 {g:[lookup-month "February"]} 3>
```
```preserves
⌜<>⌝
= {r:[]}
```
```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
```
```preserves
⌜()⌝
= {g:[]}
⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]
```
```preserves
⌜{
    setUp();
    # Now enter the loop
    loop: {
        greet("World");
    }
    tearDown();
}⌝
= {b:[
    setUp {g:[]} {s:|;|}
    # Now enter the loop
    loop {s:|:|} {b:[
        greet {g:["World"]} {s:|;|}
    ]}
    tearDown {g:[]} {s:|;|}
]}
```
```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
foo {s:|,|} #!remote {s:|,|} bar]
```
```preserves
⌜{
    optional name: string,
    address: Address,
}⌝
= {b:[
    optional name {s:|:|} string {s:|,|}
    address {s:|:|} Address {s:|,|}
]}
```
### Whole `Document`s
```preserves
⌜{
    key: value
    # example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file⌝
= [ {b:[
        key {s:|:|} value
        @"example of a comment at the end of a dictionary" {}
    ]}
    @"example of a comment at the end of the input file"
    {}
  ]
```
## Appendix: Reading vs. Parsing
Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.
Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.
Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of
P-expression `Value`s.
- A simple technique is repeated splitting of sequences; first by
`Semicolon`, then by `Comma`, then by increasingly high binding-power
operators.
- More refined is to use a Pratt parser or similar
([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
[2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
[3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
to build a parse tree using an extensible specification of the pre-,
in-, and postfix operators involved.
- Finally, if you treat sequences of `Value`s as pre-lexed token
streams, almost any parsing formalism (such as [PEG
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
[OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
extract further syntactic structure.
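The first technique in the list above, repeated splitting, is simple enough to sketch in a few lines of Python. The `(kind, payload)` tuples and the function names are hypothetical; only the idea of splitting by `Semicolon` and then by `Comma` comes from the text.

```python
SEMICOLON = ("punct", ";")
COMMA = ("punct", ",")


def split_on(items, separator):
    """Split a list of values into the runs between `separator`s."""
    runs, current = [], []
    for item in items:
        if item == separator:
            runs.append(current)
            current = []
        else:
            current.append(item)
    runs.append(current)
    return runs


def statements(items):
    """Split by Semicolon first, then split each statement by Comma.

    Empty statements (for example after a final semicolon) are dropped.
    """
    return [split_on(stmt, COMMA)
            for stmt in split_on(items, SEMICOLON)
            if stmt]
```

A parser built this way would continue by splitting each comma-separated run on its lowest-precedence operator, and so on upward; a Pratt parser achieves the same result in a single pass.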
## Notes