---
title: "P-expressions"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>
October 2023. Version 0.2.0.

This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support
a Lisp- or Haskell-like programming notation.

**Motivation.** The [text syntax](preserves-text.html) for Preserves
works well for writing `Value`s, i.e. data. However, in some contexts,
Preserves applications need a broader grammar that allows interleaving
of *expressions* with data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.

## Preliminaries

The P-expression grammar takes the text syntax grammar as its base and
modifies it.

<a id="whitespace"></a>
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
carriage returns, or line feeds. Commas are *not* considered whitespace
in P-expressions.

    ws = *(%x20 / %x09 / CR / LF)

<a id="delimiters"></a>
**Delimiters.** Because commas are no longer included in class `ws`,
class `delimiter` is widened to include them explicitly.

    delimiter = ws / ","
              / "<" / ">" / "[" / "]" / "{" / "}"
              / "#" / ":" / DQUOTE / "|" / "@" / ";"
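
As a rough illustration of the widened `delimiter` class, the following Python sketch splits a run of bare symbols at delimiters. The names `WS`, `DELIMITERS` and `split_bare_tokens` are hypothetical and not taken from any Preserves implementation, and the sketch deliberately ignores strings, comments, `#`-prefixed forms and runs of colons. It only shows that a comma now ends a token and survives as a token of its own, while whitespace is dropped.

```python
# Characters in the P-expression `ws` class: space, tab, CR, LF.
WS = set(" \t\r\n")

# The widened `delimiter` class: whitespace plus the punctuation listed above.
DELIMITERS = WS | set(',<>[]{}#:"|@;')


def split_bare_tokens(text: str) -> list[str]:
    """Split text at delimiters.

    Whitespace is dropped; every other delimiter (including the comma)
    is kept as a one-character token. This is a simplification of a
    real reader, which would treat most of these characters specially.
    """
    tokens: list[str] = []
    current = ""
    for ch in text:
        if ch in DELIMITERS:
            if current:
                tokens.append(current)
                current = ""
            if ch not in WS:
                tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(split_bare_tokens("foo,bar baz"))  # ['foo', ',', 'bar', 'baz']
```

Under the ordinary text syntax the comma in `foo,bar` would simply be skipped as whitespace; here it is preserved for a later stage to interpret.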

## Grammar

P-expressions add comma, semicolon, and sequences of one or more colons
to the syntax class `Value`.

    Value =/ Comma / Semicolon / Colons
    Comma = ","
    Semicolon = ";"
    Colons = 1*":"

Now that colon is in `Value`, the syntax for `Dictionary` is replaced
with `Block` everywhere it is mentioned.

    Block = "{" *Value ws "}"

Syntax for `Record` is loosened to allow empty angle brackets.

    Record = "<" *Value ws ">"

New syntax for explicit uninterpreted grouping of sequences of values is
introduced, and added to class `Value`.

    Value =/ ws Group
    Group = "(" *Value ws ")"

Finally, class `Document` is replaced in order to allow standalone
documents to directly comprise a sequence of multiple values.

    Document = *Value ws

No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term is a valid P-expression, but in general
P-expressions must be rewritten or otherwise interpreted before a
meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).

## <a id="annotations"></a>Annotations and Comments

Annotations and comments attach to the term following them, just as in
the ordinary text syntax. However, it is common in programming notations
to allow comments at the end of a file or other sequential construct:

    {
        key: value
        # example of a comment at the end of a dictionary
    }
    # example of a comment at the end of the input file

While the ordinary text syntax forbids comments in these positions,
P-expressions allow them:

    Document =/ *Value Trailer ws
    Record =/ "<" *Value Trailer ws ">"
    Sequence =/ "[" *Value Trailer ws "]"
    Set =/ "#{" *Value Trailer ws "}"
    Block =/ "{" *Value Trailer ws "}"

    Trailer = 1*Annotation

## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves

We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.

{:.pseudocode.equations}
| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |

Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`,
and `Record`, P-expressions are encoded directly as Preserves data.

{:.pseudocode.equations}
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` |
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* when *p* ∈ **Atom** |

Everything else is encoded as Preserves records.

{:.pseudocode.equations}
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`,`⌝ | = | `<s |,|>` |
| ⌜`;`⌝ | = | `<s |;|>` |
| ⌜`:` ...⌝ | = | `<s |:` ...`|>` |
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |

The record `<a>` acts as an anchor for the annotations in a `Trailer`.

We overload the ⌜·⌝ notation for encoding whole `Document`s into
sequences of Preserves values.

{:.pseudocode.equations}
| ⌜·⌝ : **P-expression Document** | ⟶ | **Preserves Sequence** |
| ⌜*p* ...⌝ | = | `[`⌜*p*⌝ ...`]` |
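
The equations above can be sketched in Python as a recursive function over a small syntax tree. Everything here is an assumption made for illustration: the classes `Atom`, `Punct` and `Compound`, the `kind` strings, and the choice to produce Preserves *text* as output are all hypothetical and not part of this specification. Annotations, embedded values (`#!`) and `Trailer`s are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    text: str  # the atom's Preserves text, e.g. 'date' or '"February"'


@dataclass
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


@dataclass
class Compound:
    kind: str  # "record", "group", "block", "sequence", or "set"
    items: list


# Records, groups and blocks become labelled records <r ...>, <g ...>, <b ...>;
# sequences and sets keep their own brackets.
OPENERS = {"record": "<r", "group": "<g", "block": "<b",
           "sequence": "[", "set": "#{"}
CLOSERS = {"record": ">", "group": ">", "block": ">",
           "sequence": "]", "set": "}"}


def encode(p) -> str:
    """Return the Preserves text of the encoding of P-expression p."""
    if isinstance(p, Atom):
        return p.text
    if isinstance(p, Punct):
        return f"<s |{p.text}|>"
    if isinstance(p, Compound):
        parts = [encode(item) for item in p.items]
        opener, closer = OPENERS[p.kind], CLOSERS[p.kind]
        if opener.startswith("<"):
            # The label (r, g or b) is separated from the fields by a space.
            return " ".join([opener, *parts]) + closer
        return opener + " ".join(parts) + closer
    raise TypeError(f"not a P-expression: {p!r}")


def encode_document(values) -> str:
    """Encode a whole Document as a Preserves sequence."""
    return "[" + " ".join(encode(p) for p in values) + "]"


example = Compound("record", [
    Atom("date"), Atom("1821"),
    Compound("group", [Atom("lookup-month"), Atom('"February"')]),
    Atom("3"),
])
print(encode(example))  # <r date 1821 <g lookup-month "February"> 3>
```

The output separates sequence elements with spaces only; since commas are whitespace in ordinary Preserves text syntax, this denotes the same value as a comma-separated rendering.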

## <a id="reading-preserves"></a>Interpreting P-expressions as Preserves

The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves, so that (1) a Preserves datum (2) written
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.

A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special
classes mentioned above.

 1. Every `Group` or `Semicolon` that appears is an error.
 2. Every `Colons` with two or more colons in it is an error.
 3. Every `Comma` that appears is discarded.
 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
 5. Every `Record` with no values in it is an error.
 6. Every `Block` must contain triplets of `Value`, `Colons` (with a
    single colon), `Value`. Any `Block` not following this pattern is an
    error. Each `Block` following the pattern is translated to a
    `Dictionary` containing a key/value pair for each triplet.

[^discard-trailers-instead-of-error]: **Implementation note.** When
    implementing parsing of P-expressions into Preserves, consider
    offering an optional mode where trailing annotations `Trailer` are
    *discarded* instead of causing an error to be signalled.
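
The six rules can be sketched as a single recursive pass. As before, this is only an illustration under stated assumptions: the tree classes, the `InterpretationError` type, the `discard_trailers` flag (modelled on the implementation note), and the Python shapes chosen for the results (strings for atoms, lists for sequences, tagged tuples for records and sets, `dict` for dictionaries) are all hypothetical. Two further assumptions are marked in the comments: a single colon outside a `Block` is treated as an error, which the rules do not state explicitly, and dictionary keys must be hashable in Python, so in practice only atom keys work here.

```python
from dataclasses import dataclass


class InterpretationError(ValueError):
    """Raised when a P-expression has no reading as a Preserves value."""


@dataclass
class Atom:
    text: str  # the atom's Preserves text


@dataclass
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


@dataclass
class Trailer:
    annotations: list


@dataclass
class Compound:
    kind: str  # "record", "group", "block", "sequence", or "set"
    items: list


def is_punct(p, text):
    return isinstance(p, Punct) and p.text == text


def interpret(p, discard_trailers=False):
    if isinstance(p, Atom):
        return p.text
    if isinstance(p, Trailer):
        raise InterpretationError("trailer not allowed")  # rule 4
    if isinstance(p, Punct):
        # Rules 1 and 2. Commas are removed by the enclosing compound
        # before we get here. ASSUMPTION: a single colon outside a
        # Block is also rejected.
        raise InterpretationError(f"unexpected {p.text!r}")
    if p.kind == "group":
        raise InterpretationError("group not allowed")  # rule 1
    items = [i for i in p.items if not is_punct(i, ",")]  # rule 3
    if discard_trailers:  # optional lenient mode from the footnote
        items = [i for i in items if not isinstance(i, Trailer)]
    if p.kind == "block":  # rule 6
        if len(items) % 3 != 0:
            raise InterpretationError("block is not a series of triplets")
        result = {}
        for key, colon, value in zip(items[0::3], items[1::3], items[2::3]):
            if not is_punct(colon, ":"):
                raise InterpretationError("expected a single colon")
            # ASSUMPTION: keys interpret to hashable Python values.
            result[interpret(key, discard_trailers)] = interpret(value, discard_trailers)
        return result
    values = [interpret(i, discard_trailers) for i in items]
    if p.kind == "record":
        if not values:
            raise InterpretationError("empty record")  # rule 5
        return ("record", values[0], values[1:])
    if p.kind == "sequence":
        return values
    if p.kind == "set":
        return ("set", values)
    raise ValueError(f"unknown kind {p.kind!r}")


block = Compound("block", [
    Atom("a"), Punct(":"), Atom("1"), Punct(","),
    Atom("b"), Punct(":"),
    Compound("sequence", [Atom("2"), Punct(","), Atom("3")]),
])
print(interpret(block))  # {'a': '1', 'b': ['2', '3']}
```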

## Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.

### Individual P-expression `Value`s

```preserves
⌜<date 1821 (lookup-month "February") 3>⌝
  = <r date 1821 <g lookup-month "February"> 3>
```

```preserves
⌜<>⌝
  = <r>
```

```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
  = <g begin <g println! <g + 1 2>> <g + 3 4>>
```

```preserves
⌜()⌝
  = <g>

⌜[() () ()]⌝
  = [<g>, <g>, <g>]
```

```preserves
⌜{
    setUp();
    # Now enter the loop
    loop: {
        greet("World");
    }
    tearDown();
}⌝
  = <b
      setUp <g> <s |;|>
      # Now enter the loop
      loop <s |:|> <b
        greet <g "World"> <s |;|>
      >
      tearDown <g> <s |;|>
    >
```

```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
  = [1 + 2.0 <s |,|> print "Hello" <s |,|> predicate <s |:|> #t <s |,|>
     foo <s |,|> #!remote <s |,|> bar]
```

```preserves
⌜{
    optional name: string,
    address: Address,
}⌝
  = <b
      optional name <s |:|> string <s |,|>
      address <s |:|> Address <s |,|>
    >
```

### Whole `Document`s

```preserves
⌜{
    key: value
    # example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file⌝
  = [ <b
        key <s |:|> value
        @"example of a comment at the end of a dictionary" <a>
      >
      @"example of a comment at the end of the input file"
      <a>
    ]
```

## Appendix: Reading vs. Parsing

Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)

Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.

Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.

In the same way, when using P-expressions as the foundation for a
language, a generic P-expression reader can feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of
P-expression `Value`s.

- A simple technique is repeated splitting of sequences; first by
  `Semicolon`, then by `Comma`, then by increasingly high binding-power
  operators.

- More refined is to use a Pratt parser or similar
  ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
  [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
  [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
  to build a parse tree using an extensible specification of the pre-,
  in-, and postfix operators involved.

- Finally, if you treat sequences of `Value`s as pre-lexed token
  streams, almost any parsing formalism (such as [PEG
  parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
  [OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
  extract further syntactic structure.
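
The first technique, repeated splitting, might look like the following Python sketch. It is a toy under several assumptions: values are represented as plain strings, so `";"` and `","` stand in for `Semicolon` and `Comma` (a real reader would keep punctuation distinct from string atoms), and the function names and the operator list are hypothetical. Operators are treated as n-ary infix separators; associativity and prefix or postfix operators are not handled, which is where a Pratt parser becomes the better tool.

```python
def split_on(values, separator):
    """Split a list into the runs between occurrences of `separator`."""
    runs, current = [], []
    for v in values:
        if v == separator:
            runs.append(current)
            current = []
        else:
            current.append(v)
    runs.append(current)
    return runs


def split_statements(values):
    """Split by semicolon, then split each non-empty statement by comma."""
    return [split_on(stmt, ",") for stmt in split_on(values, ";") if stmt]


def split_by_operators(tokens, operators):
    """Split by infix operators, listed from lowest to highest binding power.

    Returns a nested prefix form such as ['+', '1', ['*', '2', '3']].
    """
    if not operators:
        return tokens[0] if len(tokens) == 1 else tokens
    op, rest = operators[0], operators[1:]
    parts = [split_by_operators(run, rest) for run in split_on(tokens, op)]
    return parts[0] if len(parts) == 1 else [op, *parts]


tokens = ["print", "a", ",", "b", ";", "done", ";"]
print(split_statements(tokens))
# [[['print', 'a'], ['b']], [['done']]]

print(split_by_operators(["1", "+", "2", "*", "3"], ["+", "*"]))
# ['+', '1', ['*', '2', '3']]
```

Splitting by the lowest binding-power operator first places it nearest the root of the result, so tighter-binding operators end up nested beneath it.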

## Notes