preserves-expressions.md
parent 982d916b61
commit a69444f085

@@ -1,4 +1,5 @@
 _site/
+preserves-expressions.pdf
 preserves-binary.pdf
 preserves-schema.pdf
 preserves-text.pdf

7
Makefile
@@ -1,6 +1,11 @@
 __ignored__ := $(shell ./setup.sh)

-PDFS=preserves.pdf preserves-text.pdf preserves-binary.pdf preserves-schema.pdf
+PDFS=\
+	preserves.pdf \
+	preserves-text.pdf \
+	preserves-binary.pdf \
+	preserves-schema.pdf \
+	preserves-expressions.pdf

 all: $(PDFS)

@@ -0,0 +1,210 @@
---
title: "P-expressions"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>
October 2023. Version 0.1.0.

This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support
a Lisp- or Haskell-like programming notation.

**Motivation.** The [text syntax](preserves-text.html) for Preserves
works well for writing `Value`s, i.e. data. However, in some contexts,
Preserves applications need a broader grammar that allows interleaving
of *expressions* with data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.

## Preliminaries

The P-expression grammar takes the text syntax grammar as its base and
modifies it.

<a id="whitespace"></a>
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
carriage returns, or line feeds. Commas are *not* considered whitespace
in P-expressions.

    ws = *(%x20 / %x09 / CR / LF)

<a id="delimiters"></a>
**Delimiters.** Because commas are no longer included in class `ws`,
class `delimiter` is widened to include them explicitly.

    delimiter = ws / ","
              / "<" / ">" / "[" / "]" / "{" / "}"
              / "#" / ":" / DQUOTE / "|" / "@" / ";"
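
As a rough illustration, the two character classes above can be written as Python sets. This is a sketch, not part of the specification: the names `WHITESPACE`, `DELIMITERS`, `is_whitespace`, `is_delimiter` and `split_bare_token` are made up here. It follows the `delimiter` rule exactly as written, so parentheses are not included; a reader that also supports the `Group` syntax will presumably want to treat `(` and `)` as delimiters too, but that is an assumption.

```python
# Character classes from the `ws` and `delimiter` rules, as written in the text.
WHITESPACE = frozenset(" \t\r\n")
DELIMITERS = WHITESPACE | frozenset(',<>[]{}#:"|@;')


def is_whitespace(ch: str) -> bool:
    """True for space, tab, CR and LF. Comma is deliberately excluded."""
    return ch in WHITESPACE


def is_delimiter(ch: str) -> bool:
    """True for any character that ends a bare token."""
    return ch in DELIMITERS


def split_bare_token(text: str, start: int = 0) -> tuple[str, int]:
    """Return the run of non-delimiter characters beginning at `start`,
    together with the index at which the run stops."""
    end = start
    while end < len(text) and not is_delimiter(text[end]):
        end += 1
    return text[start:end], end
```

With these definitions, `is_delimiter(",")` holds while `is_whitespace(",")` does not, which is the point of widening `delimiter`.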

## Grammar

P-expressions add comma, semicolon, and sequences of one or more colons
to the syntax class `Value`.

    Value =/ Comma / Semicolon / Colons
    Comma = ","
    Semicolon = ";"
    Colons = 1*":"

Now that colon is in `Value`, the syntax for `Dictionary` is replaced
with `Block` everywhere it is mentioned.

    Block = "{" *Value ws "}"

New syntax for explicit uninterpreted grouping of sequences of values is
introduced, and added to class `Value`.

    Value =/ ws Group
    Group = "(" *Value ws ")"

Finally, class `Document` is replaced in order to allow standalone
documents to directly comprise a sequence of multiple values.

    Document = *Value ws

No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term is a valid P-expression, but in general
P-expressions must be rewritten or otherwise interpreted before a
meaningful Preserves value can be arrived at.
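
The grammar changes can be sketched as a small recursive-descent reader in Python. This is an illustrative subset only, under several assumptions: the node classes `Group`, `Block` and `Punct` and the functions `read_document` and `read_values` are hypothetical names; atoms are kept as raw source text rather than decoded; strings have no escape handling; records, comments, annotations and `#`-prefixed forms are not handled; and parentheses are assumed to end a bare token.

```python
from dataclasses import dataclass


@dataclass
class Group:
    items: list


@dataclass
class Block:
    items: list


@dataclass
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


WS = " \t\r\n"
STOP = WS + ',;:()[]{}"'  # assumption: "(" and ")" also end a bare token
CLOSERS = {"(": ")", "{": "}", "[": "]"}


def read_document(text: str) -> list:
    """Document = *Value ws"""
    values, _ = read_values(text, 0, None)
    return values


def read_values(text: str, pos: int, closer):
    """Read values until `closer`, or until end of input when closer is None."""
    out = []
    while True:
        while pos < len(text) and text[pos] in WS:
            pos += 1
        if pos == len(text):
            if closer is not None:
                raise SyntaxError(f"missing {closer!r}")
            return out, pos
        ch = text[pos]
        if ch == closer:
            return out, pos + 1
        if ch in CLOSERS:
            items, pos = read_values(text, pos + 1, CLOSERS[ch])
            if ch == "(":
                out.append(Group(items))
            elif ch == "{":
                out.append(Block(items))
            else:
                out.append(items)  # "[" ... "]" stays a plain sequence
        elif ch in ")}]":
            raise SyntaxError(f"unexpected {ch!r} at offset {pos}")
        elif ch in ",;":
            out.append(Punct(ch))
            pos += 1
        elif ch == ":":
            end = pos
            while end < len(text) and text[end] == ":":
                end += 1
            out.append(Punct(text[pos:end]))
            pos = end
        elif ch == '"':
            end = text.index('"', pos + 1)  # raises ValueError if unterminated
            out.append(text[pos:end + 1])
            pos = end + 1
        else:
            end = pos
            while end < len(text) and text[end] not in STOP:
                end += 1
            out.append(text[pos:end])
            pos = end
```

Note that the reader attaches no meaning to commas, semicolons or colons: they come out as ordinary members of the enclosing sequence, which is what lets a later stage decide how to interpret them.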

## Encoding P-expressions as Preserves

Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon` and
`Colons`, P-expressions are directly encodable as Preserves data. All
members of the special classes are encoded as Preserves text
`Dictionary`[^encoding-rationale] values:

[^encoding-rationale]: In principle, it would be nice to use *records*
    for this purpose, but if we did so we would have to also encode
    usages of records!

{:.pseudocode}
> ⌜`(`*p* ...`)`⌝ ⟶ `{g:[`⌜*p*⌝ ...`]}`
> ⌜`{`*p* ...`}`⌝ ⟶ `{b:[`⌜*p*⌝ ...`]}`
> ⌜`,`⌝ ⟶ `{s:|,|}`
> ⌜`;`⌝ ⟶ `{s:|;|}`
> ⌜`:` ...⌝ ⟶ `{s:|:` ...`|}`
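
The encoding rules above can be sketched in Python as a recursive function. This is only an illustration under stated assumptions: `Group`, `Block` and `Punct` are hypothetical node classes for the special syntax classes, Python strings stand in for the Preserves symbols `g`, `b`, `s` and for the punctuation symbols, and only list-shaped containers are traversed (records and sets would need the same treatment).

```python
from dataclasses import dataclass


@dataclass
class Group:
    items: list


@dataclass
class Block:
    items: list


@dataclass
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


def encode(node):
    """Map the special classes to single-entry dictionaries; leave the rest alone."""
    if isinstance(node, Group):
        return {"g": [encode(p) for p in node.items]}
    if isinstance(node, Block):
        return {"b": [encode(p) for p in node.items]}
    if isinstance(node, Punct):
        return {"s": node.text}
    if isinstance(node, list):
        # Ordinary sequences keep their shape; only their members are encoded.
        return [encode(p) for p in node]
    return node  # atoms are already plain data
```

For instance, `encode(Group(["+", 1, 2]))` yields `{"g": ["+", 1, 2]}`, mirroring the first rule.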

## Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.

```preserves
⌜<date 1821 (lookup-month "February") 3>⌝
= <date 1821 {g:[lookup-month "February"]} 3>
```

```preserves
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
```

```preserves
⌜()⌝
= {g:[]}

⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]
```

```preserves
⌜{
    setUp();
    # Now enter the loop
    loop: {
        greet("World");
    }
    tearDown();
}⌝
= {b:[
    setUp {g:[]} {s:|;|}
    # Now enter the loop
    loop {s:|:|} {b:[
        greet {g:["World"]} {s:|;|}
    ]}
    tearDown {g:[]} {s:|;|}
]}
```

```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
   foo {s:|,|} #!remote {s:|,|} bar]
```

```preserves
⌜{
    optional name: string,
    address: Address,
}⌝
= {b:[
    optional name {s:|:|} string {s:|,|}
    address {s:|:|} Address {s:|,|}
]}
```

## Appendix: Using a P-expression reader to read Preserves

A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special
classes mentioned above.

1. Every `Group` or `Semicolon` that appears is an error.
2. Every `Colons` with two or more colons in it is an error.
3. Every `Comma` that appears is removed from its container.
4. Every `Block` must contain triplets of `Value`, `Colons` (with a
   single colon), `Value`. Any `Block` not following this pattern is an
   error. Each `Block` following the pattern is translated to a
   `Dictionary` containing a key/value pair for each triplet.
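
The four rules above can be sketched as a single recursive pass in Python. As before, this is an illustration rather than a reference implementation: `Group`, `Block` and `Punct` are hypothetical node classes, Python lists and dicts stand in for Preserves sequences and dictionaries (so keys must be hashable here), and the sketch additionally treats a single colon outside a `Block`'s key/value position as an error, which the rules do not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Group:
    items: list


@dataclass
class Block:
    items: list


@dataclass
class Punct:
    text: str  # ",", ";", or a run of one or more ":"


COMMA, COLON = Punct(","), Punct(":")


def without_commas(items):
    # Rule 3: commas are dropped from their container.
    return [p for p in items if p != COMMA]


def to_preserves(node):
    if isinstance(node, Group):
        raise ValueError("Group has no Preserves equivalent")  # rule 1
    if isinstance(node, Punct):
        # Semicolons (rule 1), multi-colon runs (rule 2) and, as an assumption
        # of this sketch, single colons that are not part of a Block triplet.
        raise ValueError(f"unexpected {node.text!r}")
    if isinstance(node, Block):  # rule 4
        items = without_commas(node.items)
        if len(items) % 3 != 0:
            raise ValueError("Block is not a sequence of key : value triplets")
        result = {}
        for i in range(0, len(items), 3):
            key, colon, value = items[i:i + 3]
            if colon != COLON:
                raise ValueError("expected a single colon between key and value")
            result[to_preserves(key)] = to_preserves(value)
        return result
    if isinstance(node, list):
        return [to_preserves(p) for p in without_commas(node)]
    return node  # atoms pass through unchanged
```

Commas are stripped before the triplet check, so a `Block` written with or without separating commas produces the same dictionary.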

## Appendix: Reading vs. Parsing

Lisp systems first *read* streams of bytes into S-expressions and then
*parse* those S-expressions into more abstract structures denoting
various kinds of program syntax. [Separation of reading from parsing is
what gives Lisp its syntactic
flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)

Similarly, the Apple programming language
[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
included a reader-parser split, with the Dylan reader producing
*D-expressions* that are somewhat similar to P-expressions.

Finally, the Racket dialects
[Honu](https://docs.racket-lang.org/honu/index.html) and
[Something](https://github.com/tonyg/racket-something) use a
reader-parser-macro setup, where the reader produces Racket data, the
parser produces "syntax" and is user-extensible, and Racket's own
modular macro system rewrites this "syntax" down to core forms to be
compiled to machine code.

Similarly, when using P-expressions as the foundation for a language, a
generic P-expression reader can then feed into special-purpose
*parsers*. The reader captures the coarse syntactic structure of a
program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of
P-expression `Value`s.

- A simple technique is repeated splitting of sequences; first by
  `Semicolon`, then by `Comma`, then by increasingly high binding-power
  operators.

- More refined is to use a Pratt parser or similar
  ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
  [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
  [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
  to build a parse tree using an extensible specification of the pre-,
  in-, and postfix operators involved.

- Finally, if you treat sequences of `Value`s as pre-lexed token
  streams, almost any parsing formalism (such as [PEG
  parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
  [OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
  extract further syntactic structure.
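
The first technique in the list, repeated splitting, might look roughly like this in Python. The sketch works on the encoded form, with Python dicts standing in for the `{s:|;|}` and `{s:|,|}` dictionaries; `SEMI`, `COMMA`, `split_on` and `statements` are hypothetical names, and dropping empty statements is a choice made here rather than something the text prescribes.

```python
SEMI = {"s": ";"}   # stand-in for the encoded Semicolon
COMMA = {"s": ","}  # stand-in for the encoded Comma


def split_on(items, separator):
    """Split a flat list at every occurrence of `separator`, dropping the separators."""
    parts, current = [], []
    for item in items:
        if item == separator:
            parts.append(current)
            current = []
        else:
            current.append(item)
    parts.append(current)
    return parts


def statements(items):
    """Split first by semicolon, then split each non-empty statement by comma."""
    return [split_on(stmt, COMMA) for stmt in split_on(items, SEMI) if stmt]
```

For the sequence `f x , y ; g ;` this yields two statements, the first with the comma-separated parts `f x` and `y`, the second with the single part `g`. Splitting on operators of increasing binding power would continue the same pattern one level further down.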

## Notes