P-expressions
Tony Garnock-Jones tonyg@leastfixedpoint.com
October 2023. Version 0.1.0.
This document defines a grammar called Preserves Expressions (P-expressions, pexprs) that includes ordinary Preserves text syntax but offers extensions sufficient to support a Lisp- or Haskell-like programming notation.
**Motivation.** The text syntax for Preserves works well for writing `Value`s, i.e. data. However, in some contexts, Preserves applications need a broader grammar that allows interleaving of expressions with data. Two examples are the Preserves Schema language and the Synit configuration scripting language, both of which (ab)use Preserves text syntax as a kind of programming notation.
Preliminaries
The P-expression grammar takes the text syntax grammar as its base and modifies it.
**Whitespace.** Whitespace is redefined as any number of spaces, tabs, carriage returns, or line feeds. Commas are *not* considered whitespace in P-expressions.
ws = *(%x20 / %x09 / CR / LF)
**Delimiters.** Because commas are no longer included in class `ws`, class `delimiter` is widened to include them explicitly.
delimiter = ws / ","
/ "<" / ">" / "[" / "]" / "{" / "}"
/ "#" / ":" / DQUOTE / "|" / "@" / ";"
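The two character classes can be written out as plain sets. The `scan_bare_atom` helper below is hypothetical: it collects a run of characters up to the next delimiter, which shows that a comma now ends an atom without being skipped as whitespace. This is a simplification. A full reader would also stop at characters that cannot appear in a symbol or number, such as parentheses, and the `delimiter` class does not list those.

```python
# Character classes from the ABNF above (a sketch, not a complete lexer).
WS = frozenset(" \t\r\n")  # ws: space, tab, CR, LF; comma is deliberately absent
DELIMITERS = WS | frozenset(',<>[]{}#:"|@;')  # delimiter = ws / "," / ... ; DQUOTE is '"'


def is_delimiter(ch: str) -> bool:
    """True if `ch` belongs to class `delimiter`."""
    return ch in DELIMITERS


def scan_bare_atom(text: str, start: int) -> tuple[str, int]:
    """Hypothetical helper: return the run of characters beginning at `start`
    that ends at the next delimiter (or end of input), plus the end offset."""
    end = start
    while end < len(text) and not is_delimiter(text[end]):
        end += 1
    return text[start:end], end


# A comma ends an atom, yet it is not whitespace:
print(scan_bare_atom("foo,bar", 0))  # ('foo', 3)
print("," in WS, is_delimiter(","))  # False True
```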
Grammar
P-expressions add comma, semicolon, and sequences of one or more colons to the syntax class `Value`.
Value =/ Comma / Semicolon / Colons
Comma = ","
Semicolon = ";"
Colons = 1*":"
Now that colon is in `Value`, the syntax for `Dictionary` is replaced with `Block` everywhere it is mentioned.
Block = "{" *Value ws "}"
New syntax for explicit uninterpreted grouping of sequences of values is introduced, and added to class `Value`.
Value =/ ws Group
Group = "(" *Value ws ")"
Finally, class `Document` is replaced so that standalone documents can directly comprise a sequence of multiple values.
Document = *Value ws
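The grammar changes can be made concrete with a minimal reader for a small subset of P-expressions. The subset covers bare symbols and numbers (kept as uninterpreted text), double-quoted strings without escapes, records, and sequences, plus the new `Group`, `Block`, `Comma`, `Semicolon`, and `Colons` forms. The class names (`Atom`, `Punct`, and so on) and the `read_document` entry point are hypothetical. Comments, `#`-prefixed syntax, escapes, and numeric parsing are not handled specially.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Atom:      # symbol or number, left as uninterpreted text
    text: str

@dataclass
class Str:       # double-quoted string (no escape handling in this sketch)
    value: str

@dataclass
class Punct:     # Comma ",", Semicolon ";", or Colons ":", "::", ...
    text: str

@dataclass
class Record:    # <label field ...>
    items: list

@dataclass
class Sequence:  # [ ... ]
    items: list

@dataclass
class Group:     # ( ... ): explicit uninterpreted grouping
    items: list

@dataclass
class Block:     # { ... }: replaces Dictionary
    items: list


OPENERS = {"<": (">", Record), "[": ("]", Sequence), "(": (")", Group), "{": ("}", Block)}
WS = " \t\r\n"                        # note: no comma
STOP = set(WS) | set(',;:<>[](){}"')  # every character here is handled below


def read_values(text: str, i: int, closer: Optional[str]) -> tuple[list, int]:
    """Read `*Value ws` up to `closer`, or to end of input when closer is None."""
    values = []
    while True:
        while i < len(text) and text[i] in WS:
            i += 1
        if i == len(text):
            if closer is not None:
                raise SyntaxError(f"expected {closer!r} before end of input")
            return values, i
        ch = text[i]
        if ch == closer:
            return values, i + 1
        if ch in OPENERS:
            close, kind = OPENERS[ch]
            items, i = read_values(text, i + 1, close)
            values.append(kind(items))
        elif ch in ">])}":
            raise SyntaxError(f"unexpected {ch!r} at offset {i}")
        elif ch in ",;":
            values.append(Punct(ch))
            i += 1
        elif ch == ":":
            j = i
            while j < len(text) and text[j] == ":":
                j += 1
            values.append(Punct(text[i:j]))  # Colons = 1*":"
            i = j
        elif ch == '"':
            j = text.index('"', i + 1)  # raises ValueError if unterminated
            values.append(Str(text[i + 1:j]))
            i = j + 1
        else:
            j = i
            while j < len(text) and text[j] not in STOP:
                j += 1
            values.append(Atom(text[i:j]))
            i = j


def read_document(text: str) -> list:
    """Document = *Value ws: a standalone document may hold many values."""
    values, _ = read_values(text, 0, None)
    return values
```

For example, `read_document("a: b, c;")` yields six values: `Atom("a")`, `Punct(":")`, `Atom("b")`, `Punct(",")`, `Atom("c")`, and `Punct(";")`. The punctuation is kept as ordinary values for a later stage to interpret.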
No changes to the Preserves semantic model are made. Every Preserves text-syntax term is a valid P-expression, but in general P-expressions must be rewritten or otherwise interpreted before a meaningful Preserves value can be arrived at.
Encoding P-expressions as Preserves
Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon`, or `Colons`, P-expressions are directly encodable as Preserves data. All members of the special classes are encoded as Preserves text `Dictionary` values (see note 1):
{:.pseudocode}
⌜(p ...)⌝ ⟶ {g:[⌜p⌝ ...]}
⌜{p ...}⌝ ⟶ {b:[⌜p⌝ ...]}
⌜,⌝ ⟶ {s:|,|}
⌜;⌝ ⟶ {s:|;|}
⌜:...⌝ ⟶ {s:|:...|}
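The encoding is a structural recursion over the table above. The sketch below uses a hypothetical representation. Atoms are strings that are already in Preserves text form (e.g. `'3'`, `'"February"'`). Compound P-expressions are tagged tuples: `("group", items)`, `("block", items)`, `("punct", text)`, `("record", items)`, or `("seq", items)`. The output is Preserves text.

```python
def encode(p) -> str:
    """Encode one P-expression (hypothetical tagged-tuple form) as Preserves text."""
    if isinstance(p, str):
        return p  # atom: assumed to be valid Preserves text already
    kind, body = p
    if kind == "punct":  # ",", ";", or a run of colons becomes {s:|...|}
        return "{s:|" + body + "|}"
    inner = " ".join(encode(x) for x in body)
    if kind == "group":
        return "{g:[" + inner + "]}"
    if kind == "block":
        return "{b:[" + inner + "]}"
    if kind == "record":  # ordinary Preserves syntax passes through unchanged
        return "<" + inner + ">"
    if kind == "seq":
        return "[" + inner + "]"
    raise ValueError(f"unknown P-expression kind {kind!r}")


date = ("record", ["date", "1821", ("group", ["lookup-month", '"February"']), "3"])
print(encode(date))  # <date 1821 {g:[lookup-month "February"]} 3>
```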
Appendix: Examples
Examples are given as pairs of P-expressions and their Preserves text-syntax encodings.
⌜<date 1821 (lookup-month "February") 3>⌝
= <date 1821 {g:[lookup-month "February"]} 3>
⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
⌜()⌝
= {g:[]}
⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]
⌜{
setUp();
# Now enter the loop
loop: {
greet("World");
}
tearDown();
}⌝
= {b:[
setUp {g:[]} {s:|;|}
# Now enter the loop
loop {s:|:|} {b:[
greet {g:["World"]} {s:|;|}
]}
tearDown {g:[]} {s:|;|}
]}
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
foo {s:|,|} #!remote {s:|,|} bar]
⌜{
optional name: string,
address: Address,
}⌝
= {b:[
optional name {s:|:|} string {s:|,|}
address {s:|:|} Address {s:|,|}
]}
Appendix: Using a P-expression reader to read Preserves
A reader for P-expressions can be adapted to yield a reader for Preserves terms by processing (subterms of) each P-expression that the reader produces. The only subterms that need processing are the special classes mentioned above.
- Every `Group` or `Semicolon` that appears is an error.
- Every `Colons` with two or more colons in it is an error.
- Every `Comma` that appears is removed from its container.
- Every `Block` must contain triplets of `Value`, `Colons` (with a single colon), `Value`. Any `Block` not following this pattern is an error. Each `Block` following the pattern is translated to a `Dictionary` containing a key/value pair for each triplet.
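The four rules above can be sketched as a recursive translation. This sketch again assumes a hypothetical tagged-tuple representation. Atoms are strings of Preserves text, and compounds are `("group", items)`, `("block", items)`, `("punct", text)`, `("record", items)`, or `("seq", items)`. The output is Preserves text. As an added assumption, it also rejects a lone single colon outside a `Block` triplet, because Preserves has no value for one. The hypothetical `convert_document` drops commas at the top level of a `Document`.

```python
COMMA = ("punct", ",")
COLON = ("punct", ":")


def without_commas(items: list) -> list:
    """Rule 3: every Comma is removed from its container."""
    return [x for x in items if x != COMMA]


def convert(p) -> str:
    """Translate one P-expression (tagged-tuple form) into Preserves text."""
    if isinstance(p, str):
        return p  # atom, assumed already valid Preserves text
    kind, body = p
    if kind == "group":
        raise ValueError("Group is not allowed in Preserves")          # rule 1
    if kind == "punct":
        if body == ";":
            raise ValueError("Semicolon is not allowed in Preserves")  # rule 1
        if len(body) > 1:
            raise ValueError(f"{body!r}: two or more colons")          # rule 2
        raise ValueError(f"{body!r} cannot stand alone as a Preserves value")
    items = without_commas(body)
    if kind == "seq":
        return "[" + " ".join(convert(x) for x in items) + "]"
    if kind == "record":
        return "<" + " ".join(convert(x) for x in items) + ">"
    if kind == "block":                                                # rule 4
        if len(items) % 3 != 0:
            raise ValueError("Block is not a sequence of key : value triplets")
        pairs = []
        for key, colon, value in zip(items[0::3], items[1::3], items[2::3]):
            if colon != COLON:
                raise ValueError("Block triplet lacks a single colon")
            pairs.append(f"{convert(key)}: {convert(value)}")
        return "{" + ", ".join(pairs) + "}"
    raise ValueError(f"unknown P-expression kind {kind!r}")


def convert_document(values: list) -> list:
    """Convert each top-level value, dropping top-level commas."""
    return [convert(v) for v in without_commas(values)]
```

Given the last example in the appendix above, this sketch reports an error. Once commas are removed, the block holds seven values, and `optional name: string` is not a key/colon/value triplet.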
Appendix: Reading vs. Parsing
Lisp systems first read streams of bytes into S-expressions and then parse those S-expressions into more abstract structures denoting various kinds of program syntax. Separation of reading from parsing is what gives Lisp its syntactic flexibility.
Similarly, the Apple programming language Dylan included a reader-parser split, with the Dylan reader producing D-expressions that are somewhat similar to P-expressions.
Finally, the Racket dialects Honu and Something use a reader-parser-macro setup, where the reader produces Racket data, the parser produces "syntax" and is user-extensible, and Racket's own modular macro system rewrites this "syntax" down to core forms to be compiled to machine code.
Similarly, when using P-expressions as the foundation for a language, a generic P-expression reader can then feed into special-purpose parsers. The reader captures the coarse syntactic structure of a program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of P-expression `Value`s.
- A simple technique is repeated splitting of sequences: first by `Semicolon`, then by `Comma`, then by operators of increasingly high binding power.
- More refined is to use a Pratt parser or similar (1, 2, 3) to build a parse tree from an extensible specification of the prefix, infix, and postfix operators involved.
- Finally, if you treat sequences of `Value`s as pre-lexed token streams, almost any parsing formalism (such as PEG parsing, OMeta, etc.) can be used to extract further syntactic structure.
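The first two techniques can be combined in a rough sketch. A hypothetical `split_on` helper splits a sequence at every `Semicolon` and then at every `Comma`. Each remaining clause goes to a small Pratt-style (precedence-climbing) loop that handles only left-associative infix operators. Punctuation uses the same hypothetical `("punct", text)` tagging, and operands and operators are plain strings. The `BINDING_POWER` table is an illustrative assumption, not part of any specification.

```python
SEMICOLON = ("punct", ";")
COMMA = ("punct", ",")
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}  # hypothetical operator table


def split_on(values: list, sep) -> list:
    """Split `values` at each `sep`, discarding empty parts (e.g. after a trailing ';')."""
    parts, current = [], []
    for v in values:
        if v == sep:
            parts.append(current)
            current = []
        else:
            current.append(v)
    parts.append(current)
    return [part for part in parts if part]


def is_operator(token) -> bool:
    return isinstance(token, str) and token in BINDING_POWER


def parse_expr(tokens: list, min_bp: int = 0):
    """Pratt-style loop over infix operators; consumes `tokens` from the front."""
    if not tokens:
        raise SyntaxError("expected an operand")
    lhs = tokens.pop(0)
    while tokens and is_operator(tokens[0]):
        bp = BINDING_POWER[tokens[0]]
        if bp <= min_bp:  # stopping at equal power gives left associativity
            break
        op = tokens.pop(0)
        lhs = (op, lhs, parse_expr(tokens, bp))
    return lhs


def parse_clause(clause: list):
    tokens = list(clause)  # copy, since parse_expr consumes its input
    tree = parse_expr(tokens)
    if tokens:
        raise SyntaxError(f"unexpected {tokens[0]!r}")
    return tree


def parse_statements(values: list) -> list:
    """Semicolons separate statements; commas separate clauses within one."""
    return [[parse_clause(c) for c in split_on(stmt, COMMA)]
            for stmt in split_on(values, SEMICOLON)]


tokens = ["1", "+", "2", "*", "3", COMMA, "x", SEMICOLON, "y", "-", "1", SEMICOLON]
print(parse_statements(tokens))
# [[('+', '1', ('*', '2', '3')), 'x'], [('-', 'y', '1')]]
```

Because empty parts are discarded, `a,,b` splits exactly like `a,b`. A real parser may prefer to report the former as an error.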
Notes
1. In principle, it would be nice to use records for this purpose, but if we did so we would have to also encode usages of records!