diff --git a/.gitignore b/.gitignore
index 1fc3cf7..f631d1d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 _site/
+preserves-expressions.pdf
 preserves-binary.pdf
 preserves-schema.pdf
 preserves-text.pdf
diff --git a/Makefile b/Makefile
index 5f8abc8..ac1dc8b 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,11 @@
 __ignored__ := $(shell ./setup.sh)
 
-PDFS=preserves.pdf preserves-text.pdf preserves-binary.pdf preserves-schema.pdf
+PDFS=\
+    preserves.pdf \
+    preserves-text.pdf \
+    preserves-binary.pdf \
+    preserves-schema.pdf \
+    preserves-expressions.pdf
 
 all: $(PDFS)
 
diff --git a/preserves-expressions.md b/preserves-expressions.md
new file mode 100644
index 0000000..2adf01e
--- /dev/null
+++ b/preserves-expressions.md
@@ -0,0 +1,210 @@
+---
+title: "P-expressions"
+---
+
+Tony Garnock-Jones
+October 2023. Version 0.1.0.
+
+This document defines a grammar called *Preserves Expressions*
+(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
+syntax](preserves-text.html) but offers extensions sufficient to support
+a Lisp- or Haskell-like programming notation.
+
+**Motivation.** The [text syntax](preserves-text.html) for Preserves
+works well for writing `Value`s, i.e. data. However, in some contexts,
+Preserves applications need a broader grammar that allows interleaving
+of *expressions* with data. Two examples are the [Preserves Schema
+language](preserves-schema.html) and the [Synit configuration scripting
+language](https://synit.org/book/operation/scripting.html), both of
+which (ab)use Preserves text syntax as a kind of programming notation.
+
+## Preliminaries
+
+The P-expression grammar takes the text syntax grammar as its base and
+modifies it.
+
+
+**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
+carriage returns, or line feeds. Commas are *not* considered whitespace
+in P-expressions.
+
+    ws = *(%x20 / %x09 / CR / LF)
+
+
+**Delimiters.** Because commas are no longer included in class `ws`,
+class `delimiter` is widened to include them explicitly.
+
+    delimiter = ws / ","
+              / "<" / ">" / "[" / "]" / "{" / "}"
+              / "#" / ":" / DQUOTE / "|" / "@" / ";"
+
+## Grammar
+
+P-expressions add comma, semicolon, and sequences of one or more colons
+to the syntax class `Value`.
+
+    Value =/ Comma / Semicolon / Colons
+    Comma = ","
+    Semicolon = ";"
+    Colons = 1*":"
+
+Now that colon is in `Value`, the syntax for `Dictionary` is replaced
+with `Block` everywhere it is mentioned.
+
+    Block = "{" *Value ws "}"
+
+New syntax for explicit uninterpreted grouping of sequences of values is
+introduced, and added to class `Value`.
+
+    Value =/ ws Group
+    Group = "(" *Value ws ")"
+
+Finally, class `Document` is replaced in order to allow standalone
+documents to directly comprise a sequence of multiple values.
+
+    Document = *Value ws
+
+No changes to [the Preserves semantic model](preserves.html) are made.
+Every Preserves text-syntax term is a valid P-expression, but in general
+P-expressions must be rewritten or otherwise interpreted before a
+meaningful Preserves value can be arrived at.
+
+## Encoding P-expressions as Preserves
+
+Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon` or
+`Colons`, P-expressions are directly encodable as Preserves data. All
+members of the special classes are encoded as Preserves text
+`Dictionary`[^encoding-rationale] values:
+
+[^encoding-rationale]: In principle, it would be nice to use *records*
+    for this purpose, but if we did so we would have to also encode
+    usages of records!
+
+{:.pseudocode}
+> ⌜`(`*p* ...`)`⌝ ⟶ `{g:[`⌜*p*⌝ ...`]}`
+> ⌜`{`*p* ...`}`⌝ ⟶ `{b:[`⌜*p*⌝ ...`]}`
+> ⌜`,`⌝ ⟶ `{s:|,|}`
+> ⌜`;`⌝ ⟶ `{s:|;|}`
+> ⌜`:` ...⌝ ⟶ `{s:|:` ...`|}`
+
+## Appendix: Examples
+
+Examples are given as pairs of P-expressions and their Preserves
+text-syntax encodings.
+
+```preserves
+ ⌜⌝
+=
+```
+
+```preserves
+ ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
+= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
+```
+
+```preserves
+ ⌜()⌝
+= {g:[]}
+
+ ⌜[() () ()]⌝
+= [{g:[]}, {g:[]}, {g:[]}]
+```
+
+```preserves
+ ⌜{
+      setUp();
+      # Now enter the loop
+      loop: {
+          greet("World");
+      }
+      tearDown();
+  }⌝
+= {b:[
+      setUp {g:[]} {s:|;|}
+      # Now enter the loop
+      loop {s:|:|} {b:[
+          greet {g:["World"]} {s:|;|}
+      ]}
+      tearDown {g:[]} {s:|;|}
+  ]}
+```
+
+```preserves
+ ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
+= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
+   foo {s:|,|} #!remote {s:|,|} bar]
+```
+
+```preserves
+ ⌜{
+      optional name: string,
+      address: Address,
+  }⌝
+= {b:[
+      optional name {s:|:|} string {s:|,|}
+      address {s:|:|} Address {s:|,|}
+  ]}
+```
+
+## Appendix: Using a P-expression reader to read Preserves
+
+A reader for P-expressions can be adapted to yield a reader for
+Preserves terms by processing (subterms of) each P-expression that the
+reader produces. The only subterms that need processing are the special
+classes mentioned above.
+
+ 1. Every `Group` or `Semicolon` that appears is an error.
+ 2. Every `Colons` with two or more colons in it is an error.
+ 3. Every `Comma` that appears is removed from its container.
+ 4. Every `Block` must contain triplets of `Value`, `Colons` (with a
+    single colon), `Value`. Any `Block` not following this pattern is an
+    error. Each `Block` following the pattern is translated to a
+    `Dictionary` containing a key/value pair for each triplet.
+
+## Appendix: Reading vs. Parsing
+
+Lisp systems first *read* streams of bytes into S-expressions and then
+*parse* those S-expressions into more abstract structures denoting
+various kinds of program syntax. [Separation of reading from parsing is
+what gives Lisp its syntactic
+flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
+
+Similarly, the Apple programming language
+[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
+included a reader-parser split, with the Dylan reader producing
+*D-expressions* that are somewhat similar to P-expressions.
+
+Finally, the Racket dialects
+[Honu](https://docs.racket-lang.org/honu/index.html) and
+[Something](https://github.com/tonyg/racket-something) use a
+reader-parser-macro setup, where the reader produces Racket data, the
+parser produces "syntax" and is user-extensible, and Racket's own
+modular macro system rewrites this "syntax" down to core forms to be
+compiled to machine code.
+
+Similarly, when using P-expressions as the foundation for a language, a
+generic P-expression reader can then feed into special-purpose
+*parsers*. The reader captures the coarse syntactic structure of a
+program, and the parser refines this.
+
+Often, a parser will wish to extract structure from sequences of
+P-expression `Value`s.
+
+ - A simple technique is repeated splitting of sequences; first by
+   `Semicolon`, then by `Comma`, then by increasingly high binding-power
+   operators.
+
+ - More refined is to use a Pratt parser or similar
+   ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
+   [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
+   [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
+   to build a parse tree using an extensible specification of the pre-,
+   in-, and postfix operators involved.
+
+ - Finally, if you treat sequences of `Value`s as pre-lexed token
+   streams, almost any parsing formalism (such as [PEG
+   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
+   [OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
+   extract further syntactic structure.
+
+## Notes