preserves-expressions.md

2023-10-31 17:37:09 +01:00 · 2023-10-31 17:37:09 +01:00 · a69444f085
parent 982d916b61
commit a69444f085
3 changed files with 217 additions and 1 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 _site/
+preserves-expressions.pdf
 preserves-binary.pdf
 preserves-schema.pdf
 preserves-text.pdf
--- a/7
+++ b/7
@@ -1,6 +1,11 @@
 __ignored__ := $(shell ./setup.sh)

-PDFS=preserves.pdf preserves-text.pdf preserves-binary.pdf preserves-schema.pdf
+PDFS=\
+	preserves.pdf \
+	preserves-text.pdf \
+	preserves-binary.pdf \
+	preserves-schema.pdf \
+	preserves-expressions.pdf

 all: $(PDFS)

--- a/preserves-expressions.md
+++ b/preserves-expressions.md
@@ -0,0 +1,210 @@
+---
+title: "P-expressions"
+---
+
+Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
+October 2023. Version 0.1.0.
+
+This document defines a grammar called *Preserves Expressions*
+(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
+syntax](preserves-text.html) but offers extensions sufficient to support
+a Lisp- or Haskell-like programming notation.
+
+**Motivation.** The [text syntax](preserves-text.html) for Preserves
+works well for writing `Value`s, i.e. data. However, in some contexts,
+Preserves applications need a broader grammar that allows interleaving
+of *expressions* with data. Two examples are the [Preserves Schema
+language](preserves-schema.html) and the [Synit configuration scripting
+language](https://synit.org/book/operation/scripting.html), both of
+which (ab)use Preserves text syntax as a kind of programming notation.
+
+## Preliminaries
+
+The P-expression grammar takes the text syntax grammar as its base and
+modifies it.
+
+<a id="whitespace"></a>
+**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
+carriage returns, or line feeds. Commas are *not* considered whitespace
+in P-expressions.
+
+                ws = *(%x20 / %x09 / CR / LF)
+
+<a id="delimiters"></a>
+**Delimiters.** Because commas are no longer included in class `ws`,
+class `delimiter` is widened to include them explicitly.
+
+         delimiter = ws / ","
+                   / "<" / ">" / "[" / "]" / "{" / "}"
+                   / "#" / ":" / DQUOTE / "|" / "@" / ";"
+
+## Grammar
+
+P-expressions add comma, semicolon, and sequences of one or more colons
+to the syntax class `Value`.
+
+            Value =/ Comma / Semicolon / Colons
+             Comma = ","
+         Semicolon = ";"
+            Colons = 1*":"
+
+Now that colon is in `Value`, the syntax for `Dictionary` is replaced
+with `Block` everywhere it is mentioned.
+
+             Block =  "{" *Value ws "}"
+
+New syntax for explicit uninterpreted grouping of sequences of values is
+introduced, and added to class `Value`.
+
+            Value =/ ws Group
+             Group = "(" *Value ws ")"
+
+Finally, class `Document` is replaced in order to allow standalone
+documents to directly comprise a sequence of multiple values.
+
+          Document = *Value ws
+
+No changes to [the Preserves semantic model](preserves.html) are made.
+Every Preserves text-syntax term is a valid P-expression, but in general
+P-expressions must be rewritten or otherwise interpreted before a
+meaningful Preserves value can be arrived at.
+
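The following Python sketch is not part of the specification. It illustrates how a reader for a small subset of the grammar above might look: bare tokens, decimal integers, double-quoted strings without escapes, `[...]` sequences, groups, blocks, and the three punctuation classes. The names `Sym`, `Punct`, `Group`, `Block`, `read_values` and `read_document` are hypothetical. Treating `(` and `)` as delimiters is an assumption of the sketch; the `delimiter` class as written does not list them. Records, comments, `#`-forms and `|...|` symbols are left out, and any non-integer bare token (including `2.0`) simply becomes a `Sym`.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"

@dataclass(frozen=True)
class Group:
    items: tuple

@dataclass(frozen=True)
class Block:
    items: tuple

WS = " \t\r\n"
CLOSERS = {"(": ")", "{": "}", "[": "]"}
# The delimiter class from the Preliminaries, plus "(" and ")" (an assumption).
DELIMS = set(WS + ',<>[]{}#:"|@;' + "()")

def read_values(src, pos=0, closer=None):
    """Read `*Value ws` up to `closer`, or to end of input if closer is None."""
    out = []
    while True:
        while pos < len(src) and src[pos] in WS:  # commas are not whitespace
            pos += 1
        if pos == len(src):
            if closer is not None:
                raise SyntaxError(f"missing {closer!r}")
            return out, pos
        c = src[pos]
        if c == closer:
            return out, pos + 1
        if c in CLOSERS:
            items, pos = read_values(src, pos + 1, CLOSERS[c])
            if c == "(":
                out.append(Group(tuple(items)))
            elif c == "{":
                out.append(Block(tuple(items)))
            else:
                out.append(items)  # ordinary Preserves sequence
        elif c in ",;":
            out.append(Punct(c))
            pos += 1
        elif c == ":":
            end = pos
            while end < len(src) and src[end] == ":":
                end += 1
            out.append(Punct(src[pos:end]))
            pos = end
        elif c == '"':
            end = src.index('"', pos + 1)  # escapes are not handled
            out.append(src[pos + 1:end])
            pos = end + 1
        else:
            end = pos
            while end < len(src) and src[end] not in DELIMS:
                end += 1
            if end == pos:
                raise SyntaxError(f"unsupported character {c!r} at offset {pos}")
            token = src[pos:end]
            out.append(int(token) if re.fullmatch(r"-?[0-9]+", token) else Sym(token))
            pos = end

def read_document(src):
    """Document = *Value ws"""
    values, _ = read_values(src)
    return values
```

Because `Document` is `*Value ws`, `read_document` returns a list of values rather than a single value; for example, `read_document("a :: b, c")` yields five items, two of which are `Punct` instances.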
+## Encoding P-expressions as Preserves
+
+Aside from members of the special classes `Group`, `Block`, `Comma`,
+`Semicolon` and `Colons`, P-expressions are directly encodable as
+Preserves data. Members of the special classes are encoded as
+single-entry Preserves `Dictionary`[^encoding-rationale] values:
+
+[^encoding-rationale]: In principle, it would be nice to use *records*
+    for this purpose, but if we did so we would have to also encode
+    usages of records!
+
+{:.pseudocode}
+> ⌜`(`*p* ...`)`⌝ ⟶ `{g:[`⌜*p*⌝ ...`]}`
+> ⌜`{`*p* ...`}`⌝ ⟶ `{b:[`⌜*p*⌝ ...`]}`
+> ⌜`,`⌝ ⟶ `{s:|,|}`
+> ⌜`;`⌝ ⟶ `{s:|;|}`
+> ⌜`:` ...⌝ ⟶ `{s:|:` ...`|}`
+
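The encoding rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the classes `Sym`, `Punct`, `Group` and `Block` and the function `encode` are hypothetical names, Preserves dictionaries are modelled as Python `dict`s, Preserves sequences as Python `list`s, and records and sets (which would also need their subterms encoded) are not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"

@dataclass(frozen=True)
class Group:
    items: tuple

@dataclass(frozen=True)
class Block:
    items: tuple

def encode(p):
    """Encode a P-expression as plain Preserves data."""
    if isinstance(p, Group):
        return {Sym("g"): [encode(x) for x in p.items]}
    if isinstance(p, Block):
        return {Sym("b"): [encode(x) for x in p.items]}
    if isinstance(p, Punct):
        # Comma, Semicolon and Colons all become {s: <symbol>}.
        return {Sym("s"): Sym(p.text)}
    if isinstance(p, list):
        # Ordinary sequences are kept, but their subterms are encoded.
        return [encode(x) for x in p]
    return p  # atoms are already Preserves data

# <date 1821 (lookup-month "February") 3>, with the record left aside:
inner = Group((Sym("lookup-month"), "February"))
encoded = encode(inner)  # corresponds to {g:[lookup-month "February"]}
```

Every special class maps to a dictionary with exactly one key (`g`, `b` or `s`), so a consumer of the encoded form can tell the classes apart by inspecting that key.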
+## Appendix: Examples
+
+Examples are given as pairs of P-expressions and their Preserves
+text-syntax encodings.
+
+```preserves
+ ⌜<date 1821 (lookup-month "February") 3>⌝
+= <date 1821 {g:[lookup-month "February"]} 3>
+```
+
+```preserves
+ ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
+= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
+```
+
+```preserves
+ ⌜()⌝
+= {g:[]}
+
+ ⌜[() () ()]⌝
+= [{g:[]}, {g:[]}, {g:[]}]
+```
+
+```preserves
+ ⌜{
+      setUp();
+      # Now enter the loop
+      loop: {
+          greet("World");
+      }
+      tearDown();
+  }⌝
+= {b:[
+      setUp {g:[]} {s:|;|}
+      # Now enter the loop
+      loop {s:|:|} {b:[
+          greet {g:["World"]} {s:|;|}
+      ]}
+      tearDown {g:[]} {s:|;|}
+  ]}
+```
+
+```preserves
+ ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
+= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
+   foo {s:|,|} #!remote {s:|,|} bar]
+```
+
+```preserves
+ ⌜{
+      optional name: string,
+      address: Address,
+  }⌝
+= {b:[
+      optional name {s:|:|} string {s:|,|}
+      address {s:|:|} Address {s:|,|}
+  ]}
+```
+
+## Appendix: Using a P-expression reader to read Preserves
+
+A reader for P-expressions can be adapted to yield a reader for
+Preserves terms by processing (subterms of) each P-expression that the
+reader produces. The only subterms that need processing are the special
+classes mentioned above.
+
+ 1. Every `Group` or `Semicolon` that appears is an error.
+ 2. Every `Colons` with two or more colons in it is an error.
+ 3. Every `Comma` that appears is removed from its container.
+ 4. Every `Block` must, once its commas have been removed, consist
+    entirely of triplets of `Value`, `Colons` (with a single colon),
+    `Value`. Any `Block` not following this pattern is an error. Each
+    `Block` following the pattern is translated to a `Dictionary`
+    containing a key/value pair for each triplet.
+
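The four steps above can be sketched in Python. This is a minimal illustration, not the author's implementation: `Sym`, `Punct`, `Group`, `Block`, `strip_commas` and `to_preserves` are hypothetical names, sequences are modelled as Python lists and dictionaries as Python `dict`s (so keys must be hashable), duplicate keys are not checked, and records and sets are not modelled. Rejecting a single colon or a comma that is not inside a sequence or block is an assumption of the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"

@dataclass(frozen=True)
class Group:
    items: tuple

@dataclass(frozen=True)
class Block:
    items: tuple

COMMA = Punct(",")
COLON = Punct(":")

def strip_commas(items):
    """Step 3: remove every Comma from its container."""
    return [x for x in items if x != COMMA]

def to_preserves(p):
    if isinstance(p, Group):
        raise ValueError("Group is not valid Preserves")  # step 1
    if isinstance(p, Punct):
        # Steps 1 and 2 reject ";" and "::"; any other punctuation reaching
        # this point is outside a Block triplet, which this sketch also rejects.
        raise ValueError(f"unexpected punctuation {p.text!r}")
    if isinstance(p, list):
        return [to_preserves(x) for x in strip_commas(p)]
    if isinstance(p, Block):  # step 4
        items = strip_commas(p.items)
        if len(items) % 3 != 0:
            raise ValueError("Block is not a sequence of key : value triplets")
        result = {}
        for i in range(0, len(items), 3):
            key, colon, value = items[i:i + 3]
            if colon != COLON:
                raise ValueError("expected a single colon between key and value")
            result[to_preserves(key)] = to_preserves(value)
        return result
    return p  # atoms pass through unchanged

# {a: 1, b: [2, 3]} as a reader for P-expressions would deliver it:
pexpr = Block((Sym("a"), COLON, 1, COMMA, Sym("b"), COLON, [2, COMMA, 3]))
value = to_preserves(pexpr)
```

Commas are stripped before the triplet check, which is what allows both `{a: 1, b: 2}` and `{a: 1 b: 2}` to translate to the same dictionary.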
+## Appendix: Reading vs. Parsing
+
+Lisp systems first *read* streams of bytes into S-expressions and then
+*parse* those S-expressions into more abstract structures denoting
+various kinds of program syntax. [Separation of reading from parsing is
+what gives Lisp its syntactic
+flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
+
+Similarly, the Apple programming language
+[Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
+included a reader-parser split, with the Dylan reader producing
+*D-expressions* that are somewhat similar to P-expressions.
+
+Finally, the Racket dialects
+[Honu](https://docs.racket-lang.org/honu/index.html) and
+[Something](https://github.com/tonyg/racket-something) use a
+reader-parser-macro setup, where the reader produces Racket data, the
+parser produces "syntax" and is user-extensible, and Racket's own
+modular macro system rewrites this "syntax" down to core forms to be
+compiled to machine code.
+
+Similarly, when using P-expressions as the foundation for a language, a
+generic P-expression reader can then feed into special-purpose
+*parsers*. The reader captures the coarse syntactic structure of a
+program, and the parser refines this.
+
+Often, a parser will wish to extract structure from sequences of
+P-expression `Value`s.
+
+ - A simple technique is repeated splitting of sequences: first by
+   `Semicolon`, then by `Comma`, then by increasingly high binding-power
+   operators.
+
+ - More refined is to use a Pratt parser or similar
+   ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
+   [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
+   [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
+   to build a parse tree using an extensible specification of the pre-,
+   in-, and postfix operators involved.
+
+ - Finally, if you treat sequences of `Value`s as pre-lexed token
+   streams, almost any parsing formalism (such as [PEG
+   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
+   [OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
+   extract further syntactic structure.
+
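The first technique in the list above, repeated splitting, can be sketched in Python as follows. The names `Sym`, `Punct`, `split_on`, `statements` and `parse_ops` are hypothetical, the operator table is an arbitrary example rather than anything the specification defines, and all operators are assumed to be binary and left-associative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"

def split_on(values, sep):
    """Split a sequence of values at every occurrence of `sep`."""
    parts, current = [], []
    for v in values:
        if v == sep:
            parts.append(current)
            current = []
        else:
            current.append(v)
    parts.append(current)
    return parts

def statements(values):
    """Split first by Semicolon, then each non-empty statement by Comma."""
    return [split_on(stmt, Punct(","))
            for stmt in split_on(values, Punct(";")) if stmt]

def parse_ops(values, levels):
    """Split on the loosest-binding operators first, then on tighter ones.

    `levels` lists sets of operator names, loosest binding first.
    """
    if not levels:
        if len(values) != 1:
            raise ValueError(f"expected a single operand, got {values!r}")
        return values[0]
    # Scan right to left so that operators associate to the left.
    for i in range(len(values) - 1, 0, -1):
        v = values[i]
        if isinstance(v, Sym) and v.name in levels[0]:
            left = parse_ops(values[:i], levels)
            right = parse_ops(values[i + 1:], levels[1:])
            return (v.name, left, right)
    return parse_ops(values, levels[1:])

LEVELS = [{"+", "-"}, {"*", "/"}]  # example table, loosest first
tree = parse_ops([1, Sym("+"), 2, Sym("*"), 3], LEVELS)
```

Splitting handles simple infix notations but has no natural place for prefix or postfix operators, which is where a Pratt parser or a more general formalism becomes worthwhile.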
+## Notes