preserves-expressions.md

2023-10-31 17:37:09 +01:00 · 2023-10-31 17:37:09 +01:00 · a69444f085
parent 982d916b61
commit a69444f085
3 changed files with 217 additions and 1 deletion
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 _site/
 preserves-expressions.pdf
 preserves-binary.pdf
 preserves-schema.pdf
 preserves-text.pdf
--- a/7
+++ b/7
@@ -1,6 +1,11 @@
 __ignored__ := $(shell ./setup.sh)
-PDFS=preserves.pdf preserves-text.pdf preserves-binary.pdf preserves-schema.pdf
+PDFS=\
 	preserves.pdf \
 	preserves-text.pdf \
 	preserves-binary.pdf \
 	preserves-schema.pdf \
 	preserves-expressions.pdf
 all: $(PDFS)
--- a/preserves-expressions.md
+++ b/preserves-expressions.md
@@ -0,0 +1,210 @@
 ---
 title: "P-expressions"
 ---
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
 October 2023. Version 0.1.0.
 This document defines a grammar called *Preserves Expressions*
 (*P-expressions*, *pexprs*) that includes [ordinary Preserves text
 syntax](preserves-text.html) but offers extensions sufficient to support
 a Lisp- or Haskell-like programming notation.
 **Motivation.** The [text syntax](preserves-text.html) for Preserves
 works well for writing `Value`s, i.e. data. However, in some contexts,
 Preserves applications need a broader grammar that allows interleaving
 of *expressions* with data. Two examples are the [Preserves Schema
 language](preserves-schema.html) and the [Synit configuration scripting
 language](https://synit.org/book/operation/scripting.html), both of
 which (ab)use Preserves text syntax as a kind of programming notation.
 ## Preliminaries
 The P-expression grammar takes the text syntax grammar as its base and
 modifies it.
 <a id="whitespace"></a>
 **Whitespace.** Whitespace is redefined as any number of spaces, tabs,
 carriage returns, or line feeds. Commas are *not* considered whitespace
 in P-expressions.
                ws = *(%x20 / %x09 / CR / LF)
 <a id="delimiters"></a>
 **Delimiters.** Because commas are no longer included in class `ws`,
 class `delimiter` is widened to include them explicitly.
         delimiter = ws / ","
                   / "<" / ">" / "[" / "]" / "{" / "}"
                   / "#" / ":" / DQUOTE / "|" / "@" / ";"
 ## Grammar
 P-expressions add comma, semicolon, and sequences of one or more colons
 to the syntax class `Value`.
            Value =/ ws (Comma / Semicolon / Colons)
             Comma = ","
         Semicolon = ";"
            Colons = 1*":"
 Now that colon is in `Value`, the syntax for `Dictionary` is replaced
 with `Block` everywhere it is mentioned.
             Block =  "{" *Value ws "}"
 New syntax for explicit uninterpreted grouping of sequences of values is
 introduced, and added to class `Value`.
            Value =/ ws Group
             Group = "(" *Value ws ")"
 Finally, class `Document` is replaced in order to allow standalone
 documents to directly comprise a sequence of multiple values.
          Document = *Value ws
 No changes to [the Preserves semantic model](preserves.html) are made.
 Every Preserves text-syntax term is a valid P-expression, but in general
 P-expressions must be rewritten or otherwise interpreted before a
 meaningful Preserves value can be arrived at.
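
To make the grammar concrete, here is a minimal Python sketch of a reader for a small subset of P-expressions. It handles bare symbols and integers, double-quoted strings without escapes, records, sequences, `Group`s, `Block`s, commas, semicolons and runs of colons. The node classes (`Symbol`, `Record`, `Group`, `Block`, `Comma`, `Semicolon`, `Colons`) and the bare-word character set are assumptions made for illustration and are not taken from any Preserves implementation. Comments, annotations, embedded values, floats and the other atom syntaxes are left out.

```python
import re
import string
from dataclasses import dataclass

# Hypothetical node types for this sketch; not from any Preserves library.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass
class Record:
    items: list  # the label followed by the fields

@dataclass
class Group:
    items: list

@dataclass
class Block:
    items: list

@dataclass(frozen=True)
class Comma:
    pass

@dataclass(frozen=True)
class Semicolon:
    pass

@dataclass(frozen=True)
class Colons:
    count: int  # number of consecutive colons

WS = " \t\r\n"  # commas are deliberately not whitespace
# Assumed, simplified character set for bare symbols and integers.
BARE = set(string.ascii_letters + string.digits + "~!$%^&*?_=+-/.")
OPENERS = {"(": (")", Group), "[": ("]", list), "{": ("}", Block), "<": (">", Record)}

def read_document(text: str) -> list:
    """Document = *Value ws: read every value in `text`."""
    values, _ = _read_sequence(text, 0, None)
    return values

def _read_sequence(text, pos, closer):
    """Read values until `closer` (or end of input when closer is None)."""
    items = []
    while True:
        while pos < len(text) and text[pos] in WS:
            pos += 1
        if pos == len(text):
            if closer is not None:
                raise SyntaxError(f"missing {closer!r}")
            return items, pos
        if text[pos] == closer:
            return items, pos + 1
        value, pos = _read_value(text, pos)
        items.append(value)

def _read_value(text, pos):
    c = text[pos]
    if c in OPENERS:
        closer, make = OPENERS[c]
        items, pos = _read_sequence(text, pos + 1, closer)
        return make(items), pos
    if c == ",":
        return Comma(), pos + 1
    if c == ";":
        return Semicolon(), pos + 1
    if c == ":":
        end = pos
        while end < len(text) and text[end] == ":":
            end += 1
        return Colons(end - pos), end
    if c == '"':
        end = text.index('"', pos + 1)  # no escape sequences in this sketch
        return text[pos + 1:end], end + 1
    end = pos
    while end < len(text) and text[end] in BARE:
        end += 1
    if end == pos:
        raise SyntaxError(f"unexpected {c!r} at offset {pos}")
    word = text[pos:end]
    if re.fullmatch(r"[+-]?[0-9]+", word):
        return int(word), end
    return Symbol(word), end
```

As in the appendix examples, `setUp();` reads as three separate values: `Symbol("setUp")`, `Group([])` and `Semicolon()`.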
 ## Encoding P-expressions as Preserves
 Aside from the special classes `Group`, `Block`, `Comma`, `Semicolon` and
 `Colons`, P-expressions are directly encodable as Preserves data. All
 members of the special classes are encoded as Preserves text
 `Dictionary`[^encoding-rationale] values:
 [^encoding-rationale]: In principle, it would be nice to use *records*
    for this purpose, but if we did so we would also have to encode
    ordinary uses of records, to keep them distinct from the encoded
    special classes!
 {:.pseudocode}
 > ⌜`(`*p* ...`)`⌝ ⟶ `{g:[`⌜*p*⌝ ...`]}`
 > ⌜`{`*p* ...`}`⌝ ⟶ `{b:[`⌜*p*⌝ ...`]}`
 > ⌜`,`⌝ ⟶ `{s:|,|}`
 > ⌜`;`⌝ ⟶ `{s:|;|}`
 > ⌜`:` ...⌝ ⟶ `{s:|:` ...`|}`
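
The rules above can be sketched as a small Python function that renders a P-expression tree as Preserves text. The node classes are hypothetical stand-ins for whatever a reader produces. Among ordinary values, the sketch handles only sequences, symbols, booleans, integers and strings, and it assumes symbol names never need `|...|` quoting.

```python
from dataclasses import dataclass

# Hypothetical P-expression node types, assumed for this sketch.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass
class Group:
    items: list

@dataclass
class Block:
    items: list

@dataclass(frozen=True)
class Comma:
    pass

@dataclass(frozen=True)
class Semicolon:
    pass

@dataclass(frozen=True)
class Colons:
    count: int

def _seq(items) -> str:
    return " ".join(encode(x) for x in items)

def encode(v) -> str:
    """Render a P-expression as Preserves text, turning each member of a
    special class into a single-entry dictionary."""
    if isinstance(v, Group):
        return "{g:[" + _seq(v.items) + "]}"
    if isinstance(v, Block):
        return "{b:[" + _seq(v.items) + "]}"
    if isinstance(v, Comma):
        return "{s:|,|}"
    if isinstance(v, Semicolon):
        return "{s:|;|}"
    if isinstance(v, Colons):
        return "{s:|" + ":" * v.count + "|}"
    # Ordinary values are written as plain Preserves text.
    if isinstance(v, list):
        return "[" + _seq(v) + "]"
    if isinstance(v, Symbol):
        return v.name  # assumes the name never needs |...| quoting
    if isinstance(v, bool):  # checked before int: bool is a subclass of int
        return "#t" if v else "#f"
    if isinstance(v, int):
        return str(v)
    if isinstance(v, str):
        return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported value: {v!r}")
```

For instance, `encode(Group([Symbol("lookup-month"), "February"]))` yields `{g:[lookup-month "February"]}`, which matches the first example in the appendix below.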
 ## Appendix: Examples
 Examples are given as pairs of P-expressions and their Preserves
 text-syntax encodings.
 ```preserves
 ⌜<date 1821 (lookup-month "February") 3>⌝
 = <date 1821 {g:[lookup-month "February"]} 3>
 ```
 ```preserves
 ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
 = {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}
 ```
 ```preserves
 ⌜()⌝
 = {g:[]}
 ⌜[() () ()]⌝
 = [{g:[]}, {g:[]}, {g:[]}]
 ```
 ```preserves
 ⌜{
      setUp();
      # Now enter the loop
      loop: {
          greet("World");
      }
      tearDown();
  }⌝
 = {b:[
      setUp {g:[]} {s:|;|}
      # Now enter the loop
      loop {s:|:|} {b:[
          greet {g:["World"]} {s:|;|}
      ]}
      tearDown {g:[]} {s:|;|}
  ]}
 ```
 ```preserves
 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
 = [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
   foo {s:|,|} #!remote {s:|,|} bar]
 ```
 ```preserves
 ⌜{
      optional name: string,
      address: Address,
  }⌝
 = {b:[
      optional name {s:|:|} string {s:|,|}
      address {s:|:|} Address {s:|,|}
  ]}
 ```
 ## Appendix: Using a P-expression reader to read Preserves
 A reader for P-expressions can be adapted to yield a reader for
 Preserves terms by processing (subterms of) each P-expression that the
 reader produces. The only subterms that need processing are the special
 classes mentioned above.
 1. Every `Group` or `Semicolon` that appears is an error.
 2. Every `Colons` with two or more colons in it is an error.
 3. Every `Comma` that appears is removed from its container.
 4. Every `Block` must consist entirely of triplets of `Value`, `Colons`
    (with a single colon), `Value`. Any `Block` not following this
    pattern is an error. Each `Block` following the pattern is translated
    to a `Dictionary` containing a key/value pair for each triplet.
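
The four steps can be sketched in Python as follows. This sketch rests on several assumptions. The node classes are hypothetical. Preserves sequences are modelled as Python lists and dictionaries as Python `dict`s, so keys must be hashable. `PreservesError` is a name invented here.

Two checks go slightly beyond the rules as written:
- A lone colon outside any `Block` triplet is rejected, since it has no Preserves reading.
- A repeated key is rejected, since a `Dictionary` maps each key to a single value.

```python
from dataclasses import dataclass

# Hypothetical P-expression node types, assumed for this sketch.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass
class Group:
    items: list

@dataclass
class Block:
    items: list

@dataclass(frozen=True)
class Comma:
    pass

@dataclass(frozen=True)
class Semicolon:
    pass

@dataclass(frozen=True)
class Colons:
    count: int

class PreservesError(ValueError):
    """The P-expression has no plain Preserves reading."""

def _drop_commas(items):
    # Rule 3: commas are simply removed from their container.
    return [x for x in items if not isinstance(x, Comma)]

def to_preserves(v):
    if isinstance(v, (Group, Semicolon)):  # rule 1
        raise PreservesError(f"{type(v).__name__} is not allowed here")
    if isinstance(v, Colons):
        # Rule 2 rejects "::" and longer runs; this sketch also rejects a
        # lone ":" that is not part of a Block's key/value triplet.
        raise PreservesError("unexpected colon")
    if isinstance(v, Comma):
        raise PreservesError("comma outside of a container")
    if isinstance(v, list):
        return [to_preserves(x) for x in _drop_commas(v)]
    if isinstance(v, Block):  # rule 4
        items = _drop_commas(v.items)
        if len(items) % 3 != 0:
            raise PreservesError("Block is not a series of key: value triplets")
        result = {}
        for i in range(0, len(items), 3):
            key, colon, value = items[i:i + 3]
            if colon != Colons(1):
                raise PreservesError("expected a single colon after each key")
            key = to_preserves(key)  # must be hashable to serve as a dict key
            if key in result:
                raise PreservesError(f"duplicate key {key!r}")
            result[key] = to_preserves(value)
        return result
    return v  # atoms (symbols, numbers, strings, ...) pass through unchanged

def document_to_preserves(values):
    """Top-level commas in a Document are dropped too."""
    return [to_preserves(x) for x in _drop_commas(values)]
```

Applied to the last example in the appendix above, rule 4 fails. `optional name: string` puts two values before the colon, so that `Block` is a valid P-expression with no direct Preserves reading.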
 ## Appendix: Reading vs. Parsing
 Lisp systems first *read* streams of bytes into S-expressions and then
 *parse* those S-expressions into more abstract structures denoting
 various kinds of program syntax. [Separation of reading from parsing is
 what gives Lisp its syntactic
 flexibility.](http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/)
 Similarly, the Apple programming language
 [Dylan](https://en.wikipedia.org/wiki/Dylan_(programming_language))
 included a reader-parser split, with the Dylan reader producing
 *D-expressions* that are somewhat similar to P-expressions.
 Finally, the Racket dialects
 [Honu](https://docs.racket-lang.org/honu/index.html) and
 [Something](https://github.com/tonyg/racket-something) use a
 reader-parser-macro setup, where the reader produces Racket data, the
 parser produces "syntax" and is user-extensible, and Racket's own
 modular macro system rewrites this "syntax" down to core forms to be
 compiled to machine code.
 In the same way, when P-expressions are used as the foundation for a
 language, a generic P-expression reader can feed into special-purpose
 *parsers*. The reader captures the coarse syntactic structure of a
 program, and the parser refines it.
 Often, a parser will wish to extract structure from sequences of
 P-expression `Value`s.
 - A simple technique is repeated splitting of sequences: first by
   `Semicolon`, then by `Comma`, then by operators of increasingly high
   binding power.
 - A more refined approach is to use a Pratt parser or similar
   ([1](https://en.wikipedia.org/wiki/Operator-precedence_parser),
   [2](https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html),
   [3](https://github.com/tonyg/racket-something/blob/f6116bf3861b76970f5ce291a628476adef820b4/src/something/pratt.rkt))
   to build a parse tree using an extensible specification of the pre-,
   in-, and postfix operators involved.
 - Finally, if you treat sequences of `Value`s as pre-lexed token
   streams, almost any parsing formalism (such as [PEG
   parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
   [OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
   extract further syntactic structure.
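
The first technique, repeated splitting, can be sketched in Python. The precedence table (`;`, then `,`, then two example operators `+` and `*`) is an illustrative assumption, not part of P-expressions, and so are the `(tag, pieces)` tuples it produces. Each level splits a flat sequence at its separator and recurses into the pieces with the next, tighter-binding separator.

```python
from dataclasses import dataclass

# Hypothetical P-expression node types, assumed for this sketch.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Comma:
    pass

@dataclass(frozen=True)
class Semicolon:
    pass

# Assumed precedence table, loosest-binding separator first.
LEVELS = [
    (";", lambda v: isinstance(v, Semicolon)),
    (",", lambda v: isinstance(v, Comma)),
    ("+", lambda v: v == Symbol("+")),
    ("*", lambda v: v == Symbol("*")),
]

def split_on(items, is_separator):
    """Split a flat sequence at each separator. A trailing separator (as in
    `setUp();`) does not produce an empty final piece."""
    pieces, current = [], []
    for item in items:
        if is_separator(item):
            pieces.append(current)
            current = []
        else:
            current.append(item)
    if current or not pieces:
        pieces.append(current)
    return pieces

def parse(items, level=0):
    """Split at the loosest separator present, then recurse into each piece
    with the next, tighter-binding separator. Each level that actually
    splits contributes a (tag, pieces) tuple."""
    if level == len(LEVELS):
        # No separators left: a single value, or a plain juxtaposition.
        return items[0] if len(items) == 1 else items
    tag, is_separator = LEVELS[level]
    pieces = split_on(items, is_separator)
    if len(pieces) == 1:
        return parse(pieces[0], level + 1)
    return (tag, [parse(piece, level + 1) for piece in pieces])
```

For the values of `1 + 2 * 3, x;` this gives `(",", [("+", [1, ("*", [2, 3])]), Symbol("x")])`. The resulting nodes are n-ary and carry no notion of associativity or of prefix and postfix operators. Those are the cases where a Pratt parser becomes the better fit.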
 ## Notes