P-expressions

Tony Garnock-Jones tonyg@leastfixedpoint.com
October 2023. Version 0.1.0.

This document defines a grammar called Preserves Expressions (P-expressions, pexprs) that includes ordinary Preserves text syntax but offers extensions sufficient to support a Lisp- or Haskell-like programming notation.

**Motivation.** The text syntax for Preserves works well for writing Values, i.e. data. However, in some contexts, Preserves applications need a broader grammar that allows interleaving of expressions with data. Two examples are the Preserves Schema language and the Synit configuration scripting language, both of which (ab)use Preserves text syntax as a kind of programming notation.

Preliminaries

The P-expression grammar takes the text syntax grammar as its base and modifies it.

**Whitespace.** Whitespace is redefined as any number of spaces, tabs, carriage returns, or line feeds. Commas are *not* considered whitespace in P-expressions.

            ws = *(%x20 / %x09 / CR / LF)

**Delimiters.** Because commas are no longer included in class ws, class delimiter is widened to include them explicitly.

     delimiter = ws / ","
               / "<" / ">" / "[" / "]" / "{" / "}"
               / "#" / ":" / DQUOTE / "|" / "@" / ";"

Grammar

P-expressions add comma, semicolon, and sequences of one or more colons to the syntax class Value.

        Value =/ Comma / Semicolon / Colons
         Comma = ","
     Semicolon = ";"
        Colons = 1*":"

Now that colon is in Value, the syntax for Dictionary is replaced with Block everywhere it is mentioned.

         Block =  "{" *Value ws "}"

New syntax for explicit uninterpreted grouping of sequences of values is introduced, and added to class Value.

        Value =/ ws Group
         Group = "(" *Value ws ")"

Finally, class Document is replaced in order to allow standalone documents to directly comprise a sequence of multiple values.

      Document = *Value ws

No changes to the Preserves semantic model are made. Every Preserves text-syntax term is a valid P-expression, but in general P-expressions must be rewritten or otherwise interpreted before a meaningful Preserves value can be arrived at.
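
The grammar above can be sketched as a small recursive reader. The following Python is a minimal illustration only, not a reference implementation: the names `Punct`, `Group`, `Block`, `Seq`, `read_values`, and `read_document` are hypothetical, bare atoms are kept as plain strings, and strings, records, comments, and the other `#`-prefixed forms are not handled. It also assumes that `(` and `)` end an atom, which the examples in the appendix (such as `greet("World")`) rely on but which the delimiter class above does not list.

```python
from dataclasses import dataclass

# Hypothetical node types for this sketch; the specification defines no API.
@dataclass(frozen=True)
class Punct:
    text: str      # ",", ";", or a run of one or more ":"

@dataclass(frozen=True)
class Group:
    items: tuple   # contents of "(" ... ")"

@dataclass(frozen=True)
class Block:
    items: tuple   # contents of "{" ... "}"

@dataclass(frozen=True)
class Seq:
    items: tuple   # contents of "[" ... "]"

WS = " \t\r\n"  # commas are deliberately absent
# Delimiters from the grammar, plus "(" and ")" (an assumption of this sketch).
DELIMS = WS + ',<>[]{}#:"|@;' + "()"
CLOSERS = {"(": ")", "{": "}", "[": "]"}
WRAP = {"(": Group, "{": Block, "[": Seq}

def read_values(text, pos=0, closer=None):
    """Read `*Value ws` up to `closer` (or end of input); return (items, pos)."""
    items = []
    while True:
        while pos < len(text) and text[pos] in WS:
            pos += 1
        if pos == len(text):
            if closer is not None:
                raise SyntaxError(f"missing {closer!r}")
            return items, pos
        ch = text[pos]
        if ch == closer:
            return items, pos + 1
        if ch in CLOSERS:
            inner, pos = read_values(text, pos + 1, CLOSERS[ch])
            items.append(WRAP[ch](tuple(inner)))
        elif ch in ",;":
            items.append(Punct(ch))
            pos += 1
        elif ch == ":":
            end = pos
            while end < len(text) and text[end] == ":":
                end += 1
            items.append(Punct(text[pos:end]))  # Colons = 1*":"
            pos = end
        elif ch in DELIMS:
            raise SyntaxError(f"{ch!r} is not supported by this sketch")
        else:
            end = pos
            while end < len(text) and text[end] not in DELIMS:
                end += 1
            items.append(text[pos:end])  # bare atom, kept as a string
            pos = end

def read_document(text):
    """Document = *Value ws"""
    items, _ = read_values(text)
    return items
```

Because commas are delimiters rather than whitespace here, `read_document("a,b")` yields three items, with the comma preserved as its own `Punct` value.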

Encoding P-expressions as Preserves

Aside from the special classes Group, Block, Comma, Semicolon, and Colons, P-expressions are directly encodable as Preserves data. All members of the special classes are encoded as Preserves text Dictionary values (see note 1):

    ⌜(p ...)⌝ ⟶ {g:[⌜p⌝ ...]}
    ⌜{p ...}⌝ ⟶ {b:[⌜p⌝ ...]}
          ⌜,⌝ ⟶ {s:|,|}
          ⌜;⌝ ⟶ {s:|;|}
      ⌜: ...⌝ ⟶ {s:|: ...|}
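
As a rough illustration of these rules, the Python sketch below walks an in-memory P-expression and produces the encoded form. Everything named here (`Symbol`, `Record`, `Group`, `Block`, `Punct`, `encode`) is a hypothetical stand-in chosen for this example; Preserves Dictionaries are modelled as Python dicts and sequences as Python lists.

```python
from dataclasses import dataclass

# Hypothetical in-memory representation, for illustration only.
@dataclass(frozen=True)
class Symbol:
    name: str

@dataclass(frozen=True)
class Record:
    label: object
    fields: tuple

@dataclass(frozen=True)
class Group:
    items: tuple

@dataclass(frozen=True)
class Block:
    items: tuple

@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", or a run of one or more ":"

def encode(p):
    """Encode a P-expression as plain Preserves-style data."""
    if isinstance(p, Group):
        return {Symbol("g"): [encode(x) for x in p.items]}
    if isinstance(p, Block):
        return {Symbol("b"): [encode(x) for x in p.items]}
    if isinstance(p, Punct):
        return {Symbol("s"): Symbol(p.text)}
    # Ordinary compound values are kept, but their subterms are encoded.
    if isinstance(p, list):
        return [encode(x) for x in p]
    if isinstance(p, Record):
        return Record(encode(p.label), tuple(encode(f) for f in p.fields))
    # Atoms (numbers, strings, symbols, ...) are already Preserves data.
    return p
```

For instance, encoding a `Group` holding the symbol `+` and the integers 1 and 2 gives a single-entry dict keyed by `Symbol("g")`, mirroring `{g:[+ 1 2]}`.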

Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves text-syntax encodings.

 ⌜<date 1821 (lookup-month "February") 3>⌝
= <date 1821 {g:[lookup-month "February"]} 3>

 ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= {g:[begin {g:[println! {g:[+ 1 2]}]} {g:[+ 3 4]}]}

 ⌜()⌝
= {g:[]}

 ⌜[() () ()]⌝
= [{g:[]}, {g:[]}, {g:[]}]

 ⌜{
      setUp();
      # Now enter the loop
      loop: {
          greet("World");
      }
      tearDown();
  }⌝
= {b:[
      setUp {g:[]} {s:|;|}
      # Now enter the loop
      loop {s:|:|} {b:[
          greet {g:["World"]} {s:|;|}
      ]}
      tearDown {g:[]} {s:|;|}
  ]}

 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 {s:|,|} print "Hello" {s:|,|} predicate {s:|:|} #t {s:|,|}
   foo {s:|,|} #!remote {s:|,|} bar]

 ⌜{
      optional name: string,
      address: Address,
  }⌝
= {b:[
      optional name {s:|:|} string {s:|,|}
      address {s:|:|} Address {s:|,|}
  ]}

Appendix: Using a P-expression reader to read Preserves

A reader for P-expressions can be adapted to yield a reader for Preserves terms by processing (subterms of) each P-expression that the reader produces. The only subterms that need processing are the special classes mentioned above.

  1. Every Group or Semicolon that appears is an error.
  2. Every Colons with two or more colons in it is an error.
  3. Every Comma that appears is removed from its container.
  4. Every Block must contain triplets of Value, Colons (with a single colon), Value. Any Block not following this pattern is an error. Each Block following the pattern is translated to a Dictionary containing a key/value pair for each triplet.
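
The four rules above might be applied along the following lines. This Python sketch uses hypothetical node types (`Punct`, `Group`, `Block`) and models sequences as lists and Dictionaries as dicts, so it assumes Dictionary keys are hashable atoms. It also treats a single colon outside a Block as an error, which the rules above do not state explicitly.

```python
from dataclasses import dataclass

# Hypothetical node types, for illustration only.
@dataclass(frozen=True)
class Punct:
    text: str      # ",", ";", or a run of one or more ":"

@dataclass(frozen=True)
class Group:
    items: tuple

@dataclass(frozen=True)
class Block:
    items: tuple

COMMA = Punct(",")
COLON = Punct(":")

def strip_commas(items):
    # Rule 3: every Comma is removed from its container.
    return [x for x in items if x != COMMA]

def to_preserves(p):
    """Convert one P-expression to a Preserves value, or raise ValueError."""
    if isinstance(p, Group):
        raise ValueError("rule 1: Group is not allowed")
    if isinstance(p, Punct):
        if p.text == ";":
            raise ValueError("rule 1: Semicolon is not allowed")
        if p.text.startswith("::"):
            raise ValueError("rule 2: Colons with two or more colons")
        # Assumption of this sketch: stray punctuation is rejected.
        raise ValueError(f"unexpected {p.text!r} outside a Block entry")
    if isinstance(p, list):
        return [to_preserves(x) for x in strip_commas(p)]
    if isinstance(p, Block):
        # Rule 4: a Block must consist of (Value, single colon, Value) triplets.
        items = strip_commas(p.items)
        if len(items) % 3 != 0:
            raise ValueError("rule 4: Block is not a sequence of triplets")
        result = {}
        for i in range(0, len(items), 3):
            key, colon, value = items[i:i + 3]
            if colon != COLON:
                raise ValueError("rule 4: expected a single colon after the key")
            result[to_preserves(key)] = to_preserves(value)
        return result
    return p  # atoms pass through unchanged
```

Commas are stripped before the triplet check, so a Block written with comma-separated entries and a trailing comma still converts cleanly.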

Appendix: Reading vs. Parsing

Lisp systems first read streams of bytes into S-expressions and then parse those S-expressions into more abstract structures denoting various kinds of program syntax. Separation of reading from parsing is what gives Lisp its syntactic flexibility.

Similarly, the Apple programming language Dylan included a reader-parser split, with the Dylan reader producing D-expressions that are somewhat similar to P-expressions.

Finally, the Racket dialects Honu and Something use a reader-parser-macro setup, where the reader produces Racket data, the parser produces "syntax" and is user-extensible, and Racket's own modular macro system rewrites this "syntax" down to core forms to be compiled to machine code.

Similarly, when using P-expressions as the foundation for a language, a generic P-expression reader can then feed into special-purpose parsers. The reader captures the coarse syntactic structure of a program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of P-expression Values.

  • A simple technique is to split sequences repeatedly: first by Semicolon, then by Comma, then by operators of increasingly high binding power.

  • A more refined technique is to use a Pratt parser or similar to build a parse tree using an extensible specification of the pre-, in-, and postfix operators involved.

  • Finally, if you treat sequences of Values as pre-lexed token streams, almost any parsing formalism (such as PEG parsing, OMeta, etc.) can be used to extract further syntactic structure.
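
The first of these techniques, repeated splitting, can be sketched in a few lines of Python. For brevity this example represents every item, punctuation included, as a plain string, and the operator table `("+", "*")` is an arbitrary assumption; `split_on`, `parse_expr`, and `parse_program` are hypothetical names.

```python
def split_on(items, sep):
    """Split a list of items into sublists at each occurrence of `sep`."""
    parts, current = [], []
    for item in items:
        if item == sep:
            parts.append(current)
            current = []
        else:
            current.append(item)
    parts.append(current)
    return parts

def parse_expr(items, operators=("+", "*")):
    """Split on the loosest-binding operator first, then recurse."""
    for op in operators:  # ordered from lowest to highest binding power
        parts = split_on(items, op)
        if len(parts) > 1:
            return (op, *[parse_expr(part, operators) for part in parts])
    if len(items) != 1:
        raise ValueError(f"cannot parse {items!r}")
    return items[0]

def parse_program(items):
    """Split by ';' into statements, then by ',' into expressions."""
    statements = []
    for stmt in split_on(items, ";"):
        if not stmt:
            continue  # tolerate a trailing semicolon
        statements.append([parse_expr(part) for part in split_on(stmt, ",")])
    return statements

tokens = ["a", "+", "b", "*", "c", ",", "d", ";", "e", ";"]
print(parse_program(tokens))
# [[('+', 'a', ('*', 'b', 'c')), 'd'], ['e']]
```

Splitting produces n-ary nodes, so a chain such as `a + b + c` becomes a single `('+', 'a', 'b', 'c')` tuple; deciding associativity is left to whatever consumes the tree.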

Notes


  1. In principle, it would be nice to use records for this purpose, but if we did so we would have to also encode usages of records!