P-expressions

Tony Garnock-Jones tonyg@leastfixedpoint.com
October 2023. Version 0.3.0.

Experimental. This document defines a grammar called Preserves Expressions (P-expressions, pexprs) that includes ordinary Preserves text syntax but offers extensions sufficient to support a Lisp- or Haskell-like programming notation.

Motivation. The text syntax for Preserves works well for writing Values, i.e. data. However, in some contexts, Preserves applications need a broader grammar that allows interleaving of expressions with data. Two examples are the Preserves Schema language and the Synit configuration scripting language, both of which (ab)use Preserves text syntax as a kind of programming notation.

Preliminaries

The P-expression grammar includes by reference the definition of Atom from the text syntax, as well as the definitions that Atom depends on.

**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as any number of spaces, tabs, carriage returns, or line feeds.

            ws = *(%x20 / %x09 / CR / LF)

No changes to the Preserves semantic model are made. Every Preserves text-syntax term can be parsed as a valid P-expression, but in general P-expressions must be rewritten or otherwise interpreted before a meaningful Preserves value can be arrived at (see below).

Grammar

Standalone documents containing P-expressions are sequences of individual Exprs, followed by annotations, comments, and/or whitespace.

      Document = *Expr Trailer ws

A single P-expression Expr can be an Atom from the text syntax, a compound expression, special punctuation, an Embedded expression, or an Annotated expression. The class SimpleExpr includes all of Expr except special punctuation.

          Expr = ws (SimpleExpr / Punct)
    SimpleExpr = Atom / Compound / Embedded / Annotated

Embedded and annotated values are as in the text syntax, differing only in that uses of Value are replaced with SimpleExpr.

       Embedded = "#!" SimpleExpr
      Annotated = Annotation SimpleExpr
     Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
    linecomment = *<any unicode scalar value except CR or LF>

P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.¹

         Punct = "," / ";" / 1*":"
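As an illustration only, here is a minimal Python sketch of how a reader might recognise `Punct`. It assumes the greedy treatment of colons described in the notes. The function name `read_punct` and its return convention are hypothetical and are not taken from any Preserves implementation.

```python
def read_punct(text, pos=0):
    """Try to read one Punct token starting at text[pos].

    Returns (token, next_pos), or None if no punctuation starts at pos.
    """
    if pos >= len(text):
        return None
    ch = text[pos]
    if ch in ",;":
        # Comma and semicolon are always single-character tokens.
        return ch, pos + 1
    if ch == ":":
        end = pos
        # Greedy: every adjacent colon is taken into the same token.
        while end < len(text) and text[end] == ":":
            end += 1
        return text[pos:end], end
    return None
```

Under this sketch, `read_punct("::: x")` yields the single token `:::`, whereas the input `: :` yields `:` first and leaves the second colon for a later call.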

Compound expressions are sequences of Exprs with optional trailing Annotations, surrounded by various kinds of parentheses.

      Compound = Sequence / Record / Block / Group / Set
      Sequence =  "[" *Expr Trailer ws "]"
        Record =  "<" *Expr Trailer ws ">"
         Block =  "{" *Expr Trailer ws "}"
         Group =  "(" *Expr Trailer ws ")"
           Set = "#{" *Expr Trailer ws "}"

In an Annotated P-expression, annotations and comments attach to the term following them, just as in the ordinary text syntax. However, it is common in programming notations to allow comments at the end of a file or other sequential construct. The ordinary text syntax forbids comments in these positions, but P-expressions allow them.

       Trailer = *(ws Annotation)

Encoding P-expressions as Preserves

We write ⌜p⌝ for the encoding into Preserves of a P-expression p.

{:.pseudocode.equations}
| ⌜·⌝ : Expr | ⟶ | Value |
| ⌜[p ...]⌝ | = | [⌜p⌝ ...] |
| ⌜<p ...>⌝ | = | <r ⌜p⌝ ...> |
| ⌜{p ...}⌝ | = | <b ⌜p⌝ ...> |
| ⌜(p ...)⌝ | = | <g ⌜p⌝ ...> |
| ⌜#{p ...}⌝ | = | <s ⌜p⌝ ...> |
| ⌜#!p⌝ | = | #!⌜p⌝ |
| ⌜@p q⌝ | = | @⌜p⌝ ⌜q⌝ |
| ⌜p⌝ | = | p | when p ∈ Atom |
| ⌜,⌝ | = | <p \|,\|> |
| ⌜;⌝ | = | <p \|;\|> |
| ⌜: ...⌝ | = | <p \|: ...\|> |
| ⌜t⌝ | = | @⌜a⌝ ... <a> | where @a ... are the annotations in t and t ∈ Trailer |

The record <a> acts as an anchor for the annotations in a Trailer.

We overload the ⌜·⌝ notation for encoding whole Documents into sequences of Preserves values.

{:.pseudocode.equations}
| ⌜·⌝ : P-expression Document | ⟶ | Preserves Sequence |
| ⌜p ...⌝ | = | [⌜p⌝ ...] |
| ⌜p ... @a ...⌝ | = | [⌜p⌝ ... @⌜a⌝ ... <a>] |
| | | where @a ... are trailing annotations |
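As a rough illustration of the bracket and punctuation cases of ⌜·⌝, the following Python sketch encodes a small in-memory P-expression representation into Preserves text syntax. The `Compound` and `Punct` classes and the `encode` function are assumptions made for this example and are not part of any Preserves library. Atoms are assumed to be Python strings that already hold their text-syntax form, and embedded values, annotations, and Trailers are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Punct:
    text: str  # ",", ";", or a run of one or more colons


@dataclass
class Compound:
    kind: str  # opening bracket: "[", "<", "{", "(" or "#{"
    items: list


# Record labels used by the encoding for each non-sequence bracket kind.
LABELS = {"<": "r", "{": "b", "(": "g", "#{": "s"}


def encode(expr):
    """Encode a (hypothetical) P-expression tree as Preserves text."""
    if isinstance(expr, Punct):
        return f"<p |{expr.text}|>"
    if isinstance(expr, Compound):
        parts = [encode(item) for item in expr.items]
        if expr.kind == "[":
            return "[" + " ".join(parts) + "]"
        return "<" + " ".join([LABELS[expr.kind]] + parts) + ">"
    # Atoms encode as themselves (assumed to be text syntax already).
    return expr
```

For example, the tree for `<date 1821 (lookup-month "February") 3>` is `Compound("<", ["date", "1821", Compound("(", ["lookup-month", '"February"']), "3"])`, and `encode` maps it to `<r date 1821 <g lookup-month "February"> 3>`.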

Interpreting P-expressions as Preserves

The previous section discussed ways of representing P-expressions using Preserves. Here, we discuss interpreting P-expressions as Preserves so that (1) a Preserves datum (2) written using Preserves text syntax and then (3) read as a P-expression can be (4) interpreted from that P-expression to yield the original datum.

- Every (..) or ; that appears is an error.
- Every :, ::, :::, ... is an error, except in context of Blocks as described below.
- Every , that appears is discarded.
- Every Trailer that appears is an error.²
- Every Record with no values in it is an error.
- Every Block must contain zero or more repeating triplets of SimpleExpr, :, SimpleExpr. Any Block not following this pattern is an error. Each Block following the pattern is translated to a Dictionary containing a key/value pair for each triplet. Any Block with duplicate keys (under interpretation) is an error.
- Every Set containing any duplicate expressions (under interpretation) is an error.
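The Block rule above can be sketched in Python as follows. This is a minimal illustration, assuming that commas have already been discarded and that keys and values have already been interpreted into hashable Python values. The `Punct` class and the `block_to_dict` function are hypothetical names introduced for this example. The sketch also assumes that only a single `:` is accepted as the separator, and it relies on Python's notion of equality for duplicate detection, which is not the same as Preserves equality (for instance, Python treats `1` and `1.0` as the same dictionary key).

```python
class Punct(str):
    """Punctuation token; a distinct type so that it cannot be
    confused with an ordinary string value."""


def block_to_dict(items):
    """Translate the contents of a Block into a dictionary.

    `items` is a flat list of interpreted values interleaved with
    Punct tokens, with commas already removed.
    """
    if len(items) % 3 != 0:
        raise ValueError("Block must consist of key : value triplets")
    result = {}
    for i in range(0, len(items), 3):
        key, colon, value = items[i:i + 3]
        if isinstance(key, Punct) or isinstance(value, Punct):
            raise ValueError("keys and values must not be punctuation")
        if not (isinstance(colon, Punct) and colon == ":"):
            raise ValueError("expected a single ':' between key and value")
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result
```

An empty Block yields an empty dictionary, since zero triplets is a valid instance of the pattern.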

Appendix: Examples

Examples are given as pairs of P-expressions and their Preserves text-syntax encodings.

Individual P-expression `Expr`s

 ⌜<date 1821 (lookup-month "February") 3>⌝
= <r date 1821 <g lookup-month "February"> 3>

 ⌜<>⌝
= <r>

 ⌜(begin (println! (+ 1 2)) (+ 3 4))⌝
= <g begin <g println! <g + 1 2>> <g + 3 4>>

 ⌜()⌝
= <g>

 ⌜[() () ()]⌝
= [<g>, <g>, <g>]

 ⌜{
      setUp();
      # Now enter the loop
      loop: {
          greet("World");
      }
      tearDown();
  }⌝
= <b
      setUp <g> <p |;|>
      # Now enter the loop
      loop <p |:|> <b
          greet <g "World"> <p |;|>
      >
      tearDown <g> <p |;|>
  >

 ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
   foo <p |,|> #!remote <p |,|> bar]

 ⌜#{1 2 3}⌝
= <s 1 2 3>

 ⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>

 ⌜{
      optional name: string,
      address: Address,
  }⌝
= <b
      optional name <p |:|> string <p |,|>
      address <p |:|> Address <p |,|>
  >

Whole `Document`s

 ⌜{
      key: value
      # example of a comment at the end of a dictionary
  }
  # example of a comment at the end of the input file⌝
= [ <b
        key <p |:|> value
        @"example of a comment at the end of a dictionary" <a>
    >
    @"example of a comment at the end of the input file"
    <a>
  ]

Appendix: Reading vs. Parsing

Lisp systems first read streams of bytes into S-expressions and then parse those S-expressions into more abstract structures denoting various kinds of program syntax. Separation of reading from parsing is what gives Lisp its syntactic flexibility.

Similarly, the Apple programming language Dylan included a reader-parser split, with the Dylan reader producing D-expressions that are somewhat similar to P-expressions.

Finally, the Racket dialects Honu and Something use a reader-parser-macro setup, where the reader produces Racket data, the parser produces "syntax" and is user-extensible, and Racket's own modular macro system rewrites this "syntax" down to core forms to be compiled to machine code.

Similarly, when using P-expressions as the foundation for a language, a generic P-expression reader can then feed into special-purpose parsers. The reader captures the coarse syntactic structure of a program, and the parser refines this.

Often, a parser will wish to extract structure from sequences of P-expression Exprs.

- A simple technique is repeated splitting of sequences; first by Semicolon, then by Comma, then by increasingly high binding-power operators.
- More refined is to use a Pratt parser or similar (1, 2, 3) to build a parse tree using an extensible specification of the pre-, in-, and postfix operators involved.
- Finally, if you treat sequences of Exprs as pre-lexed token streams, almost any parsing formalism (such as PEG parsing, Ometa, etc.) can be used to extract further syntactic structure.
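The first technique, repeated splitting, can be sketched in a few lines of Python. This is only an illustration: the function names `split_on` and `split_statements` are hypothetical, and tokens are represented as plain strings for brevity, so a string atom `","` would be indistinguishable from comma punctuation. A real reader would give punctuation its own type.

```python
def split_on(items, sep):
    """Split a list of tokens into groups at each occurrence of sep."""
    groups, current = [], []
    for item in items:
        if item == sep:
            groups.append(current)
            current = []
        else:
            current.append(item)
    groups.append(current)
    return groups


def split_statements(items):
    """Split first by semicolon, then split each statement by comma."""
    return [split_on(statement, ",") for statement in split_on(items, ";")]
```

For the token list `["f", "x", ",", "y", ";", "g", "z"]`, `split_statements` returns `[[["f", "x"], ["y"]], [["g", "z"]]]`. A trailing separator produces an empty final group, which a caller may choose to drop or to treat as an error.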

Appendix: Equations for interpreting P-expressions as Preserves

The function uncomma(p) removes all occurrences of , from a P-expression p ∈ Expr − {,}.

{:.pseudocode.equations}
| uncomma : Expr − {,} | ⟶ | Expr | |
| uncomma([p ...]) | = | [uncomma(p) ...] | omitting any p = , |
| uncomma(<p ...>) | = | <uncomma(p) ...> | omitting any p = , |
| uncomma({p ...}) | = | {uncomma(p) ...} | omitting any p = , |
| uncomma((p ...)) | = | (uncomma(p) ...) | omitting any p = , |
| uncomma(#{p ...}) | = | #{uncomma(p) ...} | omitting any p = , |
| uncomma(#!p) | = | #!uncomma(p) | |
| uncomma(@p q) | = | @uncomma(p) uncomma(q) | |
| uncomma(p) | = | p | if p ∈ Atom ∪ Punct − {,} |
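The equations for uncomma translate fairly directly into a recursive function. The Python sketch below assumes a small hypothetical tree representation (`Punct`, `Compound`, `Embedded`, `Annotated`) invented for this example, with atoms represented by any other Python value. It is not drawn from an existing implementation.

```python
from dataclasses import dataclass


@dataclass
class Punct:
    text: str


@dataclass
class Compound:
    kind: str  # opening bracket: "[", "<", "{", "(" or "#{"
    items: list


@dataclass
class Embedded:
    expr: object


@dataclass
class Annotated:
    annotation: object
    expr: object


COMMA = Punct(",")


def uncomma(p):
    """Remove every comma nested anywhere inside p."""
    if p == COMMA:
        # The function is only defined on Expr minus the comma itself.
        raise ValueError("uncomma is undefined for a bare comma")
    if isinstance(p, Compound):
        # Drop commas at this level, then recurse into what remains.
        return Compound(p.kind, [uncomma(i) for i in p.items if i != COMMA])
    if isinstance(p, Embedded):
        return Embedded(uncomma(p.expr))
    if isinstance(p, Annotated):
        return Annotated(uncomma(p.annotation), uncomma(p.expr))
    # Atoms and all other punctuation are returned unchanged.
    return p
```

Because commas are filtered out before each recursive call, the error case can only be reached by passing a bare comma at the top level.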

We write ⌞uncomma(p)⌟ for the partial function mapping a P-expression p ∈ Expr − {,} to a corresponding Preserves Value.

{:.pseudocode.equations}
| ⌞·⌟ : Expr − {,} | ⇀ | Value | |
| ⌞[p ...]⌟ | = | [⌞p⌟ ...] | |
| ⌞<ℓ p ...>⌟ | = | <⌞ℓ⌟ ⌞p⌟ ...> | |
| ⌞{k:v ...}⌟ | = | {⌞k⌟:⌞v⌟ ...} | if all ⌞k⌟ ... are distinct |
| ⌞#{p ...}⌟ | = | #{⌞p⌟ ...} | if all ⌞p⌟ ... are distinct |
| ⌞#!p⌟ | = | #!⌞p⌟ | |
| ⌞@p q⌟ | = | @⌞p⌟ ⌞q⌟ | |
| ⌞p⌟ | = | p | when p ∈ Atom |

Notes

1. Colon matching is greedy: when reading, all adjacent colons are always taken into a single token, and when writing, adjacent colon-sequence punctuation marks must be written with whitespace separating them.
2. Implementation note. When implementing parsing of P-expressions into Preserves, consider offering an optional mode where trailing annotations Trailer are discarded instead of causing an error to be signalled.
