This is way better

Tony Garnock-Jones 2023-11-01 13:06:15 +01:00
parent 1a0772d39f
commit d7b983e140
2 changed files with 145 additions and 108 deletions

@ -0,0 +1,22 @@
The definition of `Atom` is as given in the Preserves text syntax.
```text
Document := Expr* sp
Expr := sp (Atom | Compound | Punct | Embedded | Annotated)
Compound := Sequence | Record | Block | Group | Set
Punct := `,` | `;` | `:`+
sp := (space | tab | cr | lf)*
Sequence := `[` Expr* Trailer sp `]`
Record := `<` Expr* Trailer sp `>`
Block := `{` Expr* Trailer sp `}`
Group := `(` Expr* Trailer sp `)`
Set := `#{` Expr* Trailer sp `}`
Trailer := Annotation*
Embedded := `#!` Expr
Annotated := Annotation Expr
Annotation := `@` Expr | `#` ((space | tab) linecomment)? (cr | lf)
```
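
As a rough illustration of how the grammar above nests, here is a minimal reader sketch in Python. It is not part of the commit and makes several simplifying assumptions: atoms are bare tokens kept as raw strings, and strings, `#!`, and annotations are left out. The name `read_exprs` and the tuple representation are hypothetical.

```python
# Sketch of a reader for a subset of the grammar above (assumed representation):
# compounds become (kind, [items]), punctuation becomes ("punct", text),
# and atoms are kept as raw strings. Strings, `#!` and annotations are omitted.
OPEN = {"[": ("seq", "]"), "<": ("rec", ">"), "{": ("block", "}"),
        "(": ("group", ")"), "#{": ("set", "}")}
CLOSE = {"]", ">", "}", ")"}
SP = " \t\r\n"

def read_exprs(text, i=0, closer=None):
    """Read Expr* up to `closer` (or end of input); return (items, next_index)."""
    items = []
    while True:
        while i < len(text) and text[i] in SP:       # sp
            i += 1
        if i == len(text):
            if closer is not None:
                raise SyntaxError("missing " + closer)
            return items, i
        c = text[i]
        if c in CLOSE:
            if c != closer:
                raise SyntaxError("unexpected " + c)
            return items, i + 1
        two = text[i:i + 2]
        if two == "#{" or c in OPEN:                 # Compound
            tok = two if two == "#{" else c
            kind, end = OPEN[tok]
            inner, i = read_exprs(text, i + len(tok), end)
            items.append((kind, inner))
        elif c in ",;":                              # Punct: `,` or `;`
            items.append(("punct", c))
            i += 1
        elif c == ":":                               # Punct: one or more colons
            j = i
            while j < len(text) and text[j] == ":":
                j += 1
            items.append(("punct", text[i:j]))
            i = j
        else:                                        # simplified Atom
            j = i
            while j < len(text) and text[j] not in SP + "[]<>{}(),;:":
                j += 1
            items.append(text[i:j])
            i = j
```

Note that commas come back as ordinary `("punct", ",")` items rather than being skipped, which reflects `sp` excluding commas.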

@ -3,128 +3,105 @@ title: "P-expressions"
--- ---
Tony Garnock-Jones <tonyg@leastfixedpoint.com> Tony Garnock-Jones <tonyg@leastfixedpoint.com>
October 2023. Version 0.2.0. October 2023. Version 0.3.0.
[text syntax]: preserves-text.html
This document defines a grammar called *Preserves Expressions* This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text (*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support syntax][text syntax] but offers extensions sufficient to support a Lisp-
a Lisp- or Haskell-like programming notation. or Haskell-like programming notation.
**Motivation.** The [text syntax](preserves-text.html) for Preserves **Motivation.** The [text syntax][] for Preserves works well for writing
works well for writing `Value`s, i.e. data. However, in some contexts, `Value`s, i.e. data. However, in some contexts, Preserves applications
Preserves applications need a broader grammar that allows interleaving need a broader grammar that allows interleaving of *expressions* with
of *expressions* with data. Two examples are the [Preserves Schema data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation. which (ab)use Preserves text syntax as a kind of programming notation.
## Preliminaries ## Preliminaries
The P-expression grammar takes the text syntax grammar as its base and The P-expression grammar includes by reference the definition of `Atom` from the
modifies it. [text syntax][], as well as the definitions that `Atom` depends on.
P-expressions take their own approach to inter-token whitespace,
however.
<a id="whitespace"> <a id="whitespace">
**Whitespace.** Whitespace is redefined as any number of spaces, tabs, **Whitespace.** Whitespace `sp` is defined as any number of spaces,
carriage returns, or line feeds. Commas are *not* considered whitespace tabs, carriage returns, or line feeds. Commas are *not* considered
in P-expressions. whitespace in P-expressions, and so class `sp` is different to class
`ws` from the text syntax.
ws = *(%x20 / %x09 / CR / LF) sp = *(%x20 / %x09 / CR / LF)
<a id="delimiters"></a> No changes to [the Preserves semantic model](preserves.html) are made.
**Delimiters.** Because commas are no longer included in class `ws`, Every Preserves text-syntax term can be parsed as a valid P-expression,
class `delimiter` is widened to include them explicitly. but in general P-expressions must be rewritten or otherwise interpreted
before a meaningful Preserves value can be arrived at ([see
delimiter = ws / "," below](#reading-preserves)).
/ "<" / ">" / "[" / "]" / "{" / "}"
/ "#" / ":" / DQUOTE / "|" / "@" / ";"
## Grammar ## Grammar
P-expressions add comma, semicolon, and sequences of one or more colons Standalone documents containing P-expressions are sequences of
to the syntax class `Value`. individual `Expr`s, followed by trailing whitespace.
Value =/ Comma / Semicolon / Colons Document = *Expr sp
Comma = ","
Semicolon = ";"
Colons = 1*":"
Now that colon is in `Value`, the syntax for `Dictionary` is replaced A single P-expression `Expr` can be an `Atom` from the [text syntax][],
with `Block` everywhere it is mentioned. a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression.
Block = "{" *Value ws "}" Expr = sp (Atom | Compound | Punct | Embedded | Annotated)
Syntax for `Record` is loosened to allow empty angle brackets. Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `Expr`.
Record = "<" *Value ws ">" Embedded = "#!" Expr
Annotated = Annotation Expr
Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
New syntax for explicit uninterpreted grouping of sequences of values is P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
introduced, and added to class `Value`.
Value =/ ws Group Punct = "," / ";" / 1*":"
Group = "(" *Value ws ")"
Finally, class `Document` is replaced in order to allow standalone Compound expressions are sequences of `Expr`s with optional trailing
documents to directly comprise a sequence of multiple values. `Annotation`s, surrounded by various kinds of parentheses.
Document = *Value ws Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer sp "]"
Record = "<" *Expr Trailer sp ">"
Block = "{" *Expr Trailer sp "}"
Group = "(" *Expr Trailer sp ")"
Set = "#{" *Expr Trailer sp "}"
No changes to [the Preserves semantic model](preserves.html) are made. In an `Annotated` P-expression, annotations and comments attach to the
Every Preserves text-syntax term is a valid P-expression, but in general term following them, just as in the ordinary text syntax. However, it is
P-expressions must be rewritten or otherwise interpreted before a common in programming notations to allow comments at the end of a file
meaningful Preserves value can be arrived at ([see or other sequential construct. The ordinary text syntax forbids comments
below](#reading-preserves)). in these positions, but P-expressions allow them.
## <a id="annotations"></a>Annotations and Comments Trailer = *Annotation
Annotations and comments attach to the term following them, just as in
the ordinary text syntax. However, it is common in programming notations
to allow comments at the end of a file or other sequential construct:
{
key: value
# example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file
While the ordinary text syntax forbids comments in these positions,
P-expressions allow them:
Document =/ *Value Trailer ws
Record =/ "<" *Value Trailer ws ">"
Sequence =/ "[" *Value Trailer ws "]"
Set =/ "#{" *Value Trailer ws "}"
Block =/ "{" *Value Trailer ws "}"
Group =/ "(" *Value Trailer ws ")"
Trailer = 1*Annotation
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves ## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*. We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
{:.pseudocode.equations} {:.pseudocode.equations}
| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** | | ⌜·⌝ : **Expr** | ⟶ | **Value** |
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`, | ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
and `Record`, P-expressions are encoded directly as Preserves data. | ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
{:.pseudocode.equations} | ⌜`#{`*p* ...`}`⌝ | = | `<s` ⌜*p*⌝ ...`>` |
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` | | ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` | | ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ | | ⌜*p*⌝ | = | *p* | when *p* ∈ **Atom** |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ | | ⌜`,`⌝ | = | `<p |,|>` |
| ⌜*p*⌝ | = | *p* when *p* ∈ **Atom** | | ⌜`;`⌝ | = | `<p |;|>` |
| ⌜`:` ...⌝ | = | `<p |:` ...`|>` |
Everything else is encoded as Preserves records. | ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>` | where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
{:.pseudocode.equations}
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`,`⌝ | = | `<s |,|>` |
| ⌜`;`⌝ | = | `<s |;|>` |
| ⌜`:` ...⌝ | = | `<s |:` ...`|>` |
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
The record `<a>` acts as an anchor for the annotations in a `Trailer`. The record `<a>` acts as an anchor for the annotations in a `Trailer`.
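
The new-side encoding table can be sketched as a small recursive function. This is an illustrative Python sketch, not code from the commit: it assumes a hypothetical representation in which compounds are `(kind, [items])` tuples, punctuation is `("punct", text)`, and atoms are raw text; `#!`, annotations, and `Trailer`s are left out.

```python
# Sketch of the encoding ⌜·⌝, producing Preserves text (assumed representation).
LABELS = {"rec": "r", "block": "b", "group": "g", "set": "s"}

def encode(p):
    """Encode one P-expression as Preserves text syntax."""
    if not isinstance(p, tuple):
        return p                                   # Atom: encoded as itself
    kind, body = p
    if kind == "punct":
        return "<p |" + body + "|>"                # `,` `;` and runs of `:`
    inner = " ".join(encode(q) for q in body)
    if kind == "seq":
        return "[" + inner + "]"                   # sequences stay sequences
    # Every other compound becomes a record with a one-letter label.
    return "<" + LABELS[kind] + (" " + inner if inner else "") + ">"
```

For instance, `encode(("group", []))` yields `<g>`, matching the `setUp <g>` fragments in the examples later in the document.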
@ -145,18 +122,19 @@ using Preserves text syntax and then (3) read as a P-expression can be
A reader for P-expressions can be adapted to yield a reader for A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special reader produces.
classes mentioned above.
1. Every `Group` or `Semicolon` that appears is an error. 1. Every `(`..`)` or `;` that appears is an error.
2. Every `Colons` with two or more colons in it is an error. 2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
3. Every `Comma` that appears is discarded. 3. Every `,` that appears is discarded.
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error] 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error. 5. Every `Record` with no values in it is an error.
6. Every `Block` must contain triplets of `Value`, `Colons` (with a 6. Every `Block` must contain zero or more repeating triplets of
single colon), `Value`. Any `Block` not following this pattern is an `Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
error. Each `Block` following the pattern is translated to a error. Each `Block` following the pattern is translated to a
`Dictionary` containing a key/value pair for each triplet. `Dictionary` containing a key/value pair for each triplet. Any
`Block` with duplicate keys (under interpretation) is an error.
7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
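
The rules above can be sketched as a recursive interpreter. The following Python is a minimal sketch under stated assumptions, not the reference implementation: P-expressions are `(kind, [items])` tuples with `("punct", text)` for punctuation and raw strings for atoms; results use lists for sequences and hypothetical `("dict", pairs)`, `("set", values)`, and `("record", label, fields)` tuples; `Trailer`s, annotations, and `#!` are omitted.

```python
# Sketch of reading a P-expression as a Preserves value (assumed representation).
COMMA = ("punct", ",")
COLON = ("punct", ":")

def interpret(p):
    if not isinstance(p, tuple):
        return p                                       # Atom: unchanged
    kind, body = p
    if kind == "punct":                                # rules 1 and 2
        raise ValueError("stray punctuation " + body)
    if kind == "group":                                # rule 1
        raise ValueError("Group has no Preserves reading")
    body = [q for q in body if q != COMMA]             # rule 3: discard commas
    if kind == "block":                                # rule 6
        if len(body) % 3 != 0:
            raise ValueError("Block is not a series of key : value triplets")
        pairs = []
        for k, colon, v in zip(body[0::3], body[1::3], body[2::3]):
            if colon != COLON:
                raise ValueError("expected a single colon in Block")
            key = interpret(k)
            if any(key == seen for seen, _ in pairs):
                raise ValueError("duplicate key in Block")
            pairs.append((key, interpret(v)))
        return ("dict", pairs)
    values = [interpret(q) for q in body]
    if kind == "seq":
        return values
    if kind == "set":                                  # rule 7
        for index, value in enumerate(values):
            if value in values[:index]:
                raise ValueError("duplicate element in Set")
        return ("set", values)
    if not values:                                     # rule 5
        raise ValueError("empty Record")
    return ("record", values[0], values[1:])
```

Duplicates are detected here by Python equality on the interpreted values, which is only an approximation of Preserves value equality while atoms are kept as raw text.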
[^discard-trailers-instead-of-error]: **Implementation note.** When [^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider implementing parsing of P-expressions into Preserves, consider
@ -168,7 +146,7 @@ classes mentioned above.
Examples are given as pairs of P-expressions and their Preserves Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings. text-syntax encodings.
### Individual P-expression `Value`s ### Individual P-expression `Expr`s
```preserves ```preserves
<date 1821 (lookup-month "February") 3> <date 1821 (lookup-month "February") 3>
@ -203,19 +181,27 @@ text-syntax encodings.
tearDown(); tearDown();
}⌝ }⌝
= <b = <b
setUp <g> <s |;|> setUp <g> <p |;|>
# Now enter the loop # Now enter the loop
loop <s |:|> <b loop <p |:|> <b
greet <g "World"> <s |;|> greet <g "World"> <p |;|>
> >
tearDown <g> <s |;|> tearDown <g> <p |;|>
> >
``` ```
```preserves ```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝ ⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 <s |,|> print "Hello" <s |,|> predicate <s |:|> #t <s |,|> = [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
foo <s |,|> #!remote <s |,|> bar] foo <p |,|> #!remote <p |,|> bar]
```
```preserves
⌜#{1 2 3}⌝
= <s 1 2 3>
⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>
``` ```
```preserves ```preserves
@ -224,8 +210,8 @@ text-syntax encodings.
address: Address, address: Address,
}⌝ }⌝
= <b = <b
optional name <s |:|> string <s |,|> optional name <p |:|> string <p |,|>
address <s |:|> Address <s |,|> address <p |:|> Address <p |,|>
> >
``` ```
@ -238,7 +224,7 @@ text-syntax encodings.
} }
# example of a comment at the end of the input file⌝ # example of a comment at the end of the input file⌝
= [ <b = [ <b
key <s |:|> value key <p |:|> value
@"example of a comment at the end of a dictionary" <a> @"example of a comment at the end of a dictionary" <a>
> >
@"example of a comment at the end of the input file" @"example of a comment at the end of the input file"
@ -273,7 +259,7 @@ generic P-expression reader can then feed into special-purpose
program, and the parser refines this. program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of Often, a parser will wish to extract structure from sequences of
P-expression `Value`s. P-expression `Expr`s.
- A simple technique is repeated splitting of sequences; first by - A simple technique is repeated splitting of sequences; first by
`Semicolon`, then by `Comma`, then by increasingly high binding-power `Semicolon`, then by `Comma`, then by increasingly high binding-power
@ -286,10 +272,39 @@ P-expression `Value`s.
to build a parse tree using an extensible specification of the pre-, to build a parse tree using an extensible specification of the pre-,
in-, and postfix operators involved. in-, and postfix operators involved.
- Finally, if you treat sequences of `Value`s as pre-lexed token - Finally, if you treat sequences of `Expr`s as pre-lexed token
streams, almost any parsing formalism (such as [PEG streams, almost any parsing formalism (such as [PEG
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar), parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
[Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to [Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
extract further syntactic structure. extract further syntactic structure.
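
The first technique, repeated splitting, can be sketched in a few lines of Python. This is an illustration only: the helper name `split_on` and the `("punct", text)` representation of punctuation are assumptions, not part of the specification.

```python
# Sketch of splitting a sequence of Exprs on a punctuation mark (assumed representation).
def split_on(items, mark):
    """Split a list of Exprs at each occurrence of the punctuation `mark`."""
    parts, current = [], []
    for item in items:
        if item == ("punct", mark):
            parts.append(current)      # close the segment before the mark
            current = []
        else:
            current.append(item)
    if current:                        # a trailing mark leaves no empty tail
        parts.append(current)
    return parts

# Split first on `;`, then each statement on `,`.
body = ["a", ("punct", ","), "b", ("punct", ";"), "c", ("punct", ";")]
statements = [split_on(s, ",") for s in split_on(body, ";")]
```

With this choice, a trailing separator is tolerated, while two adjacent separators produce an empty segment that the caller can accept or reject.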
## Appendix: Equations for interpreting P-expressions as Preserves
The partial function **uncomma**(*p*) removes all occurrences of `,`
from a P-expression *p*.
{:.pseudocode.equations}
| **uncomma** : **Expr** | ⇀ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** − {`,`} |
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`.
{:.pseudocode.equations}
| ⌞·⌟ : **Expr** | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<`*ℓ* *p* ...`>`⌟ | = | `<`⌞*ℓ*⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
## Notes ## Notes