This is way better

Tony Garnock-Jones 2023-11-01 13:06:15 +01:00
parent 1a0772d39f
commit d7b983e140
2 changed files with 145 additions and 108 deletions

@ -0,0 +1,22 @@
The definition of `Atom` is as given in the Preserves text syntax.
```text
Document := Expr* sp
Expr := sp (Atom | Compound | Punct | Embedded | Annotated)
Compound := Sequence | Record | Block | Group | Set
Punct := `,` | `;` | `:`+
sp := (space | tab | cr | lf)*
Sequence := `[` Expr* Trailer sp `]`
Record := `<` Expr* Trailer sp `>`
Block := `{` Expr* Trailer sp `}`
Group := `(` Expr* Trailer sp `)`
Set := `#{` Expr* Trailer sp `}`
Trailer := Annotation*
Embedded := `#!` Expr
Annotated := Annotation Expr
Annotation := `@` Expr | `#` ((space | tab) linecomment)? (cr | lf)
```


@ -3,128 +3,105 @@ title: "P-expressions"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
October 2023. Version 0.2.0.
October 2023. Version 0.3.0.
[text syntax]: preserves-text.html
This document defines a grammar called *Preserves Expressions*
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
syntax](preserves-text.html) but offers extensions sufficient to support
a Lisp- or Haskell-like programming notation.
syntax][text syntax] but offers extensions sufficient to support a Lisp-
or Haskell-like programming notation.
**Motivation.** The [text syntax](preserves-text.html) for Preserves
works well for writing `Value`s, i.e. data. However, in some contexts,
Preserves applications need a broader grammar that allows interleaving
of *expressions* with data. Two examples are the [Preserves Schema
**Motivation.** The [text syntax][] for Preserves works well for writing
`Value`s, i.e. data. However, in some contexts, Preserves applications
need a broader grammar that allows interleaving of *expressions* with
data. Two examples are the [Preserves Schema
language](preserves-schema.html) and the [Synit configuration scripting
language](https://synit.org/book/operation/scripting.html), both of
which (ab)use Preserves text syntax as a kind of programming notation.
## Preliminaries
The P-expression grammar takes the text syntax grammar as its base and
modifies it.
The P-expression grammar includes by reference the definition of `Atom` from the
[text syntax][], as well as the definitions that `Atom` depends on.
P-expressions take their own approach to inter-token whitespace,
however.
<a id="whitespace"></a>
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
carriage returns, or line feeds. Commas are *not* considered whitespace
in P-expressions.
**Whitespace.** Whitespace `sp` is defined as any number of spaces,
tabs, carriage returns, or line feeds. Commas are *not* considered
whitespace in P-expressions, and so class `sp` is different to class
`ws` from the text syntax.
ws = *(%x20 / %x09 / CR / LF)
sp = *(%x20 / %x09 / CR / LF)
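
As an illustration of class `sp`, here is a minimal Python sketch of a whitespace skipper. The names `SP_CHARS` and `skip_sp` are hypothetical and not part of any Preserves implementation. The point it shows is that a comma stops the scan, because commas are not whitespace in P-expressions.

```python
# Characters in class `sp`: space, tab, carriage return, line feed.
# The comma is deliberately absent, unlike class `ws` of the text syntax.
SP_CHARS = " \t\r\n"


def skip_sp(text: str, pos: int = 0) -> int:
    """Return the index of the first character at or after `pos` not in `sp`."""
    while pos < len(text) and text[pos] in SP_CHARS:
        pos += 1
    return pos
```

A reader built on this would call `skip_sp` before each `Expr`, and would then see `,` as a `Punct` token rather than skipping it.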
<a id="delimiters"></a>
**Delimiters.** Because commas are no longer included in class `ws`,
class `delimiter` is widened to include them explicitly.
delimiter = ws / ","
/ "<" / ">" / "[" / "]" / "{" / "}"
/ "#" / ":" / DQUOTE / "|" / "@" / ";"
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term can be parsed as a valid P-expression,
but in general P-expressions must be rewritten or otherwise interpreted
before a meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
## Grammar
P-expressions add comma, semicolon, and sequences of one or more colons
to the syntax class `Value`.
Standalone documents containing P-expressions are sequences of
individual `Expr`s, followed by trailing whitespace.
Value =/ Comma / Semicolon / Colons
Comma = ","
Semicolon = ";"
Colons = 1*":"
Document = *Expr sp
Now that colon is in `Value`, the syntax for `Dictionary` is replaced
with `Block` everywhere it is mentioned.
A single P-expression `Expr` can be an `Atom` from the [text syntax][],
a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression.
Block = "{" *Value ws "}"
Expr = sp (Atom / Compound / Punct / Embedded / Annotated)
Syntax for `Record` is loosened to allow empty angle brackets.
Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `Expr`.
Record = "<" *Value ws ">"
Embedded = "#!" Expr
Annotated = Annotation Expr
Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
New syntax for explicit uninterpreted grouping of sequences of values is
introduced, and added to class `Value`.
P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
Value =/ ws Group
Group = "(" *Value ws ")"
Punct = "," / ";" / 1*":"
Finally, class `Document` is replaced in order to allow standalone
documents to directly comprise a sequence of multiple values.
Compound expressions are sequences of `Expr`s with optional trailing
`Annotation`s, surrounded by various kinds of parentheses.
Document = *Value ws
Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer sp "]"
Record = "<" *Expr Trailer sp ">"
Block = "{" *Expr Trailer sp "}"
Group = "(" *Expr Trailer sp ")"
Set = "#{" *Expr Trailer sp "}"
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term is a valid P-expression, but in general
P-expressions must be rewritten or otherwise interpreted before a
meaningful Preserves value can be arrived at ([see
below](#reading-preserves)).
In an `Annotated` P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is
common in programming notations to allow comments at the end of a file
or other sequential construct. The ordinary text syntax forbids comments
in these positions, but P-expressions allow them.
## <a id="annotations"></a>Annotations and Comments
Annotations and comments attach to the term following them, just as in
the ordinary text syntax. However, it is common in programming notations
to allow comments at the end of a file or other sequential construct:
{
key: value
# example of a comment at the end of a dictionary
}
# example of a comment at the end of the input file
While the ordinary text syntax forbids comments in these positions,
P-expressions allow them:
Document =/ *Value Trailer ws
Record =/ "<" *Value Trailer ws ">"
Sequence =/ "[" *Value Trailer ws "]"
Set =/ "#{" *Value Trailer ws "}"
Block =/ "{" *Value Trailer ws "}"
Group =/ "(" *Value Trailer ws ")"
Trailer = 1*Annotation
Trailer = *Annotation
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
{:.pseudocode.equations}
| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |
Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`,
and `Record`, P-expressions are encoded directly as Preserves data.
{:.pseudocode.equations}
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` |
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* when *p* ∈ **Atom** |
Everything else is encoded as Preserves records.
{:.pseudocode.equations}
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`,`⌝ | = | `<s |,|>` |
| ⌜`;`⌝ | = | `<s |;|>` |
| ⌜`:` ...⌝ | = | `<s |:` ...`|>` |
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
| ⌜·⌝ : **Expr** | ⟶ | **Value** |
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
| ⌜`#{`*p* ...`}`⌝ | = | `<s `⌜*p*⌝ ...`>` |
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
| ⌜*p*⌝ | = | *p* | when *p* ∈ **Atom** |
| ⌜`,`⌝ | = | `<p |,|>` |
| ⌜`;`⌝ | = | `<p |;|>` |
| ⌜`:` ...⌝ | = | `<p |:` ...`|>` |
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>` | where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
The record `<a>` acts as an anchor for the annotations in a `Trailer`.
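
The encoding equations can be sketched in Python as follows. This is only an illustration under stated assumptions: P-expressions are modelled as hypothetical tagged tuples (`("atom", text)`, `("punct", text)`, `("embedded", p)`, `("annotated", a, p)`, `("trailer", [a, ...])`, and `(kind, [p, ...])` for the five compound kinds), and the result is Preserves text syntax held in a plain string. None of these names come from a real Preserves library.

```python
# Record labels used for the compound forms that are not encoded directly.
LABELS = {"rec": "r", "block": "b", "group": "g", "set": "s"}


def encode(p) -> str:
    """Encode a tagged-tuple P-expression as Preserves text (a sketch)."""
    kind = p[0]
    if kind == "atom":
        return p[1]  # atoms are carried over unchanged
    if kind == "punct":
        return f"<p |{p[1]}|>"  # `,` `;` and runs of `:` become <p ...> records
    if kind == "embedded":
        return "#!" + encode(p[1])
    if kind == "annotated":
        return f"@{encode(p[1])} {encode(p[2])}"
    if kind == "trailer":
        # Trailing annotations attach to the anchor record <a>.
        return " ".join([f"@{encode(a)}" for a in p[1]] + ["<a>"])
    parts = [encode(q) for q in p[1]]
    if kind == "seq":
        return "[" + " ".join(parts) + "]"
    return "<" + " ".join([LABELS[kind]] + parts) + ">"
```

For example, with `read = ("group", [("atom", "read")])`, the call `encode(("set", [read, read, read]))` yields `<s <g read> <g read> <g read>>`.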
@ -145,18 +122,19 @@ using Preserves text syntax and then (3) read as a P-expression can be
A reader for P-expressions can be adapted to yield a reader for
Preserves terms by processing (subterms of) each P-expression that the
reader produces. The only subterms that need processing are the special
classes mentioned above.
reader produces.
1. Every `Group` or `Semicolon` that appears is an error.
2. Every `Colons` with two or more colons in it is an error.
3. Every `Comma` that appears is discarded.
1. Every `(`..`)` or `;` that appears is an error.
2. Every `:`, `::`, `:::`, ... is an error, except for a single `:` in the context of a `Block`, as described below.
3. Every `,` that appears is discarded.
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error.
6. Every `Block` must contain triplets of `Value`, `Colons` (with a
single colon), `Value`. Any `Block` not following this pattern is an
6. Every `Block` must contain zero or more repeating triplets of
`Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
error. Each `Block` following the pattern is translated to a
`Dictionary` containing a key/value pair for each triplet.
`Dictionary` containing a key/value pair for each triplet. Any
`Block` with duplicate keys (under interpretation) is an error.
7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
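
The numbered rules above can be sketched as a Python function. Everything here is an assumption made for illustration: P-expressions are hypothetical tagged tuples (`("atom", value)`, `("punct", text)`, `("embedded", p)`, `("annotated", a, p)`, `("trailer", [...])`, and `(kind, [p, ...])` for compounds), atoms carry a hashable Python stand-in value, and annotations are simply dropped. Python's own equality is used for duplicate detection, which is coarser than Preserves equality (for instance, Python treats `1` and `1.0` as equal).

```python
COMMA = ("punct", ",")
COLON = ("punct", ":")


class PexprError(ValueError):
    """Raised when a P-expression has no interpretation as a Preserves value."""


def interpret(p):
    kind = p[0]
    if kind == "atom":
        return p[1]
    if kind == "group":
        raise PexprError("rule 1: groups are not allowed")
    if kind == "punct":
        raise PexprError(f"rules 1-2: punctuation {p[1]!r} is not allowed here")
    if kind == "trailer":
        raise PexprError("rule 4: trailing annotations are not allowed")
    if kind == "embedded":
        return ("embedded", interpret(p[1]))
    if kind == "annotated":
        return interpret(p[2])  # simplification: annotations are discarded
    items = [q for q in p[1] if q != COMMA]  # rule 3: commas are discarded
    if kind == "seq":
        return ("seq", tuple(interpret(q) for q in items))
    if kind == "rec":
        if not items:
            raise PexprError("rule 5: empty record")
        label, *fields = [interpret(q) for q in items]
        return ("record", label, tuple(fields))
    if kind == "set":
        values = [interpret(q) for q in items]
        if len(set(values)) != len(values):
            raise PexprError("rule 7: duplicate set element")
        return ("set", frozenset(values))
    if kind == "block":
        if len(items) % 3 != 0:
            raise PexprError("rule 6: block is not made of key : value triplets")
        entries = {}
        for k, colon, v in zip(items[0::3], items[1::3], items[2::3]):
            if colon != COLON:
                raise PexprError("rule 6: expected a single ':' after the key")
            key = interpret(k)
            if key in entries:
                raise PexprError("rule 6: duplicate dictionary key")
            entries[key] = interpret(v)
        return ("dict", frozenset(entries.items()))
    raise PexprError(f"unknown node kind {kind!r}")
```

Results are returned as hashable tagged tuples so that interpreted values can themselves serve as dictionary keys and set members.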
[^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider
@ -168,7 +146,7 @@ classes mentioned above.
Examples are given as pairs of P-expressions and their Preserves
text-syntax encodings.
### Individual P-expression `Value`s
### Individual P-expression `Expr`s
```preserves
<date 1821 (lookup-month "February") 3>
@ -203,19 +181,27 @@ text-syntax encodings.
tearDown();
}⌝
= <b
setUp <g> <s |;|>
setUp <g> <p |;|>
# Now enter the loop
loop <s |:|> <b
greet <g "World"> <s |;|>
loop <p |:|> <b
greet <g "World"> <p |;|>
>
tearDown <g> <s |;|>
tearDown <g> <p |;|>
>
```
```preserves
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
= [1 + 2.0 <s |,|> print "Hello" <s |,|> predicate <s |:|> #t <s |,|>
foo <s |,|> #!remote <s |,|> bar]
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
foo <p |,|> #!remote <p |,|> bar]
```
```preserves
⌜#{1 2 3}⌝
= <s 1 2 3>
⌜#{(read) (read) (read)}⌝
= <s <g read> <g read> <g read>>
```
```preserves
@ -224,8 +210,8 @@ text-syntax encodings.
address: Address,
}⌝
= <b
optional name <s |:|> string <s |,|>
address <s |:|> Address <s |,|>
optional name <p |:|> string <p |,|>
address <p |:|> Address <p |,|>
>
```
@ -238,7 +224,7 @@ text-syntax encodings.
}
# example of a comment at the end of the input file⌝
= [ <b
key <s |:|> value
key <p |:|> value
@"example of a comment at the end of a dictionary" <a>
>
@"example of a comment at the end of the input file"
@ -273,7 +259,7 @@ generic P-expression reader can then feed into special-purpose
program, and the parser refines this.
Often, a parser will wish to extract structure from sequences of
P-expression `Value`s.
P-expression `Expr`s.
- A simple technique is repeated splitting of sequences; first by
`Semicolon`, then by `Comma`, then by increasingly high binding-power
@ -286,10 +272,39 @@ P-expression `Value`s.
to build a parse tree using an extensible specification of the pre-,
in-, and postfix operators involved.
- Finally, if you treat sequences of `Value`s as pre-lexed token
- Finally, if you treat sequences of `Expr`s as pre-lexed token
streams, almost any parsing formalism (such as [PEG
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
[OMeta](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
extract further syntactic structure.
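
The first technique in the list, repeated splitting, might look like the following Python sketch. It assumes items are arbitrary Python values in a list and that punctuation is represented by the hypothetical tuples `("punct", ";")` and `("punct", ",")`; the function names are illustrative only.

```python
SEMI = ("punct", ";")
COMMA = ("punct", ",")


def split_on(items, separator):
    """Split a list of items at each occurrence of `separator`."""
    chunks, current = [], []
    for item in items:
        if item == separator:
            chunks.append(current)
            current = []
        else:
            current.append(item)
    chunks.append(current)  # the final chunk may be empty
    return chunks


def statements(items):
    """Split first by semicolon, then split each piece by comma."""
    return [split_on(stmt, COMMA) for stmt in split_on(items, SEMI)]
```

A trailing separator produces an empty final chunk, which a parser can either drop or treat as an empty statement, depending on the notation being implemented.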
## Appendix: Equations for interpreting P-expressions as Preserves
The partial function **uncomma**(*p*) removes all occurrences of `,`
from a P-expression *p*.
{:.pseudocode.equations}
| **uncomma** : **Expr** | ⇀ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
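
A minimal Python sketch of **uncomma**, assuming P-expressions are modelled as hypothetical tagged tuples: `("atom", text)`, `("punct", text)`, `("embedded", p)`, `("annotated", a, p)`, and `(kind, [p, ...])` for the five compound kinds. The function is partial, so it raises on a bare comma, where the equations leave it undefined. Trailers are not covered by the equations and are passed through unchanged here.

```python
COMMA = ("punct", ",")
COMPOUND = {"seq", "rec", "block", "group", "set"}


def uncomma(p):
    """Remove every `,` from inside a tagged-tuple P-expression (a sketch)."""
    if p == COMMA:
        raise ValueError("uncomma is undefined on a bare comma")
    kind = p[0]
    if kind in COMPOUND:
        # Drop commas at this level, then recurse into the remaining items.
        return (kind, [uncomma(q) for q in p[1] if q != COMMA])
    if kind == "embedded":
        return (kind, uncomma(p[1]))
    if kind == "annotated":
        return (kind, uncomma(p[1]), uncomma(p[2]))
    return p  # atoms and all other punctuation are unchanged
```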
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`.
{:.pseudocode.equations}
| ⌞·⌟ : **Expr** | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<`*ℓ* *p* ...`>`⌟ | = | `<`⌞*ℓ*⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
## Notes