This is way better
This commit is contained in:
parent
1a0772d39f
commit
d7b983e140
|
@ -0,0 +1,22 @@
|
||||||
|
The definition of `Atom` is as given in the Preserves text syntax.
|
||||||
|
|
||||||
|
```text
|
||||||
|
Document := Expr* sp
|
||||||
|
Expr := sp (Atom | Compound | Punct | Embedded | Annotated)
|
||||||
|
Compound := Sequence | Record | Block | Group | Set
|
||||||
|
Punct := `,` | `;` | `:`+
|
||||||
|
|
||||||
|
sp := (space | tab | cr | lf)*
|
||||||
|
|
||||||
|
Sequence := `[` Expr* Trailer sp `]`
|
||||||
|
Record := `<` Expr* Trailer sp `>`
|
||||||
|
Block := `{` Expr* Trailer sp `}`
|
||||||
|
Group := `(` Expr* Trailer sp `)`
|
||||||
|
Set := `#{` Expr* Trailer sp `}`
|
||||||
|
|
||||||
|
Trailer := Annotation*
|
||||||
|
|
||||||
|
Embedded := `#!` Expr
|
||||||
|
Annotated := Annotation Expr
|
||||||
|
Annotation := `@` Expr | `#` ((space | tab) linecomment)? (cr | lf)
|
||||||
|
```
|
|
@ -3,128 +3,105 @@ title: "P-expressions"
|
||||||
---
|
---
|
||||||
|
|
||||||
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
|
||||||
October 2023. Version 0.2.0.
|
October 2023. Version 0.3.0.
|
||||||
|
|
||||||
|
[text syntax]: preserves-text.html
|
||||||
|
|
||||||
This document defines a grammar called *Preserves Expressions*
|
This document defines a grammar called *Preserves Expressions*
|
||||||
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
|
(*P-expressions*, *pexprs*) that includes [ordinary Preserves text
|
||||||
syntax](preserves-text.html) but offers extensions sufficient to support
|
syntax][text syntax] but offers extensions sufficient to support a Lisp-
|
||||||
a Lisp- or Haskell-like programming notation.
|
or Haskell-like programming notation.
|
||||||
|
|
||||||
**Motivation.** The [text syntax](preserves-text.html) for Preserves
|
**Motivation.** The [text syntax][] for Preserves works well for writing
|
||||||
works well for writing `Value`s, i.e. data. However, in some contexts,
|
`Value`s, i.e. data. However, in some contexts, Preserves applications
|
||||||
Preserves applications need a broader grammar that allows interleaving
|
need a broader grammar that allows interleaving of *expressions* with
|
||||||
of *expressions* with data. Two examples are the [Preserves Schema
|
data. Two examples are the [Preserves Schema
|
||||||
language](preserves-schema.html) and the [Synit configuration scripting
|
language](preserves-schema.html) and the [Synit configuration scripting
|
||||||
language](https://synit.org/book/operation/scripting.html), both of
|
language](https://synit.org/book/operation/scripting.html), both of
|
||||||
which (ab)use Preserves text syntax as a kind of programming notation.
|
which (ab)use Preserves text syntax as a kind of programming notation.
|
||||||
|
|
||||||
## Preliminaries
|
## Preliminaries
|
||||||
|
|
||||||
The P-expression grammar takes the text syntax grammar as its base and
|
The P-expression grammar includes by reference the definition of `Atom` from the
|
||||||
modifies it.
|
[text syntax][], as well as the definitions that `Atom` depends on.
|
||||||
|
|
||||||
|
P-expressions take their own approach to inter-token whitespace,
|
||||||
|
however.
|
||||||
|
|
||||||
<a id="whitespace">
|
<a id="whitespace">
|
||||||
**Whitespace.** Whitespace is redefined as any number of spaces, tabs,
|
**Whitespace.** Whitespace `sp` is defined as any number of spaces,
|
||||||
carriage returns, or line feeds. Commas are *not* considered whitespace
|
tabs, carriage returns, or line feeds. Commas are *not* considered
|
||||||
in P-expressions.
|
whitespace in P-expressions, and so class `sp` is different to class
|
||||||
|
`ws` from the text syntax.
|
||||||
|
|
||||||
ws = *(%x20 / %x09 / CR / LF)
|
sp = *(%x20 / %x09 / CR / LF)
|
||||||
|
|
||||||
<a id="delimiters"></a>
|
No changes to [the Preserves semantic model](preserves.html) are made.
|
||||||
**Delimiters.** Because commas are no longer included in class `ws`,
|
Every Preserves text-syntax term can be parsed as a valid P-expression,
|
||||||
class `delimiter` is widened to include them explicitly.
|
but in general P-expressions must be rewritten or otherwise interpreted
|
||||||
|
before a meaningful Preserves value can be arrived at ([see
|
||||||
delimiter = ws / ","
|
below](#reading-preserves)).
|
||||||
/ "<" / ">" / "[" / "]" / "{" / "}"
|
|
||||||
/ "#" / ":" / DQUOTE / "|" / "@" / ";"
|
|
||||||
|
|
||||||
## Grammar
|
## Grammar
|
||||||
|
|
||||||
P-expressions add comma, semicolon, and sequences of one or more colons
|
Standalone documents containing P-expressions are sequences of
|
||||||
to the syntax class `Value`.
|
individual `Expr`s, followed by trailing whitespace.
|
||||||
|
|
||||||
Value =/ Comma / Semicolon / Colons
|
Document = *Expr sp
|
||||||
Comma = ","
|
|
||||||
Semicolon = ";"
|
|
||||||
Colons = 1*":"
|
|
||||||
|
|
||||||
Now that colon is in `Value`, the syntax for `Dictionary` is replaced
|
A single P-expression `Expr` can be an `Atom` from the [text syntax][],
|
||||||
with `Block` everywhere it is mentioned.
|
a compound expression, special punctuation, an `Embedded` expression, or
|
||||||
|
an `Annotated` expression.
|
||||||
|
|
||||||
Block = "{" *Value ws "}"
|
Expr = sp (Atom | Compound | Punct | Embedded | Annotated)
|
||||||
|
|
||||||
Syntax for `Record` is loosened to allow empty angle brackets.
|
Embedded and annotated values are as in the text syntax, differing only
|
||||||
|
in that uses of `Value` are replaced with `Expr`.
|
||||||
|
|
||||||
Record = "<" *Value ws ">"
|
Embedded = "#!" Expr
|
||||||
|
Annotated = Annotation Expr
|
||||||
|
Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
|
||||||
|
|
||||||
New syntax for explicit uninterpreted grouping of sequences of values is
|
P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
|
||||||
introduced, and added to class `Value`.
|
|
||||||
|
|
||||||
Value =/ ws Group
|
Punct = "," / ";" / 1*":"
|
||||||
Group = "(" *Value ws ")"
|
|
||||||
|
|
||||||
Finally, class `Document` is replaced in order to allow standalone
|
Compound expressions are sequences of `Expr`s with optional trailing
|
||||||
documents to directly comprise a sequence of multiple values.
|
`Annotation`s, surrounded by various kinds of parentheses.
|
||||||
|
|
||||||
Document = *Value ws
|
Compound = Sequence / Record / Block / Group / Set
|
||||||
|
Sequence = "[" *Expr Trailer sp "]"
|
||||||
|
Record = "<" *Expr Trailer sp ">"
|
||||||
|
Block = "{" *Expr Trailer sp "}"
|
||||||
|
Group = "(" *Expr Trailer sp ")"
|
||||||
|
Set = "#{" *Expr Trailer sp "}"
|
||||||
|
|
||||||
No changes to [the Preserves semantic model](preserves.html) are made.
|
In an `Annotated` P-expression, annotations and comments attach to the
|
||||||
Every Preserves text-syntax term is a valid P-expression, but in general
|
term following them, just as in the ordinary text syntax. However, it is
|
||||||
P-expressions must be rewritten or otherwise interpreted before a
|
common in programming notations to allow comments at the end of a file
|
||||||
meaningful Preserves value can be arrived at ([see
|
or other sequential construct. The ordinary text syntax forbids comments
|
||||||
below](#reading-preserves)).
|
in these positions, but P-expressions allow them.
|
||||||
|
|
||||||
## <a id="annotations"></a>Annotations and Comments
|
Trailer = *Annotation
|
||||||
|
|
||||||
Annotations and comments attach to the term following them, just as in
|
|
||||||
the ordinary text syntax. However, it is common in programming notations
|
|
||||||
to allow comments at the end of a file or other sequential construct:
|
|
||||||
|
|
||||||
{
|
|
||||||
key: value
|
|
||||||
# example of a comment at the end of a dictionary
|
|
||||||
}
|
|
||||||
# example of a comment at the end of the input file
|
|
||||||
|
|
||||||
While the ordinary text syntax forbids comments in these positions,
|
|
||||||
P-expressions allow them:
|
|
||||||
|
|
||||||
Document =/ *Value Trailer ws
|
|
||||||
Record =/ "<" *Value Trailer ws ">"
|
|
||||||
Sequence =/ "[" *Value Trailer ws "]"
|
|
||||||
Set =/ "#{" *Value Trailer ws "}"
|
|
||||||
Block =/ "{" *Value Trailer ws "}"
|
|
||||||
Group =/ "(" *Value Trailer ws ")"
|
|
||||||
|
|
||||||
Trailer = 1*Annotation
|
|
||||||
|
|
||||||
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
|
## <a id="encoding-pexprs"></a>Encoding P-expressions as Preserves
|
||||||
|
|
||||||
We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
|
We write ⌜*p*⌝ for the encoding into Preserves of P-expression *p*.
|
||||||
|
|
||||||
{:.pseudocode.equations}
|
{:.pseudocode.equations}
|
||||||
| ⌜·⌝ : **P-expression** | ⟶ | **Preserves** |
|
| ⌜·⌝ : **Expr** | ⟶ | **Value** |
|
||||||
|
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
|
||||||
Aside from `Group`, `Block`, `Comma`, `Semicolon`, `Colons`, `Trailer`,
|
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
|
||||||
and `Record`, P-expressions are encoded directly as Preserves data.
|
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
|
||||||
|
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
|
||||||
{:.pseudocode.equations}
|
| ⌜`#{`*p* ...`}`⌝ | = | `<s` ⌜*p*⌝ ...`>` |
|
||||||
| ⌜`[`*p* ...`]`⌝ | = | `[`⌜*p*⌝ ...`]` |
|
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
|
||||||
| ⌜`#{`*p* ...`}`⌝ | = | `#{`⌜*p*⌝ ...`}` |
|
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
|
||||||
| ⌜`#!`*p*⌝ | = | `#!`⌜*p*⌝ |
|
| ⌜*p*⌝ | = | *p* | when *p* ∈ **Atom** |
|
||||||
| ⌜`@`*p* *q*⌝ | = | `@`⌜*p*⌝ ⌜*q*⌝ |
|
| ⌜`,`⌝ | = | `<p |,|>` |
|
||||||
| ⌜*p*⌝ | = | *p* when *p* ∈ **Atom** |
|
| ⌜`;`⌝ | = | `<p |;|>` |
|
||||||
|
| ⌜`:` ...⌝ | = | `<p |:` ...`|>` |
|
||||||
Everything else is encoded as Preserves records.
|
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>` | where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
|
||||||
|
|
||||||
{:.pseudocode.equations}
|
|
||||||
| ⌜`<`*p* ...`>`⌝ | = | `<r` ⌜*p*⌝ ...`>` |
|
|
||||||
| ⌜`(`*p* ...`)`⌝ | = | `<g` ⌜*p*⌝ ...`>` |
|
|
||||||
| ⌜`{`*p* ...`}`⌝ | = | `<b` ⌜*p*⌝ ...`>` |
|
|
||||||
| ⌜`,`⌝ | = | `<s |,|>` |
|
|
||||||
| ⌜`;`⌝ | = | `<s |;|>` |
|
|
||||||
| ⌜`:` ...⌝ | = | `<s |:` ...`|>` |
|
|
||||||
| ⌜*t*⌝ | = | ⌜*a*⌝ ... `<a>`, where *a* ... are the annotations in *t* and *t* ∈ **Trailer** |
|
|
||||||
|
|
||||||
The record `<a>` acts as an anchor for the annotations in a `Trailer`.
|
The record `<a>` acts as an anchor for the annotations in a `Trailer`.
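The encoding equations above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Punct` and `Compound` classes and the `encode` function are hypothetical names, atoms are assumed to arrive as strings already written in Preserves text syntax, and embedded values, annotations, and trailers are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Punct:
    """Hypothetical model of special punctuation: ",", ";", or one or more ":"."""
    text: str


@dataclass(frozen=True)
class Compound:
    """Hypothetical model of a compound; kind is one of "[", "<", "{", "(", "#{"."""
    kind: str
    items: tuple


# Record labels used by the encoding for each non-sequence bracket kind.
LABELS = {"<": "r", "{": "b", "(": "g", "#{": "s"}


def encode(p):
    """Return the Preserves text-syntax encoding of P-expression p as a string."""
    if isinstance(p, str):
        # Assumption: atoms are strings already in Preserves text syntax.
        return p
    if isinstance(p, Punct):
        return f"<p |{p.text}|>"
    if isinstance(p, Compound):
        parts = [encode(x) for x in p.items]
        if p.kind == "[":
            return "[" + " ".join(parts) + "]"
        return "<" + " ".join([LABELS[p.kind]] + parts) + ">"
    raise TypeError(f"not a P-expression in this sketch: {p!r}")
```

Under these assumptions, `encode(Compound("#{", ("1", "2", "3")))` produces the text `<s 1 2 3>`, and an empty group encodes as `<g>`.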
|
||||||
|
|
||||||
|
@ -145,18 +122,19 @@ using Preserves text syntax and then (3) read as a P-expression can be
|
||||||
|
|
||||||
A reader for P-expressions can be adapted to yield a reader for
|
A reader for P-expressions can be adapted to yield a reader for
|
||||||
Preserves terms by processing (subterms of) each P-expression that the
|
Preserves terms by processing (subterms of) each P-expression that the
|
||||||
reader produces. The only subterms that need processing are the special
|
reader produces.
|
||||||
classes mentioned above.
|
|
||||||
|
|
||||||
1. Every `Group` or `Semicolon` that appears is an error.
|
1. Every `(`..`)` or `;` that appears is an error.
|
||||||
2. Every `Colons` with two or more colons in it is an error.
|
2. Every `:`, `::`, `:::`, ... is an error, except for a single `:` in the context of a `Block`, as described below.
|
||||||
3. Every `Comma` that appears is discarded.
|
3. Every `,` that appears is discarded.
|
||||||
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
|
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
|
||||||
5. Every `Record` with no values in it is an error.
|
5. Every `Record` with no values in it is an error.
|
||||||
6. Every `Block` must contain triplets of `Value`, `Colons` (with a
|
6. Every `Block` must contain zero or more repeating triplets of
|
||||||
single colon), `Value`. Any `Block` not following this pattern is an
|
`Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
|
||||||
error. Each `Block` following the pattern is translated to a
|
error. Each `Block` following the pattern is translated to a
|
||||||
`Dictionary` containing a key/value pair for each triplet.
|
`Dictionary` containing a key/value pair for each triplet. Any
|
||||||
|
`Block` with duplicate keys (under interpretation) is an error.
|
||||||
|
7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
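The `Block` rule above can be sketched as follows. This is only an illustration under stated assumptions: `COLON` is a hypothetical sentinel standing for a single-colon punctuation mark, `block_to_dictionary` is a hypothetical helper name, commas are assumed to have been discarded already, and keys and values are assumed to be already-interpreted, hashable Python values.

```python
# Hypothetical sentinel standing for a single ":" punctuation mark.
COLON = object()


def block_to_dictionary(items):
    """Turn the flat contents of a Block into a dict of key/value pairs.

    items must consist of zero or more repeating triplets key, COLON, value.
    Anything else, including a duplicate key, raises ValueError.
    """
    if len(items) % 3 != 0:
        raise ValueError("Block contents are not a sequence of triplets")
    result = {}
    for i in range(0, len(items), 3):
        key, colon, value = items[i:i + 3]
        if colon is not COLON or key is COLON or value is COLON:
            raise ValueError("Block triplet is not of the form key : value")
        if key in result:
            raise ValueError(f"duplicate key in Block: {key!r}")
        result[key] = value
    return result
```

An empty `Block` yields an empty dictionary, since zero triplets satisfy the pattern.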
|
||||||
|
|
||||||
[^discard-trailers-instead-of-error]: **Implementation note.** When
|
[^discard-trailers-instead-of-error]: **Implementation note.** When
|
||||||
implementing parsing of P-expressions into Preserves, consider
|
implementing parsing of P-expressions into Preserves, consider
|
||||||
|
@ -168,7 +146,7 @@ classes mentioned above.
|
||||||
Examples are given as pairs of P-expressions and their Preserves
|
Examples are given as pairs of P-expressions and their Preserves
|
||||||
text-syntax encodings.
|
text-syntax encodings.
|
||||||
|
|
||||||
### Individual P-expression `Value`s
|
### Individual P-expression `Expr`s
|
||||||
|
|
||||||
```preserves
|
```preserves
|
||||||
⌜<date 1821 (lookup-month "February") 3>⌝
|
⌜<date 1821 (lookup-month "February") 3>⌝
|
||||||
|
@ -203,19 +181,27 @@ text-syntax encodings.
|
||||||
tearDown();
|
tearDown();
|
||||||
}⌝
|
}⌝
|
||||||
= <b
|
= <b
|
||||||
setUp <g> <s |;|>
|
setUp <g> <p |;|>
|
||||||
# Now enter the loop
|
# Now enter the loop
|
||||||
loop <s |:|> <b
|
loop <p |:|> <b
|
||||||
greet <g "World"> <s |;|>
|
greet <g "World"> <p |;|>
|
||||||
>
|
>
|
||||||
tearDown <g> <s |;|>
|
tearDown <g> <p |;|>
|
||||||
>
|
>
|
||||||
```
|
```
|
||||||
|
|
||||||
```preserves
|
```preserves
|
||||||
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
|
⌜[1 + 2.0, print "Hello", predicate: #t, foo, #!remote, bar]⌝
|
||||||
= [1 + 2.0 <s |,|> print "Hello" <s |,|> predicate <s |:|> #t <s |,|>
|
= [1 + 2.0 <p |,|> print "Hello" <p |,|> predicate <p |:|> #t <p |,|>
|
||||||
foo <s |,|> #!remote <s |,|> bar]
|
foo <p |,|> #!remote <p |,|> bar]
|
||||||
|
```
|
||||||
|
|
||||||
|
```preserves
|
||||||
|
⌜#{1 2 3}⌝
|
||||||
|
= <s 1 2 3>
|
||||||
|
|
||||||
|
⌜#{(read) (read) (read)}⌝
|
||||||
|
= <s <g read> <g read> <g read>>
|
||||||
```
|
```
|
||||||
|
|
||||||
```preserves
|
```preserves
|
||||||
|
@ -224,8 +210,8 @@ text-syntax encodings.
|
||||||
address: Address,
|
address: Address,
|
||||||
}⌝
|
}⌝
|
||||||
= <b
|
= <b
|
||||||
optional name <s |:|> string <s |,|>
|
optional name <p |:|> string <p |,|>
|
||||||
address <s |:|> Address <s |,|>
|
address <p |:|> Address <p |,|>
|
||||||
>
|
>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -238,7 +224,7 @@ text-syntax encodings.
|
||||||
}
|
}
|
||||||
# example of a comment at the end of the input file⌝
|
# example of a comment at the end of the input file⌝
|
||||||
= [ <b
|
= [ <b
|
||||||
key <s |:|> value
|
key <p |:|> value
|
||||||
@"example of a comment at the end of a dictionary" <a>
|
@"example of a comment at the end of a dictionary" <a>
|
||||||
>
|
>
|
||||||
@"example of a comment at the end of the input file"
|
@"example of a comment at the end of the input file"
|
||||||
|
@ -273,7 +259,7 @@ generic P-expression reader can then feed into special-purpose
|
||||||
program, and the parser refines this.
|
program, and the parser refines this.
|
||||||
|
|
||||||
Often, a parser will wish to extract structure from sequences of
|
Often, a parser will wish to extract structure from sequences of
|
||||||
P-expression `Value`s.
|
P-expression `Expr`s.
|
||||||
|
|
||||||
- A simple technique is repeated splitting of sequences; first by
|
- A simple technique is repeated splitting of sequences; first by
|
||||||
`Semicolon`, then by `Comma`, then by increasingly high binding-power
|
`Semicolon`, then by `Comma`, then by increasingly high binding-power
|
||||||
|
@ -286,10 +272,39 @@ P-expression `Value`s.
|
||||||
to build a parse tree using an extensible specification of the pre-,
|
to build a parse tree using an extensible specification of the pre-,
|
||||||
in-, and postfix operators involved.
|
in-, and postfix operators involved.
|
||||||
|
|
||||||
- Finally, if you treat sequences of `Value`s as pre-lexed token
|
- Finally, if you treat sequences of `Expr`s as pre-lexed token
|
||||||
streams, almost any parsing formalism (such as [PEG
|
streams, almost any parsing formalism (such as [PEG
|
||||||
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
|
parsing](https://en.wikipedia.org/wiki/Parsing_expression_grammar),
|
||||||
[Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
|
[Ometa](https://en.wikipedia.org/wiki/OMeta), etc.) can be used to
|
||||||
extract further syntactic structure.
|
extract further syntactic structure.
|
||||||
|
|
||||||
|
## Appendix: Equations for interpreting P-expressions as Preserves
|
||||||
|
|
||||||
|
The partial function **uncomma**(*p*) removes all occurrences of `,`
|
||||||
|
from a P-expression *p*.
|
||||||
|
|
||||||
|
{:.pseudocode.equations}
|
||||||
|
| **uncomma** : **Expr** | ⇀ | **Expr** | |
|
||||||
|
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
|
||||||
|
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
|
||||||
|
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
|
||||||
|
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
|
||||||
|
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
|
||||||
|
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
|
||||||
|
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
|
||||||
|
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
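A rough Python sketch of **uncomma** follows, assuming a hypothetical in-memory model in which atoms are plain strings and the `Punct` and `Compound` classes stand for punctuation and bracketed expressions. Embedded and annotated expressions are left out to keep the example short. The function is partial: it raises on a bare `,`, for which the equations give no result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Punct:
    """Hypothetical model of special punctuation: ",", ";", or one or more ":"."""
    text: str


@dataclass(frozen=True)
class Compound:
    """Hypothetical model of a compound; kind is one of "[", "<", "{", "(", "#{"."""
    kind: str
    items: tuple


COMMA = Punct(",")


def uncomma(p):
    """Remove every "," nested anywhere inside p."""
    if p == COMMA:
        # The equations define no result for a bare comma.
        raise ValueError("uncomma is undefined on a bare comma")
    if isinstance(p, Compound):
        # Drop commas at this level, then recurse into what remains.
        kept = tuple(uncomma(x) for x in p.items if x != COMMA)
        return Compound(p.kind, kept)
    # Atoms and all other punctuation pass through unchanged.
    return p
```

Other punctuation, such as `;` and `:`, is deliberately preserved, so that later interpretation steps can still reject or consume it.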
|
||||||
|
|
||||||
|
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
|
||||||
|
P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`: commas are removed first, then ⌞·⌟ is applied to the result.
|
||||||
|
|
||||||
|
{:.pseudocode.equations}
|
||||||
|
| ⌞·⌟ : **Expr** | ⇀ | **Value** | |
|
||||||
|
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
|
||||||
|
| ⌞`<`ℓ *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
|
||||||
|
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
|
||||||
|
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
|
||||||
|
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
|
||||||
|
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
|
||||||
|
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|