Restrict usage of commas. Bump spec version to 0.992

Tony Garnock-Jones 2023-11-01 14:13:36 +01:00
parent e0736e03c5
commit c89147dd6a
8 changed files with 82 additions and 87 deletions

.gitignore

@ -1,4 +1,5 @@
_site/ _site/
cheatsheet.pdf
preserves-expressions.pdf preserves-expressions.pdf
preserves-binary.pdf preserves-binary.pdf
preserves-schema.pdf preserves-schema.pdf


@ -5,7 +5,8 @@ PDFS=\
preserves-text.pdf \ preserves-text.pdf \
preserves-binary.pdf \ preserves-binary.pdf \
preserves-schema.pdf \ preserves-schema.pdf \
preserves-expressions.pdf preserves-expressions.pdf \
cheatsheet.pdf
all: $(PDFS) all: $(PDFS)


@ -14,4 +14,4 @@ defaults:
title: "Preserves" title: "Preserves"
version_date: "October 2023" version_date: "October 2023"
version: "0.991.0" version: "0.992.0"


@ -3,14 +3,15 @@ Document := Value ws
Value := ws (Record | Collection | Atom | Embedded | Annotated) Value := ws (Record | Collection | Atom | Embedded | Annotated)
Collection := Sequence | Dictionary | Set Collection := Sequence | Dictionary | Set
Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
ws := (space | tab | cr | lf | `,`)* ws := (space | tab | cr | lf)*
commas := (ws `,`)* ws
delimiter := ws | `<` | `>` | `[` | `]` | `{` | `}` delimiter := ws | `<` | `>` | `[` | `]` | `{` | `}`
| `#` | `:` | `"` | `|` | `@` | `;` | `#` | `:` | `"` | `|` | `@` | `;` | `,`
Record := `<` Value+ ws `>` Record := `<` Value+ ws `>`
Sequence := `[` Value* ws `]` Sequence := `[` (commas Value)* commas `]`
Dictionary := `{` (Value ws `:` Value)* ws `}` Set := `#{` (commas Value)* commas `}`
Set := `#{` Value* ws `}` Dictionary := `{` (commas Value ws `:` Value)* commas `}`
Boolean := `#t` | `#f` Boolean := `#t` | `#f`
ByteString := `#"` binchar* `"` ByteString := `#"` binchar* `"`


@ -3,14 +3,15 @@
| *Value* | := | **ws** (*Record* &#124; *Collection* &#124; *Atom* &#124; *Embedded* &#124; *Annotated*) | | *Value* | := | **ws** (*Record* &#124; *Collection* &#124; *Atom* &#124; *Embedded* &#124; *Annotated*) |
| *Collection* | := | *Sequence* &#124; *Dictionary* &#124; *Set* | | *Collection* | := | *Sequence* &#124; *Dictionary* &#124; *Set* |
| *Atom* | := | *Boolean* &#124; *ByteString* &#124; *String* &#124; *QuotedSymbol* &#124; *Symbol* &#124; *Number* | | *Atom* | := | *Boolean* &#124; *ByteString* &#124; *String* &#124; *QuotedSymbol* &#124; *Symbol* &#124; *Number* |
| **ws** | := | (**space** &#124; **tab** &#124; **cr** &#124; **lf** &#124;`,`)<sup>⋆</sup> | | **ws** | := | (**space** &#124; **tab** &#124; **cr** &#124; **lf**)<sup>⋆</sup> |
| **delimiter** | := | **ws** &#124; `<` &#124; `>` &#124; `[` &#124; `]` &#124; `{` &#124; `}` &#124; `#` &#124; `:` &#124; `"` &#124; `|` &#124; `@` &#124; `;` | | **commas** | := | (**ws** `,`)<sup>⋆</sup> **ws** |
| **delimiter** | := | **ws** &#124; `<` &#124; `>` &#124; `[` &#124; `]` &#124; `{` &#124; `}` &#124; `#` &#124; `:` &#124; `"` &#124; `|` &#124; `@` &#124; `;` &#124; `,` |
{:.postcard-grammar.textsyntax} {:.postcard-grammar.textsyntax}
| *Record* | := | `<`*Value*<sup>+</sup> **ws**`>` | | *Record* | := | `<`*Value*<sup>+</sup> **ws**`>` |
| *Sequence* | := | `[`*Value*<sup>⋆</sup> **ws**`]` | | *Sequence* | := | `[`(**commas** *Value*)<sup>⋆</sup> **commas**`]` |
| *Dictionary* | := | `{` (*Value* **ws**`:`*Value*)<sup>⋆</sup> **ws**`}` | | *Set* | := | `#{`(**commas** *Value*)<sup>⋆</sup> **commas**`}` |
| *Set* | := | `#{`*Value*<sup>⋆</sup> **ws**`}` | | *Dictionary* | := | `{` (**commas** *Value* **ws**`:`*Value*)<sup>⋆</sup> **commas**`}` |
{:.postcard-grammar.textsyntax} {:.postcard-grammar.textsyntax}
| *Boolean* | := | `#t`&#124;`#f` | | *Boolean* | := | `#t`&#124;`#f` |


@ -3,14 +3,15 @@ Document := Value ws
Value := ws (Record | Collection | Atom | Embedded | Annotated) Value := ws (Record | Collection | Atom | Embedded | Annotated)
Collection := Sequence | Dictionary | Set Collection := Sequence | Dictionary | Set
Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
ws := (space | tab | cr | lf | `,`)* ws := (space | tab | cr | lf)*
commas := (ws `,`)* ws
delimiter := ws | `<` | `>` | `[` | `]` | `{` | `}` delimiter := ws | `<` | `>` | `[` | `]` | `{` | `}`
| `#` | `:` | `"` | `|` | `@` | `;` | `#` | `:` | `"` | `|` | `@` | `;` | `,`
Record := `<` Value+ ws `>` Record := `<` Value+ ws `>`
Sequence := `[` Value* ws `]` Sequence := `[` (commas Value)* commas `]`
Dictionary := `{` (Value ws `:` Value)* ws `}` Set := `#{` (commas Value)* commas `}`
Set := `#{` Value* ws `}` Dictionary := `{` (commas Value ws `:` Value)* commas `}`
Boolean := `#t` | `#f` Boolean := `#t` | `#f`
ByteString := `#"` binchar* `"` ByteString := `#"` binchar* `"`


@ -25,16 +25,11 @@ which (ab)use Preserves text syntax as a kind of programming notation.
The P-expression grammar includes by reference the definition of `Atom` from the The P-expression grammar includes by reference the definition of `Atom` from the
[text syntax][], as well as the definitions that `Atom` depends on. [text syntax][], as well as the definitions that `Atom` depends on.
P-expressions take their own approach to inter-token whitespace,
however.
<a id="whitespace"> <a id="whitespace">
**Whitespace.** Whitespace `sp` is defined as any number of spaces, **Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
tabs, carriage returns, or line feeds. Commas are *not* considered any number of spaces, tabs, carriage returns, or line feeds.
whitespace in P-expressions, and so class `sp` is different to class
`ws` from the text syntax.
sp = *(%x20 / %x09 / CR / LF) ws = *(%x20 / %x09 / CR / LF)
No changes to [the Preserves semantic model](preserves.html) are made. No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term can be parsed as a valid P-expression, Every Preserves text-syntax term can be parsed as a valid P-expression,
@ -47,20 +42,22 @@ below](#reading-preserves)).
Standalone documents containing P-expressions are sequences of Standalone documents containing P-expressions are sequences of
individual `Expr`s, followed by trailing whitespace. individual `Expr`s, followed by trailing whitespace.
Document = *Expr sp Document = *Expr ws
A single P-expression `Expr` can be an `Atom` from the [text syntax][], A single P-expression `Expr` can be an `Atom` from the [text syntax][],
a compound expression, special punctuation, an `Embedded` expression, or a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression. an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
except special punctuation.
Expr = sp (Atom | Compound | Punct | Embedded | Annotated) Expr = ws (SimpleExpr | Punct)
SimpleExpr = Atom | Compound | Embedded | Annotated
Embedded and annotated values are as in the text syntax, differing only Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `Expr`. in that uses of `Value` are replaced with `SimpleExpr`.
Embedded = "#!" Expr Embedded = "#!" SimpleExpr
Annotated = Annotation Expr Annotated = Annotation SimpleExpr
Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF) Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons. P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
@ -70,11 +67,11 @@ Compound expressions are sequences of `Expr`s with optional trailing
`Annotation`s, surrounded by various kinds of parentheses. `Annotation`s, surrounded by various kinds of parentheses.
Compound = Sequence / Record / Block / Group / Set Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer sp "]" Sequence = "[" *Expr Trailer ws "]"
Record = "<" *Expr Trailer sp ">" Record = "<" *Expr Trailer ws ">"
Block = "{" *Expr Trailer sp "}" Block = "{" *Expr Trailer ws "}"
Group = "(" *Expr Trailer sp ")" Group = "(" *Expr Trailer ws ")"
Set = "#{" *Expr Trailer sp "}" Set = "#{" *Expr Trailer ws "}"
In an `Annotated` P-expression, annotations and comments attach to the In an `Annotated` P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is term following them, just as in the ordinary text syntax. However, it is
@ -117,9 +114,8 @@ sequences of Preserves values.
The [previous section](#encoding-pexprs) discussed ways of representing The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting* P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves so that (1) a Preserves datum (2) written P-expressions *as* Preserves so that (1) a Preserves datum (2) written
using Preserves text syntax[^careful-use-of-commas] and then (3) read as using Preserves text syntax and then (3) read as a P-expression can be
a P-expression can be (4) interpreted from that P-expression to yield (4) interpreted from that P-expression to yield the original datum.
the original datum.
1. Every `(`..`)` or `;` that appears is an error. 1. Every `(`..`)` or `;` that appears is an error.
2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below. 2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
@ -127,17 +123,13 @@ the original datum.
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error] 4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error. 5. Every `Record` with no values in it is an error.
6. Every `Block` must contain zero or more repeating triplets of 6. Every `Block` must contain zero or more repeating triplets of
`Expr`, `:`, `Expr`. Any `Block` not following this pattern is an `SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
error. Each `Block` following the pattern is translated to a pattern is an error. Each `Block` following the pattern is
`Dictionary` containing a key/value pair for each triplet. Any translated to a `Dictionary` containing a key/value pair for each
`Block` with duplicate keys (under interpretation) is an error. triplet. Any `Block` with duplicate keys (under interpretation) is
7. Every `Set` containing any duplicate expressions (under interpretation) is an error. an error.
7. Every `Set` containing any duplicate expressions (under
[^careful-use-of-commas]: Every Preserves datum can be read via a interpretation) is an error.
P-expression reader and then interpreted successfully as Preserves
*if commas are omitted entirely in the text*. If commas are present,
however, they must not appear in certain positions, namely: either
before or after *p* in `@`*p* *q*; or before *p* in `#!`*p*.
[^discard-trailers-instead-of-error]: **Implementation note.** When [^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider implementing parsing of P-expressions into Preserves, consider
@ -283,35 +275,31 @@ P-expression `Expr`s.
## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves ## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves
The partial function **uncomma**(*p*) removes all occurrences of `,` The function **uncomma**(*p*) removes all occurrences of `,` from a
from a P-expression *p*. P-expression *p* ∈ `Expr` − {`,`}.
{:.pseudocode.equations} {:.pseudocode.equations}
| **uncomma** : **Expr** | ⇀ | **Expr** | | | **uncomma** : **Expr** − {`,`} | ⟶ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` | | **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` | | **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` | | **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` | | **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` | | **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) ...`}` | | | **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | | | **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** - {`,`} | | **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** − {`,`} |
{:.pseudocode.equations}
| **uncomma** : **Document** | ⇀ | **Document** |
| **uncomma**(*p* ...) | = | **uncomma**(*p*) ... |
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`. P-expression *p* ∈ `Expr` − {`,`} to a corresponding Preserves `Value`.
{:.pseudocode.equations} {:.pseudocode.equations}
| ⌞·⌟ : **Expr** | ⇀ | **Value** | | | ⌞·⌟ : **Expr** − {`,`} | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | | | ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<`ℓ *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | | | ⌞`<`ℓ *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct | | ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct | | ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | | | ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | | | ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** | | ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
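
The **uncomma** equations above can be sketched in Python as a recursive walk over a parsed P-expression. This is only an illustration and not part of the commit: the tree classes `Punct`, `Compound`, `Embedded`, and `Annotated` are hypothetical stand-ins for whatever a real reader produces, and atoms are assumed to be plain Python values.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Punct:
    text: str  # ",", ";", ":", "::", ...


@dataclass
class Compound:
    kind: str  # opening bracket: "[", "<", "{", "(" or "#{"
    items: List[Any]


@dataclass
class Embedded:
    expr: Any  # the p in `#!` p


@dataclass
class Annotated:
    annotation: Any  # the p in `@` p q
    expr: Any        # the q in `@` p q


COMMA = Punct(",")


def uncomma(p: Any) -> Any:
    """Remove every comma nested inside p; a bare comma is outside the domain."""
    if p == COMMA:
        raise ValueError("uncomma is not defined on a bare comma")
    if isinstance(p, Compound):
        # Drop commas at this level, then recurse into what remains.
        return Compound(p.kind, [uncomma(x) for x in p.items if x != COMMA])
    if isinstance(p, Embedded):
        return Embedded(uncomma(p.expr))
    if isinstance(p, Annotated):
        return Annotated(uncomma(p.annotation), uncomma(p.expr))
    # Atoms and all other punctuation pass through unchanged.
    return p
```

For instance, applying `uncomma` to a sequence holding `1`, a comma, and a record `<a , 2>` yields the same sequence with both commas dropped, while `;` and `:` are left in place.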
## Notes ## Notes


@ -28,10 +28,15 @@ a grammar for recognising sequences of Unicode scalar values.
UTF-8 where possible. UTF-8 where possible.
<a id="whitespace"></a> <a id="whitespace"></a>
**Whitespace.** Whitespace is defined as any number of spaces, tabs, **Whitespace.** Whitespace `ws` is defined as any number of spaces, tabs,
carriage returns, line feeds, or commas. carriage returns, or line feeds.
ws = *(%x20 / %x09 / CR / LF / ",") ws = *(%x20 / %x09 / CR / LF)
<a id="commas"></a>
**Commas.** In some positions inside compound terms, commas are permitted and ignored.
commas = *(ws ",") ws
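
As a rough illustration of the revised rules (not part of the commit), the `ws` and `commas` productions can be written as regular expressions. The names `WS`, `COMMAS`, and `skip_commas` are hypothetical, and the sketch assumes the input is an already-decoded Python string.

```python
import re

# ws = *(%x20 / %x09 / CR / LF); commas are no longer part of ws.
WS = r"[ \t\r\n]*"

# commas = *(ws ",") ws
COMMAS = re.compile(rf"(?:{WS},)*{WS}")


def skip_commas(text: str, pos: int = 0) -> int:
    """Return the index just past any run of `commas` starting at pos."""
    # The pattern can match the empty string, so match() never returns None.
    return COMMAS.match(text, pos).end()
```

A reader for `Sequence`, `Set`, or `Dictionary` would call something like `skip_commas` before each element and before the closing bracket, and skip plain `ws` everywhere else, so a comma inside a `Record` is no longer silently ignored.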
<a id="delimiters"></a> <a id="delimiters"></a>
**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be **Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
@ -39,7 +44,7 @@ followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]
delimiter = ws delimiter = ws
/ "<" / ">" / "[" / "]" / "{" / "}" / "<" / ">" / "[" / "]" / "{" / "}"
/ "#" / ":" / DQUOTE / "|" / "@" / ";" / "#" / ":" / DQUOTE / "|" / "@" / ";" / ","
[^delimiters-lookahead]: The addition of this constraint means that [^delimiters-lookahead]: The addition of this constraint means that
implementations must now use some kind of lookahead to make sure a implementations must now use some kind of lookahead to make sure a
@ -73,9 +78,9 @@ printing sets and dictionaries, implementations *SHOULD* order elements
resp. keys with respect to the [total order over resp. keys with respect to the [total order over
`Value`s](preserves.html#total-order).[^rationale-print-ordering] `Value`s](preserves.html#total-order).[^rationale-print-ordering]
Sequence = "[" *Value ws "]" Sequence = "[" *(commas Value) commas "]"
Set = "#{" *Value ws "}" Set = "#{" *(commas Value) commas "}"
Dictionary = "{" *(Value ws ":" Value) ws "}" Dictionary = "{" *(commas Value ws ":" Value) commas "}"
[^printing-collections]: **Implementation note.** When implementing [^printing-collections]: **Implementation note.** When implementing
printing of `Value`s using the textual syntax, consider supporting printing of `Value`s using the textual syntax, consider supporting
@ -147,8 +152,8 @@ following the usual rules for double quote and backslash.
/ %s"\x" 2HEXDIG / %s"\x" 2HEXDIG
/ "\" DQUOTE / "\" DQUOTE
The second is a sequence of pairs of hexadecimal digits interleaved The second is pairs of hexadecimal digits interleaved with whitespace
with whitespace and surrounded by `#x"` and `"`. and surrounded by `#x"` and `"`.
ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE
@ -320,9 +325,6 @@ give an illusion of progress.
## Acknowledgements ## Acknowledgements
The treatment of commas as whitespace in the text syntax is inspired
by the same feature of [EDN](https://github.com/edn-format/edn).
The text syntax for `Boolean`s, `Symbol`s, and `ByteString`s is The text syntax for `Boolean`s, `Symbol`s, and `ByteString`s is
directly inspired by [Racket](https://racket-lang.org/)'s lexical directly inspired by [Racket](https://racket-lang.org/)'s lexical
syntax. syntax.