Restrict usage of commas. Bump spec version to 0.992

Tony Garnock-Jones 2023-11-01 14:13:36 +01:00
parent e0736e03c5
commit c89147dd6a
8 changed files with 82 additions and 87 deletions

.gitignore vendored

@ -1,4 +1,5 @@
_site/
cheatsheet.pdf
preserves-expressions.pdf
preserves-binary.pdf
preserves-schema.pdf


@ -5,7 +5,8 @@ PDFS=\
preserves-text.pdf \
preserves-binary.pdf \
preserves-schema.pdf \
preserves-expressions.pdf
preserves-expressions.pdf \
cheatsheet.pdf
all: $(PDFS)


@ -14,4 +14,4 @@ defaults:
title: "Preserves"
version_date: "October 2023"
version: "0.991.0"
version: "0.992.0"


@ -3,14 +3,15 @@ Document := Value ws
Value := ws (Record | Collection | Atom | Embedded | Annotated)
Collection := Sequence | Dictionary | Set
Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
ws := (space | tab | cr | lf | `,`)*
ws := (space | tab | cr | lf)*
commas := (ws `,`)* ws
delimiter := ws | `<` | `>` | `[` | `]` | `{` | `}`
| `#` | `:` | `"` | `|` | `@` | `;`
| `#` | `:` | `"` | `|` | `@` | `;` | `,`
Record := `<` Value+ ws `>`
Sequence := `[` Value* ws `]`
Dictionary := `{` (Value ws `:` Value)* ws `}`
Set := `#{` Value* ws `}`
Sequence := `[` (commas Value)* commas `]`
Set := `#{` (commas Value)* commas `}`
Dictionary := `{` (commas Value ws `:` Value)* commas `}`
Boolean := `#t` | `#f`
ByteString := `#"` binchar* `"`


@ -3,14 +3,15 @@
| *Value* | := | **ws** (*Record* &#124; *Collection* &#124; *Atom* &#124; *Embedded* &#124; *Annotated*) |
| *Collection* | := | *Sequence* &#124; *Dictionary* &#124; *Set* |
| *Atom* | := | *Boolean* &#124; *ByteString* &#124; *String* &#124; *QuotedSymbol* &#124; *Symbol* &#124; *Number* |
| **ws** | := | (**space** &#124; **tab** &#124; **cr** &#124; **lf** &#124;`,`)<sup>⋆</sup> |
| **delimiter** | := | **ws** &#124; `<` &#124; `>` &#124; `[` &#124; `]` &#124; `{` &#124; `}` &#124; `#` &#124; `:` &#124; `"` &#124; `|` &#124; `@` &#124; `;` |
| **ws** | := | (**space** &#124; **tab** &#124; **cr** &#124; **lf**)<sup>⋆</sup> |
| **commas** | := | (**ws** `,`)<sup>⋆</sup> **ws** |
| **delimiter** | := | **ws** &#124; `<` &#124; `>` &#124; `[` &#124; `]` &#124; `{` &#124; `}` &#124; `#` &#124; `:` &#124; `"` &#124; `|` &#124; `@` &#124; `;` &#124; `,` |
{:.postcard-grammar.textsyntax}
| *Record* | := | `<`*Value*<sup>+</sup> **ws**`>` |
| *Sequence* | := | `[`*Value*<sup>⋆</sup> **ws**`]` |
| *Dictionary* | := | `{` (*Value* **ws**`:`*Value*)<sup>⋆</sup> **ws**`}` |
| *Set* | := | `#{`*Value*<sup>⋆</sup> **ws**`}` |
| *Sequence* | := | `[`(**commas** *Value*)<sup>⋆</sup> **commas**`]` |
| *Set* | := | `#{`(**commas** *Value*)<sup>⋆</sup> **commas**`}` |
| *Dictionary* | := | `{` (**commas** *Value* **ws**`:`*Value*)<sup>⋆</sup> **commas**`}` |
{:.postcard-grammar.textsyntax}
| *Boolean* | := | `#t`&#124;`#f` |


@ -3,14 +3,15 @@ Document := Value ws
Value := ws (Record | Collection | Atom | Embedded | Annotated)
Collection := Sequence | Dictionary | Set
Atom := Boolean | ByteString | String | QuotedSymbol | Symbol | Number
ws := (space | tab | cr | lf | `,`)*
ws := (space | tab | cr | lf)*
commas := (ws `,`)* ws
delimiter := ws | `<` | `>` | `[` | `]` | `{` | `}`
| `#` | `:` | `"` | `|` | `@` | `;`
| `#` | `:` | `"` | `|` | `@` | `;` | `,`
Record := `<` Value+ ws `>`
Sequence := `[` Value* ws `]`
Dictionary := `{` (Value ws `:` Value)* ws `}`
Set := `#{` Value* ws `}`
Sequence := `[` (commas Value)* commas `]`
Set := `#{` (commas Value)* commas `}`
Dictionary := `{` (commas Value ws `:` Value)* commas `}`
Boolean := `#t` | `#f`
ByteString := `#"` binchar* `"`


@ -25,16 +25,11 @@ which (ab)use Preserves text syntax as a kind of programming notation.
The P-expression grammar includes by reference the definition of `Atom` from the
[text syntax][], as well as the definitions that `Atom` depends on.
P-expressions take their own approach to inter-token whitespace,
however.
<a id="whitespace">
**Whitespace.** Whitespace `sp` is defined as any number of spaces,
tabs, carriage returns, or line feeds. Commas are *not* considered
whitespace in P-expressions, and so class `sp` is different to class
`ws` from the text syntax.
**Whitespace.** Whitespace `ws` is, as in the text syntax, defined as
any number of spaces, tabs, carriage returns, or line feeds.
sp = *(%x20 / %x09 / CR / LF)
ws = *(%x20 / %x09 / CR / LF)
No changes to [the Preserves semantic model](preserves.html) are made.
Every Preserves text-syntax term can be parsed as a valid P-expression,
@ -47,20 +42,22 @@ below](#reading-preserves)).
Standalone documents containing P-expressions are sequences of
individual `Expr`s, followed by trailing whitespace.
Document = *Expr sp
Document = *Expr ws
A single P-expression `Expr` can be an `Atom` from the [text syntax][],
a compound expression, special punctuation, an `Embedded` expression, or
an `Annotated` expression.
an `Annotated` expression. The class `SimpleExpr` includes all of `Expr`
except special punctuation.
Expr = sp (Atom | Compound | Punct | Embedded | Annotated)
Expr = ws (SimpleExpr | Punct)
SimpleExpr = Atom | Compound | Embedded | Annotated
Embedded and annotated values are as in the text syntax, differing only
in that uses of `Value` are replaced with `Expr`.
in that uses of `Value` are replaced with `SimpleExpr`.
Embedded = "#!" Expr
Annotated = Annotation Expr
Annotation = "@" Expr / "#" [(%x20 / %x09) linecomment] (CR / LF)
Embedded = "#!" SimpleExpr
Annotated = Annotation SimpleExpr
Annotation = "@" SimpleExpr / "#" [(%x20 / %x09) linecomment] (CR / LF)
P-expression special punctuation marks are comma, semicolon, and sequences of one or more colons.
@ -70,11 +67,11 @@ Compound expressions are sequences of `Expr`s with optional trailing
`Annotation`s, surrounded by various kinds of parentheses.
Compound = Sequence / Record / Block / Group / Set
Sequence = "[" *Expr Trailer sp "]"
Record = "<" *Expr Trailer sp ">"
Block = "{" *Expr Trailer sp "}"
Group = "(" *Expr Trailer sp ")"
Set = "#{" *Expr Trailer sp "}"
Sequence = "[" *Expr Trailer ws "]"
Record = "<" *Expr Trailer ws ">"
Block = "{" *Expr Trailer ws "}"
Group = "(" *Expr Trailer ws ")"
Set = "#{" *Expr Trailer ws "}"
In an `Annotated` P-expression, annotations and comments attach to the
term following them, just as in the ordinary text syntax. However, it is
@ -117,9 +114,8 @@ sequences of Preserves values.
The [previous section](#encoding-pexprs) discussed ways of representing
P-expressions using Preserves. Here, we discuss *interpreting*
P-expressions *as* Preserves so that (1) a Preserves datum (2) written
using Preserves text syntax[^careful-use-of-commas] and then (3) read as
a P-expression can be (4) interpreted from that P-expression to yield
the original datum.
using Preserves text syntax and then (3) read as a P-expression can be
(4) interpreted from that P-expression to yield the original datum.
1. Every `(`..`)` or `;` that appears is an error.
2. Every `:`, `::`, `:::`, ... is an error, except in context of `Block`s as described below.
@ -127,17 +123,13 @@ the original datum.
4. Every `Trailer` that appears is an error.[^discard-trailers-instead-of-error]
5. Every `Record` with no values in it is an error.
6. Every `Block` must contain zero or more repeating triplets of
`Expr`, `:`, `Expr`. Any `Block` not following this pattern is an
error. Each `Block` following the pattern is translated to a
`Dictionary` containing a key/value pair for each triplet. Any
`Block` with duplicate keys (under interpretation) is an error.
7. Every `Set` containing any duplicate expressions (under interpretation) is an error.
[^careful-use-of-commas]: Every Preserves datum can be read via a
P-expression reader and then interpreted successfully as Preserves
*if commas are omitted entirely in the text*. If commas are present,
however, they must not appear in certain positions, namely: either
before or after *p* in `@`*p* *q*; or before *p* in `#!`*p*.
`SimpleExpr`, `:`, `SimpleExpr`. Any `Block` not following this
pattern is an error. Each `Block` following the pattern is
translated to a `Dictionary` containing a key/value pair for each
triplet. Any `Block` with duplicate keys (under interpretation) is
an error.
7. Every `Set` containing any duplicate expressions (under
interpretation) is an error.
[^discard-trailers-instead-of-error]: **Implementation note.** When
implementing parsing of P-expressions into Preserves, consider
@ -283,35 +275,31 @@ P-expression `Expr`s.
## <a id="reading-preserves-equationally"></a>Appendix: Equations for interpreting P-expressions as Preserves
The partial function **uncomma**(*p*) removes all occurrences of `,`
from a P-expression *p*.
The function **uncomma**(*p*) removes all occurrences of `,` from a
P-expression *p* ∈ `Expr` − {`,`}.
{:.pseudocode.equations}
| **uncomma** : **Expr** | ⇀ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) ...`}` | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** - {`,`} |
{:.pseudocode.equations}
| **uncomma** : **Document** | ⇀ | **Document** |
| **uncomma**(*p* ...) | = | **uncomma**(*p*) ... |
| **uncomma** : **Expr** − {`,`} | ⟶ | **Expr** | |
| **uncomma**(`[`*p* ...`]`) | = | `[`**uncomma**(*p*) ...`]` | omitting any *p* = `,` |
| **uncomma**(`<`*p* ...`>`) | = | `<`**uncomma**(*p*) ...`>` | omitting any *p* = `,` |
| **uncomma**(`{`*p* ...`}`) | = | `{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`(`*p* ...`)`) | = | `(`**uncomma**(*p*) ...`)` | omitting any *p* = `,` |
| **uncomma**(`#{`*p* ...`}`) | = | `#{`**uncomma**(*p*) ...`}` | omitting any *p* = `,` |
| **uncomma**(`#!`*p*) | = | `#!`**uncomma**(*p*) | |
| **uncomma**(`@`*p* *q*) | = | `@`**uncomma**(*p*) **uncomma**(*q*) | |
| **uncomma**(*p*) | = | *p* | if *p* ∈ **Atom** ∪ **Punct** − {`,`} |
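The equations above can be sketched in Python under some loudly stated assumptions. The tree encoding here is hypothetical and not part of the specification: atoms and punctuation are plain strings holding their source text, compounds are `(opener, [children])` tuples, embedded expressions are `("#!", p)`, and annotated expressions are `("@", p, q)`.

```python
# Hypothetical encoding (an assumption for illustration only):
#   atom / punctuation -> str holding its source text, e.g. "a", ":", ","
#   compound           -> (opener, [children]) with opener in COMPOUND_OPENERS
#   embedded           -> ("#!", p)
#   annotated          -> ("@", p, q)
COMPOUND_OPENERS = {"[", "<", "{", "(", "#{"}


def uncomma(p):
    """Remove every "," from a P-expression p, where p itself is not ","."""
    if isinstance(p, str):
        if p == ",":
            # A bare comma is outside the domain Expr - {","}.
            raise ValueError("uncomma is not defined on a bare comma")
        return p  # Atom, or Punct other than ","
    tag = p[0]
    if tag in COMPOUND_OPENERS:
        # Omit any child that is "," and recurse into the rest.
        return (tag, [uncomma(child) for child in p[1] if child != ","])
    if tag == "#!":
        return ("#!", uncomma(p[1]))
    if tag == "@":
        return ("@", uncomma(p[1]), uncomma(p[2]))
    raise ValueError(f"unknown expression form: {tag!r}")
```

For instance, `uncomma(("[", ["a", ",", "b"]))` drops the comma and keeps both atoms. Raising on a bare comma mirrors the restricted domain in the signature rather than any behaviour mandated by the text.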
We write ⌞**uncomma**(*p*)⌟ for the partial function mapping a
P-expression *p* ∈ `Expr` to a corresponding Preserves `Value`.
P-expression *p* ∈ `Expr` − {`,`} to a corresponding Preserves `Value`.
{:.pseudocode.equations}
| ⌞·⌟ : **Expr** | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<` *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
| ⌞·⌟ : **Expr** − {`,`} | ⇀ | **Value** | |
| ⌞`[`*p* ...`]`⌟ | = | `[`⌞*p*⌟ ...`]` | |
| ⌞`<` *p* ...`>`⌟ | = | `<`⌞ℓ⌟ ⌞*p*⌟ ...`>` | |
| ⌞`{`*k*`:`*v* ...`}`⌟ | = | `{`⌞*k*⌟`:`⌞*v*⌟ ...`}` | if all ⌞*k*⌟ ... are distinct |
| ⌞`#{`*p* ...`}`⌟ | = | `#{`⌞*p*⌟ ...`}` | if all ⌞*p*⌟ ... are distinct |
| ⌞`#!`*p*⌟ | = | `#!`⌞*p*⌟ | |
| ⌞`@`*p* *q*⌟ | = | `@`⌞*p*⌟ ⌞*q*⌟ | |
| ⌞*p*⌟ | = | *p* | when *p* ∈ **Atom** |
## Notes


@ -28,10 +28,15 @@ a grammar for recognising sequences of Unicode scalar values.
UTF-8 where possible.
<a id="whitespace"></a>
**Whitespace.** Whitespace is defined as any number of spaces, tabs,
carriage returns, line feeds, or commas.
**Whitespace.** Whitespace `ws` is defined as any number of spaces, tabs,
carriage returns, or line feeds.
ws = *(%x20 / %x09 / CR / LF / ",")
ws = *(%x20 / %x09 / CR / LF)
<a id="commas"></a>
**Commas.** In some positions inside compound terms, commas are permitted and ignored.
commas = *(ws ",") ws
<a id="delimiters"></a>
**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
@ -39,7 +44,7 @@ followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]
delimiter = ws
/ "<" / ">" / "[" / "]" / "{" / "}"
/ "#" / ":" / DQUOTE / "|" / "@" / ";"
/ "#" / ":" / DQUOTE / "|" / "@" / ";" / ","
[^delimiters-lookahead]: The addition of this constraint means that
implementations must now use some kind of lookahead to make sure a
@ -73,9 +78,9 @@ printing sets and dictionaries, implementations *SHOULD* order elements
resp. keys with respect to the [total order over
`Value`s](preserves.html#total-order).[^rationale-print-ordering]
Sequence = "[" *Value ws "]"
Set = "#{" *Value ws "}"
Dictionary = "{" *(Value ws ":" Value) ws "}"
Sequence = "[" *(commas Value) commas "]"
Set = "#{" *(commas Value) commas "}"
Dictionary = "{" *(commas Value ws ":" Value) commas "}"
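As a rough illustration of the revised rules, the following Python sketch recognises a toy subset of the text syntax in which commas are accepted between the elements of a `Sequence` but rejected inside a `Record`. The `ATOM` pattern and all function names are assumptions made for this example; a real reader would need the full `Atom` grammar, delimiter checks, and the remaining compound forms.

```python
import re

WS = re.compile(r"[ \t\r\n]*")                      # ws = *(%x20 / %x09 / CR / LF)
COMMAS = re.compile(r"(?:[ \t\r\n]*,)*[ \t\r\n]*")  # commas = *(ws ",") ws
ATOM = re.compile(r"[A-Za-z0-9_+-]+")               # toy stand-in for Atom (assumption)


def parse_value(text, pos=0):
    """Parse one toy Value starting at pos; return (value, next_pos)."""
    pos = WS.match(text, pos).end()
    if text.startswith("[", pos):
        return parse_sequence(text, pos + 1)
    if text.startswith("<", pos):
        return parse_record(text, pos + 1)
    m = ATOM.match(text, pos)
    if not m:
        raise SyntaxError(f"unexpected input at offset {pos}")
    return m.group(), m.end()


def parse_sequence(text, pos):
    # Sequence = "[" *(commas Value) commas "]"
    items = []
    while True:
        pos = COMMAS.match(text, pos).end()
        if pos >= len(text):
            raise SyntaxError("unterminated sequence")
        if text.startswith("]", pos):
            return items, pos + 1
        value, pos = parse_value(text, pos)
        items.append(value)


def parse_record(text, pos):
    # Record = "<" Value+ ws ">"; only ws is skipped, so a comma is an error.
    fields = []
    while True:
        pos = WS.match(text, pos).end()
        if pos >= len(text):
            raise SyntaxError("unterminated record")
        if text.startswith(">", pos):
            if not fields:
                raise SyntaxError("a record needs at least a label")
            return ("record", fields), pos + 1
        value, pos = parse_value(text, pos)
        fields.append(value)
```

With this sketch, `parse_value("[a, b,, c,]")` accepts repeated and trailing commas, while `parse_value("<p a, b>")` raises `SyntaxError` because `,` is no longer part of `ws`.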
[^printing-collections]: **Implementation note.** When implementing
printing of `Value`s using the textual syntax, consider supporting
@ -147,8 +152,8 @@ following the usual rules for double quote and backslash.
/ %s"\x" 2HEXDIG
/ "\" DQUOTE
The second is a sequence of pairs of hexadecimal digits interleaved
with whitespace and surrounded by `#x"` and `"`.
The second is pairs of hexadecimal digits interleaved with whitespace
and surrounded by `#x"` and `"`.
ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE
@ -320,9 +325,6 @@ give an illusion of progress.
## Acknowledgements
The treatment of commas as whitespace in the text syntax is inspired
by the same feature of [EDN](https://github.com/edn-format/edn).
The text syntax for `Boolean`s, `Symbol`s, and `ByteString`s is
directly inspired by [Racket](https://racket-lang.org/)'s lexical
syntax.