Introduce the notion of a "delimiter" to follow Boolean and SymbolOrNumber.

Tony Garnock-Jones 2023-10-29 15:55:19 +01:00
parent 55deeea343
commit 5b8c07cb3f
4 changed files with 29 additions and 2 deletions

@@ -14,4 +14,4 @@ defaults:
title: "Preserves"
version_date: "October 2023"
-version: "0.990.0"
+version: "0.990.1"

@@ -23,14 +23,29 @@ ABNF allows easy definition of US-ASCII-based languages. However,
Preserves is a Unicode-based language. Therefore, we reinterpret ABNF as
a grammar for recognising sequences of Unicode scalar values.
<a id="encoding"></a>
**Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using
UTF-8 where possible.
<a id="whitespace"></a>
**Whitespace.** Whitespace is defined as any number of spaces, tabs,
carriage returns, line feeds, or commas.

    ws = *(%x20 / %x09 / CR / LF / ",")

<a id="delimiters"></a>
**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]

    delimiter = ws
              / "<" / ">" / "[" / "]" / "{" / "}"
              / "#" / ":" / DQUOTE / "|" / "@" / ";"

[^delimiters-lookahead]: The addition of this constraint means that
implementations must now use some kind of lookahead to make sure a
delimiter follows a `Boolean`; this should not be onerous, as
something similar is required to read `SymbolOrNumber`s correctly.
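
As an illustration of the lookahead described in the footnote, here is a minimal Python sketch. It is not taken from any Preserves implementation: the helper names `is_delimited` and `read_boolean` are hypothetical, and the sketch assumes that `Boolean` tokens are spelled `#t` and `#f`. The delimiter set is copied from the `delimiter` rule above.

```python
# Characters accepted by the `ws` rule (note that commas count as whitespace).
WHITESPACE = set(" \t\r\n,")

# Characters accepted by the `delimiter` rule: whitespace plus punctuation.
DELIMITERS = WHITESPACE | set('<>[]{}#:"|@;')


def is_delimited(text, pos):
    """Return True if `pos` is at end of input or points at a delimiter."""
    return pos >= len(text) or text[pos] in DELIMITERS


def read_boolean(text, pos=0):
    """Hypothetical helper: try to read a Boolean token starting at `pos`.

    Returns (value, next_pos) on success, or None if no Boolean token
    followed by a delimiter (or end of input) starts at `pos`.
    Assumes the spellings `#t` and `#f`.
    """
    for literal, value in (("#t", True), ("#f", False)):
        if text.startswith(literal, pos):
            end = pos + len(literal)
            # One character of lookahead: the token must be delimited.
            if is_delimited(text, end):
                return value, end
    return None
```

Under these assumptions, `read_boolean("#t]")` accepts the token because `]` is a delimiter, while `read_boolean("#true")` returns `None` because `r` is not.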
## Grammar
Standalone documents may have trailing whitespace.

@@ -109,7 +109,7 @@ label, then by field sequence.
labels as specially-formatted lists.
[^iri-labels]: It is occasionally (but seldom) necessary to
-interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a
+interpret such `Symbol` labels as IRIs. Where a
label can be read as a relative IRI, it is notionally interpreted
with respect to the IRI
`urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can

@@ -5,9 +5,16 @@ title: "Open questions"
Q. Should "symbols" instead be URIs? Relative, usually; relative to
what? Some domain-specific base URI?
> No. They may be interpreted as URIs, of course; see
> [here](preserves.html#fn:iri-labels).
Q. Literal small integers: are they pulling their weight? They're not
absolutely necessary.
> No. They were removed in the simplification of the syntax that was the
> outcome of [issue
> 41](https://gitlab.com/preserves/preserves/-/issues/41).
Q. Should we go for trying to make the data ordering line up with the
encoding ordering? We'd have to only use streaming forms, and avoid
the small integer encoding, and not store record arities, and sort
@@ -37,3 +44,8 @@ require any whitespace at all between elements of a list, making it
ambiguous: does `[123]` denote a single-element or a three-element
list? Compare JSON where `[1,2,3]` is unambiguously different from
`[123]`.
> With the addition of the notion of
> [delimiters](preserves-text.html#delimiters) to the text syntax, we at
> least answer the question of how `[123]` parses: it must yield a
> single-element list.
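
To make the `[123]` answer concrete, the following Python sketch splits input into maximal runs of non-delimiter characters. It is a deliberate simplification and not a Preserves parser: the function name `atoms` is hypothetical, and strings, comments, and nesting are all ignored. It only shows that an undelimited run of digits stays together as one token.

```python
# Delimiter characters, copied from the `delimiter` rule of the text syntax.
DELIMITERS = set(" \t\r\n,") | set('<>[]{}#:"|@;')


def atoms(text):
    """Split `text` into maximal runs of non-delimiter characters.

    Hypothetical helper for illustration only; it does not handle
    strings, comments, or any other structure.
    """
    out = []
    current = []
    for ch in text:
        if ch in DELIMITERS:
            # A delimiter ends the token being accumulated, if any.
            if current:
                out.append("".join(current))
                current = []
        else:
            current.append(ch)
    # End of input also terminates a token.
    if current:
        out.append("".join(current))
    return out
```

With this sketch, `atoms("[123]")` yields the single token `123`, whereas `atoms("[1,2,3]")` and `atoms("[1 2 3]")` each yield three tokens.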