Introduce the notion of a "delimiter" to follow Boolean and SymbolOrNumber.

Tony Garnock-Jones 2023-10-29 15:55:19 +01:00
parent 55deeea343
commit 5b8c07cb3f
4 changed files with 29 additions and 2 deletions

@@ -14,4 +14,4 @@ defaults:
 title: "Preserves"
 version_date: "October 2023"
-version: "0.990.0"
+version: "0.990.1"

@@ -23,14 +23,29 @@ ABNF allows easy definition of US-ASCII-based languages. However,
 Preserves is a Unicode-based language. Therefore, we reinterpret ABNF as
 a grammar for recognising sequences of Unicode scalar values.

+<a id="encoding"></a>
 **Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using
 UTF-8 where possible.

+<a id="whitespace"></a>
 **Whitespace.** Whitespace is defined as any number of spaces, tabs,
 carriage returns, line feeds, or commas.

     ws = *(%x20 / %x09 / CR / LF / ",")

+<a id="delimiters"></a>
+**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
+followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]
+
+    delimiter = ws
+              / "<" / ">" / "[" / "]" / "{" / "}"
+              / "#" / ":" / DQUOTE / "|" / "@" / ";"
+
+[^delimiters-lookahead]: The addition of this constraint means that
+implementations must now use some kind of lookahead to make sure a
+delimiter follows a `Boolean`; this should not be onerous, as
+something similar is required to read `SymbolOrNumber`s correctly.
+
 ## Grammar

 Standalone documents may have trailing whitespace.
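
To illustrate the lookahead the footnote describes, here is a minimal Python sketch of a `Boolean` reader. It assumes `Boolean` is spelled `#t` or `#f`; the names `at_delimiter` and `read_boolean` are hypothetical, not taken from any Preserves implementation. Because ABNF `*` means "zero or more", `ws` as written also matches the empty string, so the sketch treats the `ws` alternative of `delimiter` as one or more whitespace characters, which appears to be the intent.

```python
# Hypothetical sketch: one-character lookahead for the `delimiter` rule.
# Assumes `Boolean` is spelled "#t" or "#f", and treats the `ws` alternative
# as one or more whitespace characters (as written, `ws` also matches "").

# `ws` characters: space, tab, CR, LF, comma.
WS_CHARS = frozenset(" \t\r\n,")
# Every character that can begin a `delimiter` (DQUOTE is '"').
DELIMITER_CHARS = WS_CHARS | frozenset('<>[]{}#:"|@;')


def at_delimiter(text: str, pos: int) -> bool:
    """True if `pos` is at end of input or at a delimiter character."""
    return pos >= len(text) or text[pos] in DELIMITER_CHARS


def read_boolean(text: str, pos: int) -> tuple[bool, int]:
    """Read `#t` or `#f` at `pos`; return (value, offset just after it)."""
    token = text[pos:pos + 2]
    if token not in ("#t", "#f"):
        raise ValueError(f"no Boolean at offset {pos}")
    end = pos + 2
    # Peek at the next character without consuming it.
    if not at_delimiter(text, end):
        raise ValueError(f"Boolean at offset {pos} is not followed by a delimiter")
    return token == "#t", end


if __name__ == "__main__":
    print(read_boolean("[#t]", 1))  # (True, 3)
    print(read_boolean("#t#f", 0))  # (True, 2)
    try:
        read_boolean("#tx", 0)
    except ValueError as err:
        print(err)
```

Since `#` is itself a delimiter, `#t#f` reads as two adjacent tokens, while `#tx` is rejected rather than being split after `#t`.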

@@ -109,7 +109,7 @@ label, then by field sequence.
 labels as specially-formatted lists.

 [^iri-labels]: It is occasionally (but seldom) necessary to
-interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a
+interpret such `Symbol` labels as IRIs. Where a
 label can be read as a relative IRI, it is notionally interpreted
 with respect to the IRI
 `urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can

@@ -5,9 +5,16 @@ title: "Open questions"
 Q. Should "symbols" instead be URIs? Relative, usually; relative to
 what? Some domain-specific base URI?

+> No. They may be interpreted as URIs, of course; see
+> [here](preserves.html#fn:iri-labels).
+
 Q. Literal small integers: are they pulling their weight? They're not
 absolutely necessary.

+> No. They were removed in the simplification of the syntax that was the
+> outcome of [issue
+> 41](https://gitlab.com/preserves/preserves/-/issues/41).
+
 Q. Should we go for trying to make the data ordering line up with the
 encoding ordering? We'd have to only use streaming forms, and avoid
 the small integer encoding, and not store record arities, and sort
@@ -37,3 +44,8 @@ require any whitespace at all between elements of a list, making it
 ambiguous: does `[123]` denote a single-element or a three-element
 list? Compare JSON where `[1,2,3]` is unambiguously different from
 `[123]`.
+
+> With the addition of the notion of
+> [delimiters](preserves-text.html#delimiters) to the text syntax, we at
+> least answer the question of how `[123]` parses: it must yield a
+> single-element list.
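
The delimiter rule is what settles `[123]`: a `SymbolOrNumber` must be followed by a delimiter or end of input, and since neither `2` nor `3` is a delimiter, the token cannot end after `1` or `2`, leaving one element as the only valid parse. The minimal Python sketch below assumes every list element is a bare `SymbolOrNumber` token and returns the tokens as raw strings, without deciding whether each is a symbol or a number; `read_symbol_or_number` and `read_flat_list` are hypothetical helpers, not part of any Preserves library.

```python
# Hypothetical sketch: reading a list whose elements are all bare
# `SymbolOrNumber` tokens, to show why `[123]` has exactly one element.
# Tokens are returned as raw strings; classifying them is out of scope.

WS_CHARS = frozenset(" \t\r\n,")
DELIMITER_CHARS = WS_CHARS | frozenset('<>[]{}#:"|@;')


def read_symbol_or_number(text: str, pos: int) -> tuple[str, int]:
    """Consume characters up to (not including) the next delimiter."""
    end = pos
    while end < len(text) and text[end] not in DELIMITER_CHARS:
        end += 1
    return text[pos:end], end


def read_flat_list(text: str) -> list[str]:
    """Parse `[...]` containing only SymbolOrNumber tokens and whitespace."""
    if not text.startswith("["):
        raise ValueError("expected '['")
    items: list[str] = []
    pos = 1
    while True:
        # Skip `ws` between elements (commas count as whitespace).
        while pos < len(text) and text[pos] in WS_CHARS:
            pos += 1
        if pos >= len(text):
            raise ValueError("unterminated list")
        if text[pos] == "]":
            return items
        token, pos = read_symbol_or_number(text, pos)
        if not token:
            # Some other delimiter, e.g. `<` or `#`, which this sketch does not handle.
            raise ValueError(f"unsupported {text[pos]!r} at offset {pos}")
        items.append(token)


if __name__ == "__main__":
    print(read_flat_list("[123]"))    # ['123']
    print(read_flat_list("[1,2,3]"))  # ['1', '2', '3']
```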