Introduce the notion of a "delimiter" to follow Boolean and SymbolOrNumber.
parent 55deeea343
commit 5b8c07cb3f
@@ -14,4 +14,4 @@ defaults:

 title: "Preserves"
 version_date: "October 2023"
-version: "0.990.0"
+version: "0.990.1"
@@ -23,14 +23,29 @@ ABNF allows easy definition of US-ASCII-based languages. However,
 Preserves is a Unicode-based language. Therefore, we reinterpret ABNF as
 a grammar for recognising sequences of Unicode scalar values.

+<a id="encoding"></a>
 **Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using
 UTF-8 where possible.

+<a id="whitespace"></a>
 **Whitespace.** Whitespace is defined as any number of spaces, tabs,
 carriage returns, line feeds, or commas.

 ws = *(%x20 / %x09 / CR / LF / ",")

+<a id="delimiters"></a>
+**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
+followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]
+
+delimiter = ws
+/ "<" / ">" / "[" / "]" / "{" / "}"
+/ "#" / ":" / DQUOTE / "|" / "@" / ";"
+
+[^delimiters-lookahead]: The addition of this constraint means that
+implementations must now use some kind of lookahead to make sure a
+delimiter follows a `Boolean`; this should not be onerous, as
+something similar is required to read `SymbolOrNumber`s correctly.
+
 ## Grammar

 Standalone documents may have trailing whitespace.
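The lookahead that the new `delimiters-lookahead` footnote asks for can be sketched roughly as follows. This is an illustration only, not code from this commit: the names `is_delimited` and `read_boolean` are hypothetical, the Boolean spellings `#t` and `#f` are assumed rather than taken from this diff, and `ws` is read here as "one whitespace character" even though the ABNF rule itself permits an empty match.

```python
# Illustrative sketch only; all names below are hypothetical.

# Characters accepted by the `delimiter` rule, with `ws` taken to mean a
# single whitespace character (space, tab, CR, LF, or comma).
WS = " \t\r\n,"
DELIMITERS = frozenset(WS + '<>[]{}#:"|@;')


def is_delimited(text, pos):
    """Return True if `pos` is at end of input or at a delimiter character."""
    return pos >= len(text) or text[pos] in DELIMITERS


def read_boolean(text, pos=0):
    """Read an assumed `#t` / `#f` token at `pos` with one character of lookahead.

    Returns (value, next_pos). Raises ValueError if no Boolean starts at
    `pos`, or if the Boolean is not followed by a delimiter or end of input.
    """
    token = text[pos:pos + 2]
    if token not in ("#t", "#f"):
        raise ValueError(f"no Boolean at offset {pos}")
    if not is_delimited(text, pos + 2):
        raise ValueError(f"Boolean at offset {pos} is not followed by a delimiter")
    return token == "#t", pos + 2
```

Under these assumptions, `read_boolean("#f]")` accepts the token because `]` is a delimiter, while `read_boolean("#true")` is rejected because `r` is not.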
@@ -109,7 +109,7 @@ label, then by field sequence.
 labels as specially-formatted lists.

 [^iri-labels]: It is occasionally (but seldom) necessary to
-interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a
+interpret such `Symbol` labels as IRIs. Where a
 label can be read as a relative IRI, it is notionally interpreted
 with respect to the IRI
 `urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can
12
questions.md
@@ -5,9 +5,16 @@ title: "Open questions"
 Q. Should "symbols" instead be URIs? Relative, usually; relative to
 what? Some domain-specific base URI?

+> No. They may be interpreted as URIs, of course; see
+> [here](preserves.html#fn:iri-labels).
+
 Q. Literal small integers: are they pulling their weight? They're not
 absolutely necessary.

+> No. They were removed in the simplification of the syntax that was the
+> outcome of [issue
+> 41](https://gitlab.com/preserves/preserves/-/issues/41).
+
 Q. Should we go for trying to make the data ordering line up with the
 encoding ordering? We'd have to only use streaming forms, and avoid
 the small integer encoding, and not store record arities, and sort
@@ -37,3 +44,8 @@ require any whitespace at all between elements of a list, making it
 ambiguous: does `[123]` denote a single-element or a three-element
 list? Compare JSON where `[1,2,3]` is unambiguously different from
 `[123]`.
+
+> With the addition of the notion of
+> [delimiters](preserves-text.html#delimiters) to the text syntax, we at
+> least answer the question of how `[123]` parses: it must yield a
+> single-element list.
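To make the `[123]` answer concrete, here is a small sketch of how a reader might split bare tokens at delimiter characters. It is not the reference implementation: `bare_tokens` is a hypothetical helper, and it ignores strings, comments, and nesting. It only shows that a token runs until the next delimiter, so `123` cannot be broken into three numbers.

```python
# Illustrative sketch only; `bare_tokens` is a hypothetical helper.

# Whitespace characters (including comma) plus the other delimiter characters.
DELIMITERS = frozenset(' \t\r\n,<>[]{}#:"|@;')


def bare_tokens(text):
    """Split `text` into maximal runs of non-delimiter characters."""
    tokens = []
    current = []
    for ch in text:
        if ch in DELIMITERS:
            # A delimiter ends the token in progress, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

With this sketch, `bare_tokens("[123]")` gives one token, while `bare_tokens("[1 2 3]")` and `bare_tokens("[1,2,3]")` each give three, since space and comma both count as whitespace.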