diff --git a/_config.yml b/_config.yml
index 4397296..b97fce5 100644
--- a/_config.yml
+++ b/_config.yml
@@ -14,4 +14,4 @@ defaults:
 
 title: "Preserves"
 version_date: "October 2023"
-version: "0.990.0"
+version: "0.990.1"
diff --git a/preserves-text.md b/preserves-text.md
index 2450eaf..97b145b 100644
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -23,14 +23,29 @@ ABNF allows easy definition of US-ASCII-based languages. However,
 Preserves is a Unicode-based language. Therefore, we reinterpret ABNF
 as a grammar for recognising sequences of Unicode scalar values.
 
+
 **Encoding.** Textual syntax for a `Value` *SHOULD* be encoded using
 UTF-8 where possible.
 
+
 **Whitespace.** Whitespace is defined as any number of spaces, tabs,
 carriage returns, line feeds, or commas.
 
     ws = *(%x20 / %x09 / CR / LF / ",")
 
+
+**Delimiters.** Some tokens (`Boolean`, `SymbolOrNumber`) *MUST* be
+followed by a `delimiter` or by the end of the input.[^delimiters-lookahead]
+
+    delimiter = ws
+              / "<" / ">" / "[" / "]" / "{" / "}"
+              / "#" / ":" / DQUOTE / "|" / "@" / ";"
+
+[^delimiters-lookahead]: The addition of this constraint means that
+    implementations must now use some kind of lookahead to make sure a
+    delimiter follows a `Boolean`; this should not be onerous, as
+    something similar is required to read `SymbolOrNumber`s correctly.
+
 ## Grammar
 
 Standalone documents may have trailing whitespace.
diff --git a/preserves.md b/preserves.md
index 67ae0e2..6f963aa 100644
--- a/preserves.md
+++ b/preserves.md
@@ -109,7 +109,7 @@ label, then by field sequence.
     labels as specially-formatted lists.
 
 [^iri-labels]: It is occasionally (but seldom) necessary to
-    interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a
+    interpret such `Symbol` labels as IRIs. Where a
     label can be read as a relative IRI, it is notionally interpreted
     with respect to the IRI
     `urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can
diff --git a/questions.md b/questions.md
index a441795..f612f59 100644
--- a/questions.md
+++ b/questions.md
@@ -5,9 +5,16 @@ title: "Open questions"
 Q. Should "symbols" instead be URIs? Relative, usually; relative to what?
 Some domain-specific base URI?
 
+> No. They may be interpreted as URIs, of course; see
+> [here](preserves.html#fn:iri-labels).
+
 Q. Literal small integers: are they pulling their weight? They're not
 absolutely necessary.
 
+> No. They were removed in the simplification of the syntax that was the
+> outcome of [issue
+> 41](https://gitlab.com/preserves/preserves/-/issues/41).
+
 Q. Should we go for trying to make the data ordering line up with the
 encoding ordering? We'd have to only use streaming forms, and avoid
 the small integer encoding, and not store record arities, and sort
@@ -37,3 +44,8 @@ require any whitespace at all between elements of a list, making it
 ambiguous: does `[123]` denote a single-element or a three-element
 list? Compare JSON where `[1,2,3]` is unambiguously different from
 `[123]`.
+
+> With the addition of the notion of
+> [delimiters](preserves-text.html#delimiters) to the text syntax, we at
+> least answer the question of how `[123]` parses: it must yield a
+> single-element list.
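
Review note: the lookahead that the `delimiter` rule in the `preserves-text.md` hunk asks for can be sketched in a few lines. This is only an illustration, not code from the Preserves implementations. The function names `read_bare_token` and `read_boolean` are hypothetical, and the `#t`/`#f` spelling of `Boolean` is an assumption, since the diff does not show it.

```python
from typing import Tuple

# Whitespace per the `ws` rule: space, tab, CR, LF, or comma.
WHITESPACE = set(" \t\r\n,")
# Every character that the `delimiter` rule in the diff accepts.
DELIMITERS = WHITESPACE | set('<>[]{}#:"|@;')


def read_bare_token(text: str, start: int = 0) -> Tuple[str, int]:
    """Scan from `start` up to the next delimiter or the end of input.

    Hypothetical helper: returns the raw token text and the index just
    past it, without classifying it as a symbol or a number.
    """
    end = start
    while end < len(text) and text[end] not in DELIMITERS:
        end += 1
    return text[start:end], end


def read_boolean(text: str, start: int = 0) -> Tuple[bool, int]:
    """Read a Boolean, then peek one character to enforce the rule.

    Assumes Booleans are spelled `#t` and `#f` (not shown in the diff).
    """
    literal = text[start:start + 2]
    if literal not in ("#t", "#f"):
        raise ValueError(f"expected #t or #f at offset {start}")
    end = start + 2
    # One character of lookahead: end of input or a delimiter is required.
    if end < len(text) and text[end] not in DELIMITERS:
        raise ValueError(f"Boolean at offset {start} is not followed by a delimiter")
    return literal == "#t", end
```

Under these assumptions, scanning `[123]` from offset 1 stops only at `]`, which gives the single token `123` described in the `questions.md` answer, while `1,2,3` stops at the first comma. A Boolean immediately followed by a non-delimiter character, such as `#true`, is rejected by the one-character peek.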