From 9cc537abf8c6c458f7e08a771d949f785703f091 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Tue, 17 Oct 2023 10:40:23 +0200
Subject: [PATCH] Feed back clarifications from the cheatsheet version of the text grammar

---
 preserves-text.md | 58 ++++++++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/preserves-text.md b/preserves-text.md
index b4b60e9..2450eaf 100644
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -29,8 +29,7 @@ UTF-8 where possible.
 **Whitespace.** Whitespace is defined as any number of spaces, tabs,
 carriage returns, line feeds, or commas.
 
-    ws            =  *(%x20 / %x09 / newline / ",")
-    newline       =  CR / LF
+    ws            =  *(%x20 / %x09 / CR / LF / ",")
 
 ## Grammar
 
@@ -90,23 +89,15 @@ the same as for JSON,[^string-json-correspondence]
 [surrogate code points](https://unicode.org/glossary/#surrogate_code_point)
 *MUST NOT* be generated or accepted.[^unpaired-surrogates]
 
-    String         = %x22 *char %x22
-    char           = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
-    unescaped      = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
-    escape         = %x5C              ; \
-    escaped        = ( %x5C /          ; \    reverse solidus U+005C
-                       %x2F /          ; /    solidus         U+002F
-                       %x62 /          ; b    backspace       U+0008
-                       %x66 /          ; f    form feed       U+000C
-                       %x6E /          ; n    line feed       U+000A
-                       %x72 /          ; r    carriage return U+000D
-                       %x74 )          ; t    tab             U+0009
+    String         = DQUOTE *char DQUOTE
+    char           = / escaped / "\" DQUOTE
+    escaped        = "\\" / "\/" / %s"\b" / %s"\f" / %s"\n" / %s"\r" / %s"\t"
+                   / %s"\u" 4HEXDIG
 
 [^string-json-correspondence]: The grammar for `String` has the same
 effect as the
 [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
-`string`. Some auxiliary definitions (e.g. `escaped`) are lifted
-largely unmodified from the text of RFC 8259.
+`string`.
 [^escaping-surrogate-pairs]: In particular, note JSON's rules around the
 use of surrogate pairs for scalar values not in the Basic
@@ -135,14 +126,16 @@ Many bytes map directly to printable 7-bit ASCII; the remainder must be
 escaped, either as `\x` followed by a two-digit hexadecimal number, or
 following the usual rules for double quote and backslash.
 
-    ByteString     = "#" %x22 *binchar %x22
-    binchar        = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
-    binunescaped   = %x20-21 / %x23-5B / %x5D-7E
+    ByteString     = "#" DQUOTE *binchar DQUOTE
+    binchar        =
+                   / "\" ("\" / "/" / %s"b" / %s"f" / %s"n" / %s"r" / %s"t")
+                   / %s"\x" 2HEXDIG
+                   / "\" DQUOTE
 
 The second is a sequence of pairs of hexadecimal digits interleaved
 with whitespace and surrounded by `#x"` and `"`.
 
-    ByteString     =/ %s"#x" %x22 *(ws / 2HEXDIG) ws %x22
+    ByteString     =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE
 
 The third is a sequence of [Base64](https://tools.ietf.org/html/rfc4648) characters, interleaved
 with whitespace and surrounded by `#[` and `]`.
@@ -153,8 +146,8 @@ and [URL-safe](https://datatracker.ietf.org/doc/html/rfc4648#section-5)
 (`-`,`_`) characters *SHOULD* be generated by default. Padding characters
 (`=`) may be omitted.
 
-    ByteString     =/ "#[" *(ws / base64char) ws "]"
-    base64char     = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
+    ByteString     =/ "#[" *(ws base64char) ws "]"
+    base64char     = ALPHA / DIGIT / "+" / "/" / "-" / "_" / "="
 
 A `Symbol` may be written in either of two forms.
 
@@ -163,7 +156,7 @@ including embedded escape syntax, except using a bar or pipe character (`|`)
 instead of a double quote mark.
 
     QuotedSymbol   = "|" *symchar "|"
-    symchar        = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
+    symchar        = / escaped / "\|"
 
 Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
 The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
@@ -171,12 +164,11 @@ so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
 `SignedInteger`, then it must be interpreted as one of those, and otherwise
 it must be interpreted as a bare `Symbol`.
 
-    SymbolOrNumber = 1*baresymchar
-    baresymchar    = ALPHA / DIGIT / sympunct / symuchar
+    SymbolOrNumber = 1*(ALPHA / DIGIT / sympunct / symuchar)
     sympunct       = "~" / "!" / "$" / "%" / "^" / "&" / "*" / "?"
                    / "_" / "=" / "+" / "-" / "/" / "."
-    symuchar       =
 
 [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
@@ -239,9 +231,8 @@ represented as raw hexadecimal strings similar to hexadecimal
 syntax whereever convenient, even for values representable using the
 grammar above.[^rationale-no-general-machine-syntax]
 
-    Value     =/ HexFloat / HexDouble
-    HexFloat   = "#xf" %x22 4(ws 2HEXDIG) ws %x22
-    HexDouble  = "#xd" %x22 8(ws 2HEXDIG) ws %x22
+    Float     =/ "#xf" DQUOTE 4(ws 2HEXDIG) ws DQUOTE
+    Double    =/ "#xd" DQUOTE 8(ws 2HEXDIG) ws DQUOTE
 
 [^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
 of this specification included an escape to the [machine-oriented
@@ -277,12 +268,11 @@ named “`Value`” without altering the semantic class of `Value`s.
 interpreted as comments associated with that value. Comments are
 sufficiently common that special syntax exists for them.
 
-    Value =/ ws
-             ";" *(%x00-09 / %x0B-0C / %x0E-10FFFF) newline
-             Value
+    Value       =/ ws ";" linecomment (CR / LF) Value
+    linecomment = *
 
-When written this way, everything between the `;` and the newline is
-included in the string annotating the `Value`.
+When written this way, everything between the `;` and the end of the line
+is included in the string annotating the `Value`.
 
 **Equivalence.** Annotations appear within syntax denoting a `Value`;
 however, the annotations are not part of the denoted value. They are