Feed back clarifications from the cheatsheet version of the text grammar

2023-10-17 10:40:23 +02:00 · 9cc537abf8
parent 6a56dad886
commit 9cc537abf8
1 changed file with 24 additions and 34 deletions
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -29,8 +29,7 @@ UTF-8 where possible.
 **Whitespace.** Whitespace is defined as any number of spaces, tabs,
 carriage returns, line feeds, or commas.

-                ws = *(%x20 / %x09 / newline / ",")
-           newline = CR / LF
+                ws = *(%x20 / %x09 / CR / LF / ",")

 ## Grammar

@@ -90,23 +89,15 @@ the same as for JSON,[^string-json-correspondence]
 [surrogate code points](https://unicode.org/glossary/#surrogate_code_point)
 *MUST NOT* be generated or accepted.[^unpaired-surrogates]

-            String = %x22 *char %x22
-              char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
-         unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
-            escape = %x5C              ; \
-           escaped = ( %x5C /          ; \    reverse solidus U+005C
-                       %x2F /          ; /    solidus         U+002F
-                       %x62 /          ; b    backspace       U+0008
-                       %x66 /          ; f    form feed       U+000C
-                       %x6E /          ; n    line feed       U+000A
-                       %x72 /          ; r    carriage return U+000D
-                       %x74 )          ; t    tab             U+0009
+            String = DQUOTE *char DQUOTE
+              char = <any unicode scalar value except "\" or DQUOTE> / escaped / "\" DQUOTE
+           escaped = "\\" / "\/" / %s"\b" / %s"\f" / %s"\n" / %s"\r" / %s"\t"
+                   / %s"\u" 4HEXDIG

  [^string-json-correspondence]: The grammar for `String` has the same
    effect as the
    [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
-    `string`. Some auxiliary definitions (e.g. `escaped`) are lifted
-    largely unmodified from the text of RFC 8259.
+    `string`.

  [^escaping-surrogate-pairs]: In particular, note JSON's rules around
    the use of surrogate pairs for scalar values not in the Basic
@@ -135,14 +126,16 @@ Many bytes map directly to printable 7-bit ASCII; the remainder must be
 escaped, either as `\x` followed by a two-digit hexadecimal number, or
 following the usual rules for double quote and backslash.

-        ByteString = "#" %x22 *binchar %x22
-           binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
-      binunescaped = %x20-21 / %x23-5B / %x5D-7E
+        ByteString = "#" DQUOTE *binchar DQUOTE
+           binchar = <any unicode scalar value ≥32 and ≤126 except "\" or DQUOTE>
+                   / "\" ("\" / "/" / %s"b" / %s"f" / %s"n" / %s"r" / %s"t")
+                   / %s"\x" 2HEXDIG
+                   / "\" DQUOTE

 The second is a sequence of pairs of hexadecimal digits interleaved
 with whitespace and surrounded by `#x"` and `"`.

-       ByteString =/ %s"#x" %x22 *(ws / 2HEXDIG) ws %x22
+       ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE

 The third is a sequence of [Base64](https://tools.ietf.org/html/rfc4648)
 characters, interleaved with whitespace and surrounded by `#[` and `]`.
@@ -153,8 +146,8 @@ and [URL-safe](https://datatracker.ietf.org/doc/html/rfc4648#section-5)
 (`-`,`_`) characters *SHOULD* be generated by default. Padding characters
 (`=`) may be omitted.

-       ByteString =/ "#[" *(ws / base64char) ws "]"
-        base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
+       ByteString =/ "#[" *(ws base64char) ws "]"
+        base64char = ALPHA / DIGIT / "+" / "/" / "-" / "_" / "="

 A `Symbol` may be written in either of two forms.

@@ -163,7 +156,7 @@ including embedded escape syntax, except using a bar or pipe character
 (`|`) instead of a double quote mark.

      QuotedSymbol = "|" *symchar "|"
-           symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
+           symchar = <any unicode scalar value except "\" or "|"> / escaped / "\|"

 Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
 The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
@@ -171,12 +164,11 @@ so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
 `SignedInteger`, then it must be interpreted as one of those, and otherwise
 it must be interpreted as a bare `Symbol`.

-    SymbolOrNumber = 1*baresymchar
-       baresymchar = ALPHA / DIGIT / sympunct / symuchar
+    SymbolOrNumber = 1*(ALPHA / DIGIT / sympunct / symuchar)
          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
                     "?" / "_" / "=" / "+" / "-" / "/" / "."
-          symuchar = <any scalar value greater than 127 whose Unicode
-                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
+          symuchar = <any scalar value ≥128 whose Unicode category is
+                      Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
                      Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>

  [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
@@ -239,9 +231,8 @@ represented as raw hexadecimal strings similar to hexadecimal
 syntax whereever convenient, even for values representable using the
 grammar above.[^rationale-no-general-machine-syntax]

-            Value =/ HexFloat / HexDouble
-          HexFloat = "#xf" %x22 4(ws 2HEXDIG) ws %x22
-         HexDouble = "#xd" %x22 8(ws 2HEXDIG) ws %x22
+            Float =/ "#xf" DQUOTE 4(ws 2HEXDIG) ws DQUOTE
+           Double =/ "#xd" DQUOTE 8(ws 2HEXDIG) ws DQUOTE

  [^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
    of this specification included an escape to the [machine-oriented
@@ -277,12 +268,11 @@ named “`Value`” without altering the semantic class of `Value`s.
 interpreted as comments associated with that value. Comments are
 sufficiently common that special syntax exists for them.

-            Value =/ ws
-                     ";" *(%x00-09 / %x0B-0C / %x0E-10FFFF) newline
-                     Value
+            Value =/ ws ";" linecomment (CR / LF) Value
+       linecomment = *<any unicode scalar value except CR or LF>

-When written this way, everything between the `;` and the newline is
-included in the string annotating the `Value`.
+When written this way, everything between the `;` and the end of the line
+is included in the string annotating the `Value`.

 **Equivalence.** Annotations appear within syntax denoting a `Value`;
 however, the annotations are not part of the denoted value. They are