Feed back clarifications from the cheatsheet version of the text grammar

Tony Garnock-Jones 2023-10-17 10:40:23 +02:00
parent 6a56dad886
commit 9cc537abf8
1 changed files with 24 additions and 34 deletions

@ -29,8 +29,7 @@ UTF-8 where possible.
**Whitespace.** Whitespace is defined as any number of spaces, tabs,
carriage returns, line feeds, or commas.
ws = *(%x20 / %x09 / newline / ",")
newline = CR / LF
ws = *(%x20 / %x09 / CR / LF / ",")
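A minimal sketch of how a reader might skip whitespace as defined by the `ws` rule, treating commas exactly like spaces. The names `WS_CHARS` and `skip_ws` are illustrative and do not come from the specification.

```python
# Characters matched by the `ws` rule: space, tab, CR, LF, and comma.
WS_CHARS = frozenset(" \t\r\n,")


def skip_ws(text, pos=0):
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in WS_CHARS:
        pos += 1
    return pos
```

Because `ws` matches zero or more characters, the function never fails; it simply returns `pos` unchanged when no whitespace is present.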
## Grammar
@ -90,23 +89,15 @@ the same as for JSON,[^string-json-correspondence]
[surrogate code points](https://unicode.org/glossary/#surrogate_code_point)
*MUST NOT* be generated or accepted.[^unpaired-surrogates]
String = %x22 *char %x22
char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG)
unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF
escape = %x5C ; \
escaped = ( %x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 ) ; t tab U+0009
String = DQUOTE *char DQUOTE
char = <any unicode scalar value except "\" or DQUOTE> / escaped / "\" DQUOTE
escaped = "\\" / "\/" / %s"\b" / %s"\f" / %s"\n" / %s"\r" / %s"\t"
/ %s"\u" 4HEXDIG
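The `String` rule can be illustrated with a small hand-written reader. This is a sketch under stated assumptions, not a reference implementation: the names `read_string`, `_hex4` and `SIMPLE_ESCAPES` are hypothetical, and surrogate pairs written as two `\u` escapes are combined following the JSON convention that the surrounding text refers to.

```python
import string

# Single-character escapes from the `escaped` rule, plus the escaped DQUOTE.
SIMPLE_ESCAPES = {"\\": "\\", "/": "/", "b": "\b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", '"': '"'}


def _hex4(text, pos):
    """Read exactly four hex digits at pos; return (value, next_pos)."""
    digits = text[pos:pos + 4]
    if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected 4HEXDIG")
    return int(digits, 16), pos + 4


def read_string(text, pos=0):
    """Read a double-quoted String at text[pos]; return (value, next_pos)."""
    if text[pos:pos + 1] != '"':
        raise ValueError("expected DQUOTE")
    pos += 1
    out = []
    while True:
        if pos >= len(text):
            raise ValueError("unterminated string")
        ch = text[pos]
        pos += 1
        if ch == '"':
            return "".join(out), pos
        if ch != "\\":
            out.append(ch)
            continue
        if pos >= len(text):
            raise ValueError("unterminated escape")
        esc = text[pos]
        pos += 1
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
        elif esc == "u":
            unit, pos = _hex4(text, pos)
            if 0xD800 <= unit <= 0xDBFF:
                # High surrogate: a low surrogate escape must follow.
                if text[pos:pos + 2] != "\\u":
                    raise ValueError("unpaired surrogate")
                low, pos = _hex4(text, pos + 2)
                if not 0xDC00 <= low <= 0xDFFF:
                    raise ValueError("unpaired surrogate")
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            elif 0xDC00 <= unit <= 0xDFFF:
                raise ValueError("unpaired surrogate")
            out.append(chr(unit))
        else:
            raise ValueError("unknown escape")
```

Unpaired surrogate escapes are rejected outright, in line with the requirement that they not be accepted.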
[^string-json-correspondence]: The grammar for `String` has the same
effect as the
[JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
`string`. Some auxiliary definitions (e.g. `escaped`) are lifted
largely unmodified from the text of RFC 8259.
`string`.
[^escaping-surrogate-pairs]: In particular, note JSON's rules around
the use of surrogate pairs for scalar values not in the Basic
@ -135,14 +126,16 @@ Many bytes map directly to printable 7-bit ASCII; the remainder must be
escaped, either as `\x` followed by a two-digit hexadecimal number, or
following the usual rules for double quote and backslash.
ByteString = "#" %x22 *binchar %x22
binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG)
binunescaped = %x20-21 / %x23-5B / %x5D-7E
ByteString = "#" DQUOTE *binchar DQUOTE
binchar = <any unicode scalar value ≥32 and ≤126, except "\" or DQUOTE>
/ "\" ("\" / "/" / %s"b" / %s"f" / %s"n" / %s"r" / %s"t")
/ %s"\x" 2HEXDIG
/ "\" DQUOTE
The second is a sequence of pairs of hexadecimal digits interleaved
with whitespace and surrounded by `#x"` and `"`.
ByteString =/ %s"#x" %x22 *(ws / 2HEXDIG) ws %x22
ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE
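As a sketch of the `#x"..."` form, the reader below accepts whitespace (including commas) between pairs of hex digits but not inside a pair, which is how the `*(ws 2HEXDIG) ws` shape reads. The function name `read_hex_bytes` is illustrative only.

```python
import string

WS_CHARS = " \t\r\n,"


def read_hex_bytes(text, pos=0):
    """Read a #x"..." ByteString at text[pos]; return (bytes, next_pos)."""
    if not text.startswith('#x"', pos):  # the prefix is case-sensitive (%s"#x")
        raise ValueError('expected #x"')
    pos += 3
    out = bytearray()
    while True:
        # Skip any whitespace before the next pair or the closing quote.
        while pos < len(text) and text[pos] in WS_CHARS:
            pos += 1
        if pos >= len(text):
            raise ValueError("unterminated hex ByteString")
        if text[pos] == '"':
            return bytes(out), pos + 1
        pair = text[pos:pos + 2]
        if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
            raise ValueError("expected a pair of hex digits")
        out.append(int(pair, 16))
        pos += 2
```

For instance, `#x"48 65,6c\n6C 6f"` decodes to the five bytes of `Hello`, while `#x"4 8"` is rejected because whitespace splits a pair.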
The third is a sequence of [Base64](https://tools.ietf.org/html/rfc4648)
characters, interleaved with whitespace and surrounded by `#[` and `]`.
@ -153,8 +146,8 @@ and [URL-safe](https://datatracker.ietf.org/doc/html/rfc4648#section-5)
(`-`,`_`) characters *SHOULD* be generated by default. Padding characters
(`=`) may be omitted.
ByteString =/ "#[" *(ws / base64char) ws "]"
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
ByteString =/ "#[" *(ws base64char) ws "]"
base64char = ALPHA / DIGIT / "+" / "/" / "-" / "_" / "="
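One possible way to decode the `#[...]` form with Python's standard library is sketched below. It normalizes URL-safe characters to the standard alphabet and restores omitted padding before decoding. The name `read_base64_bytes` is hypothetical, and the handling of padding characters in unusual positions is a simplification rather than something the specification prescribes.

```python
import base64

WS_CHARS = " \t\r\n,"


def read_base64_bytes(text):
    """Decode a complete #[...] ByteString into bytes."""
    if not (text.startswith("#[") and text.endswith("]")):
        raise ValueError("expected #[ ... ]")
    # Drop interleaved whitespace (which includes commas).
    body = "".join(c for c in text[2:-1] if c not in WS_CHARS)
    # Map the URL-safe alphabet onto the standard one.
    body = body.replace("-", "+").replace("_", "/")
    # Padding may be omitted, so strip it and re-add the canonical amount.
    body = body.rstrip("=")
    body += "=" * (-len(body) % 4)
    return base64.b64decode(body, validate=True)
```

With `validate=True`, characters outside the Base64 alphabet raise `binascii.Error`, which is a subclass of `ValueError`.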
A `Symbol` may be written in either of two forms.
@ -163,7 +156,7 @@ including embedded escape syntax, except using a bar or pipe character
(`|`) instead of a double quote mark.
QuotedSymbol = "|" *symchar "|"
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
symchar = <any unicode scalar value except "\" or "|"> / escaped / "\|"
Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
@ -171,12 +164,11 @@ so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
`SignedInteger`, then it must be interpreted as one of those, and otherwise
it must be interpreted as a bare `Symbol`.
SymbolOrNumber = 1*baresymchar
baresymchar = ALPHA / DIGIT / sympunct / symuchar
SymbolOrNumber = 1*(ALPHA / DIGIT / sympunct / symuchar)
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
"?" / "_" / "=" / "+" / "-" / "/" / "."
symuchar = <any scalar value greater than 127 whose Unicode
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
symuchar = <any scalar value ≥128 whose Unicode category is
Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
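The character classes in `SymbolOrNumber` can be checked with Python's `unicodedata` module, as in the sketch below. The helper names are illustrative, and the result for non-ASCII characters depends on the Unicode version bundled with the Python interpreter. The function only tests whether a token matches `SymbolOrNumber`; deciding whether it is a number or a bare `Symbol` is a separate step.

```python
import unicodedata

SYMPUNCT = frozenset("~!$%^&*?_=+-/.")
SYMUCHAR_CATEGORIES = frozenset({
    "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd",
    "Nl", "No", "Pc", "Pd", "Po", "Sc", "Sm", "Sk", "So", "Co",
})


def is_bare_symbol_char(ch):
    """True if ch is ALPHA, DIGIT, sympunct, or symuchar."""
    if ord(ch) < 128:
        # For ASCII, isalnum() is true exactly for A-Z, a-z and 0-9.
        return ch.isalnum() or ch in SYMPUNCT
    return unicodedata.category(ch) in SYMUCHAR_CATEGORIES


def is_symbol_or_number(token):
    """True if token is a non-empty run of bare symbol characters."""
    return len(token) > 0 and all(is_bare_symbol_char(c) for c in token)
```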
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
@ -239,9 +231,8 @@ represented as raw hexadecimal strings similar to hexadecimal
syntax wherever convenient, even for values representable using the
grammar above.[^rationale-no-general-machine-syntax]
Value =/ HexFloat / HexDouble
HexFloat = "#xf" %x22 4(ws 2HEXDIG) ws %x22
HexDouble = "#xd" %x22 8(ws 2HEXDIG) ws %x22
Float =/ "#xf" DQUOTE 4(ws 2HEXDIG) ws DQUOTE
Double =/ "#xd" DQUOTE 8(ws 2HEXDIG) ws DQUOTE
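The hexadecimal forms can be decoded with `struct`, as sketched below. Two assumptions are made here that this section does not itself state: the bytes are taken to be the big-endian IEEE 754 encoding, and the function name `read_hex_float` is purely illustrative.

```python
import string
import struct

WS_CHARS = " \t\r\n,"
SIZES = {"#xf": 4, "#xd": 8}  # Float is 4 bytes, Double is 8 bytes


def read_hex_float(text):
    """Decode a complete #xf"..." or #xd"..." literal into a Python float."""
    # "#xf" and "#xd" are plain ABNF literals, which match case-insensitively.
    prefix = text[:3].lower()
    if prefix not in SIZES or text[3:4] != '"':
        raise ValueError("expected #xf or #xd followed by DQUOTE")
    pos = 4
    raw = bytearray()
    while True:
        while pos < len(text) and text[pos] in WS_CHARS:
            pos += 1
        if pos >= len(text):
            raise ValueError("unterminated literal")
        if text[pos] == '"':
            break
        pair = text[pos:pos + 2]
        if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
            raise ValueError("expected a pair of hex digits")
        raw.append(int(pair, 16))
        pos += 2
    if pos + 1 != len(text):
        raise ValueError("trailing input after closing DQUOTE")
    if len(raw) != SIZES[prefix]:
        raise ValueError("wrong number of bytes")
    # Assumption: big-endian byte order.
    fmt = ">f" if prefix == "#xf" else ">d"
    return struct.unpack(fmt, bytes(raw))[0]
```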
[^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
of this specification included an escape to the [machine-oriented
@ -277,12 +268,11 @@ named “`Value`” without altering the semantic class of `Value`s.
interpreted as comments associated with that value. Comments are
sufficiently common that special syntax exists for them.
Value =/ ws
";" *(%x00-09 / %x0B-0C / %x0E-10FFFF) newline
Value
Value =/ ws ";" linecomment (CR / LF) Value
linecomment = *<any unicode scalar value except CR or LF>
When written this way, everything between the `;` and the newline is
included in the string annotating the `Value`.
When written this way, everything between the `;` and the end of the line
is included in the string annotating the `Value`.
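A sketch of reading one such comment: skip leading whitespace, require a `;`, and collect everything up to the next CR or LF as the annotation string. The name `read_line_comment` is hypothetical, and reading the annotated `Value` that follows is left to the caller.

```python
WS_CHARS = " \t\r\n,"


def read_line_comment(text, pos=0):
    """Read ws ";" linecomment (CR / LF); return (comment, next_pos)."""
    while pos < len(text) and text[pos] in WS_CHARS:
        pos += 1
    if text[pos:pos + 1] != ";":
        raise ValueError("expected ';'")
    pos += 1
    start = pos
    # linecomment: any characters except CR or LF.
    while pos < len(text) and text[pos] not in "\r\n":
        pos += 1
    if pos >= len(text):
        raise ValueError("comment must end with CR or LF")
    # Consume exactly one terminator; a following LF after CR is ordinary ws.
    return text[start:pos], pos + 1
```

Note that the returned string keeps any space that follows the `;`, since everything between the `;` and the end of the line belongs to the annotation.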
**Equivalence.** Annotations appear within syntax denoting a `Value`;
however, the annotations are not part of the denoted value. They are