Feed back clarifications from the cheatsheet version of the text grammar

Tony Garnock-Jones 2023-10-17 10:40:23 +02:00
parent 6a56dad886
commit 9cc537abf8
1 changed files with 24 additions and 34 deletions


@ -29,8 +29,7 @@ UTF-8 where possible.
**Whitespace.** Whitespace is defined as any number of spaces, tabs, **Whitespace.** Whitespace is defined as any number of spaces, tabs,
carriage returns, line feeds, or commas. carriage returns, line feeds, or commas.
ws = *(%x20 / %x09 / newline / ",") ws = *(%x20 / %x09 / CR / LF / ",")
newline = CR / LF
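As a rough illustration of the `ws` rule above, the following Python sketch skips any run of spaces, tabs, carriage returns, line feeds, and commas. The names `WHITESPACE` and `skip_ws` are hypothetical and are not taken from the specification or from any Preserves implementation.

```python
# Hypothetical helper illustrating the `ws` rule; not part of the specification.
# ws = *(%x20 / %x09 / CR / LF / ",")
WHITESPACE = frozenset(" \t\r\n,")


def skip_ws(text: str, pos: int = 0) -> int:
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos
```

Because `ws` matches zero or more characters, the helper returns `pos` unchanged when no whitespace is present.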
## Grammar ## Grammar
@ -90,23 +89,15 @@ the same as for JSON,[^string-json-correspondence]
[surrogate code points](https://unicode.org/glossary/#surrogate_code_point) [surrogate code points](https://unicode.org/glossary/#surrogate_code_point)
*MUST NOT* be generated or accepted.[^unpaired-surrogates] *MUST NOT* be generated or accepted.[^unpaired-surrogates]
String = %x22 *char %x22 String = DQUOTE *char DQUOTE
char = unescaped / %x7C / escape (escaped / %x22 / %s"u" 4HEXDIG) char = <any unicode scalar value except "\" or DQUOTE> / escaped / "\" DQUOTE
unescaped = %x20-21 / %x23-5B / %x5D-7B / %x7D-10FFFF escaped = "\\" / "\/" / %s"\b" / %s"\f" / %s"\n" / %s"\r" / %s"\t"
escape = %x5C ; \ / %s"\u" 4HEXDIG
escaped = ( %x5C / ; \ reverse solidus U+005C
%x2F / ; / solidus U+002F
%x62 / ; b backspace U+0008
%x66 / ; f form feed U+000C
%x6E / ; n line feed U+000A
%x72 / ; r carriage return U+000D
%x74 ) ; t tab U+0009
[^string-json-correspondence]: The grammar for `String` has the same [^string-json-correspondence]: The grammar for `String` has the same
effect as the effect as the
[JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for [JSON](https://tools.ietf.org/html/rfc8259#section-7) grammar for
`string`. Some auxiliary definitions (e.g. `escaped`) are lifted `string`.
largely unmodified from the text of RFC 8259.
[^escaping-surrogate-pairs]: In particular, note JSON's rules around [^escaping-surrogate-pairs]: In particular, note JSON's rules around
the use of surrogate pairs for scalar values not in the Basic the use of surrogate pairs for scalar values not in the Basic
@ -135,14 +126,16 @@ Many bytes map directly to printable 7-bit ASCII; the remainder must be
escaped, either as `\x` followed by a two-digit hexadecimal number, or escaped, either as `\x` followed by a two-digit hexadecimal number, or
following the usual rules for double quote and backslash. following the usual rules for double quote and backslash.
ByteString = "#" %x22 *binchar %x22 ByteString = "#" DQUOTE *binchar DQUOTE
binchar = binunescaped / escape (escaped / %x22 / %s"x" 2HEXDIG) binchar = <any unicode scalar value ≥32 and ≤126 except "\" or DQUOTE>
binunescaped = %x20-21 / %x23-5B / %x5D-7E / "\" ("\" / "/" / %s"b" / %s"f" / %s"n" / %s"r" / %s"t")
/ %s"\x" 2HEXDIG
/ "\" DQUOTE
The second is a sequence of pairs of hexadecimal digits interleaved The second is a sequence of pairs of hexadecimal digits interleaved
with whitespace and surrounded by `#x"` and `"`. with whitespace and surrounded by `#x"` and `"`.
ByteString =/ %s"#x" %x22 *(ws / 2HEXDIG) ws %x22 ByteString =/ %s"#x" DQUOTE *(ws 2HEXDIG) ws DQUOTE
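The revised rule `*(ws 2HEXDIG) ws` allows whitespace only between pairs of hexadecimal digits, never inside a pair. A minimal Python sketch of a reader for this form might look as follows; `parse_hex_bytestring` is a hypothetical name, and the function assumes it is handed the complete `#x"..."` token.

```python
import re

# Hypothetical sketch of the `#x"..."` form; not taken from any Preserves library.
_WS = r'[ \t\r\n,]*'
# Body between the quotes: *(ws 2HEXDIG) ws
_HEX_BODY = re.compile(rf'(?:{_WS}[0-9A-Fa-f]{{2}})*{_WS}')


def parse_hex_bytestring(text: str) -> bytes:
    """Decode a complete `#x"..."` token into bytes."""
    if len(text) < 4 or not text.startswith('#x"') or not text.endswith('"'):
        raise ValueError('expected a token of the form #x"..."')
    body = text[3:-1]
    if not _HEX_BODY.fullmatch(body):
        raise ValueError('hex digits must appear in pairs, separated only by whitespace')
    # Remove whitespace (including commas) and decode the remaining digit pairs.
    return bytes.fromhex(re.sub(r'[ \t\r\n,]', '', body))
```

Under the rule as written, an input such as `#x"6 1"` is rejected, because the whitespace falls inside a digit pair.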
The third is a sequence of [Base64](https://tools.ietf.org/html/rfc4648) The third is a sequence of [Base64](https://tools.ietf.org/html/rfc4648)
characters, interleaved with whitespace and surrounded by `#[` and `]`. characters, interleaved with whitespace and surrounded by `#[` and `]`.
@ -153,8 +146,8 @@ and [URL-safe](https://datatracker.ietf.org/doc/html/rfc4648#section-5)
(`-`,`_`) characters *SHOULD* be generated by default. Padding characters (`-`,`_`) characters *SHOULD* be generated by default. Padding characters
(`=`) may be omitted. (`=`) may be omitted.
ByteString =/ "#[" *(ws / base64char) ws "]" ByteString =/ "#[" *(ws base64char) ws "]"
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "=" base64char = ALPHA / DIGIT / "+" / "/" / "-" / "_" / "="
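One way to accept both Base64 alphabets and optional padding is to normalise the body before handing it to a standard decoder. The Python sketch below assumes the caller has already removed the surrounding `#[` and `]`; the name `decode_base64_body` is hypothetical. It is also stricter than the `base64char` rule, which places no restriction on where `=` appears: here `=` is tolerated only at the end.

```python
import base64
import re


def decode_base64_body(body: str) -> bytes:
    """Decode the text between `#[` and `]` (hypothetical helper, not a library API)."""
    # Drop whitespace as defined by `ws`: space, tab, CR, LF, comma.
    compact = re.sub(r'[ \t\r\n,]', '', body)
    # Padding may be omitted, so strip any trailing "=" and recompute it below.
    compact = compact.rstrip('=')
    # Map the URL-safe characters onto the standard alphabet.
    compact = compact.translate(str.maketrans('-_', '+/'))
    compact += '=' * (-len(compact) % 4)
    # validate=True rejects any remaining character outside the alphabet.
    return base64.b64decode(compact, validate=True)
```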
A `Symbol` may be written in either of two forms. A `Symbol` may be written in either of two forms.
@ -163,7 +156,7 @@ including embedded escape syntax, except using a bar or pipe character
(`|`) instead of a double quote mark. (`|`) instead of a double quote mark.
QuotedSymbol = "|" *symchar "|" QuotedSymbol = "|" *symchar "|"
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG) symchar = <any unicode scalar value except "\" or "|"> / escaped / "\|"
Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token]. Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
The grammar for numeric data is a subset of the grammar for bare `Symbol`s, The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
@ -171,12 +164,11 @@ so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
`SignedInteger`, then it must be interpreted as one of those, and otherwise `SignedInteger`, then it must be interpreted as one of those, and otherwise
it must be interpreted as a bare `Symbol`. it must be interpreted as a bare `Symbol`.
SymbolOrNumber = 1*baresymchar SymbolOrNumber = 1*(ALPHA / DIGIT / sympunct / symuchar)
baresymchar = ALPHA / DIGIT / sympunct / symuchar
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" / sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
"?" / "_" / "=" / "+" / "-" / "/" / "." "?" / "_" / "=" / "+" / "-" / "/" / "."
symuchar = <any scalar value greater than 127 whose Unicode symuchar = <any scalar value ≥128 whose Unicode category is
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co> Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
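The character classes above can be checked with Python's standard `unicodedata` module. The following is only a sketch: `is_bare_symbol_char` is a hypothetical name, it expects a single-character string, and the result for non-ASCII characters depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Unicode general categories listed in the `symuchar` rule.
SYMUCHAR_CATEGORIES = frozenset(
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Po Sc Sm Sk So Co".split()
)
# Punctuation listed in the `sympunct` rule.
SYMPUNCT = frozenset("~!$%^&*?_=+-/.")


def is_bare_symbol_char(ch: str) -> bool:
    """Return True if the single character ch may appear in a SymbolOrNumber."""
    if ch.isascii():
        # ALPHA / DIGIT / sympunct
        return ch.isalnum() or ch in SYMPUNCT
    # symuchar: scalar value of 128 or above in one of the listed categories.
    return unicodedata.category(ch) in SYMUCHAR_CATEGORIES
```

A token passes the `SymbolOrNumber` rule when it is non-empty and every character satisfies this predicate; deciding whether it is a number or a bare `Symbol` is a separate step that this sketch does not cover.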
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt] [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
@ -239,9 +231,8 @@ represented as raw hexadecimal strings similar to hexadecimal
syntax whereever convenient, even for values representable using the syntax whereever convenient, even for values representable using the
grammar above.[^rationale-no-general-machine-syntax] grammar above.[^rationale-no-general-machine-syntax]
Value =/ HexFloat / HexDouble Float =/ "#xf" DQUOTE 4(ws 2HEXDIG) ws DQUOTE
HexFloat = "#xf" %x22 4(ws 2HEXDIG) ws %x22 Double =/ "#xd" DQUOTE 8(ws 2HEXDIG) ws DQUOTE
HexDouble = "#xd" %x22 8(ws 2HEXDIG) ws %x22
[^rationale-no-general-machine-syntax]: **Rationale.** Previous versions [^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
of this specification included an escape to the [machine-oriented of this specification included an escape to the [machine-oriented
@ -277,12 +268,11 @@ named “`Value`” without altering the semantic class of `Value`s.
interpreted as comments associated with that value. Comments are interpreted as comments associated with that value. Comments are
sufficiently common that special syntax exists for them. sufficiently common that special syntax exists for them.
Value =/ ws Value =/ ws ";" linecomment (CR / LF) Value
";" *(%x00-09 / %x0B-0C / %x0E-10FFFF) newline linecomment = *<any unicode scalar value except CR or LF>
Value
When written this way, everything between the `;` and the newline is When written this way, everything between the `;` and the end of the line
included in the string annotating the `Value`. is included in the string annotating the `Value`.
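As a small illustration of the revised rule, the Python sketch below splits input that begins with `;` into the annotation string and the text that follows the terminating CR or LF. The name `split_line_comment` is hypothetical, and the sketch assumes leading whitespace has already been consumed.

```python
import re


def split_line_comment(text: str) -> tuple[str, str]:
    """Split '; comment<CR or LF>rest' into (comment, rest). Hypothetical helper."""
    if not text.startswith(';'):
        raise ValueError('a line comment must start with ";"')
    end = re.search(r'[\r\n]', text)
    if end is None:
        # The rule requires a CR or LF, followed by the annotated Value.
        raise ValueError('a line comment must be terminated by CR or LF')
    return text[1:end.start()], text[end.end():]
```

For input using a CR LF line ending, the comment ends at the CR; the remaining LF is ordinary whitespace in front of the annotated `Value`.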
**Equivalence.** Annotations appear within syntax denoting a `Value`; **Equivalence.** Annotations appear within syntax denoting a `Value`;
however, the annotations are not part of the denoted value. They are however, the annotations are not part of the denoted value. They are