Update spec text for numbers/symbols

Tony Garnock-Jones 2022-11-06 22:27:01 +01:00
parent 351feba8d2
commit 2d3df32749
2 changed files with 112 additions and 113 deletions

@@ -40,10 +40,9 @@ Standalone documents may have trailing whitespace.
Any `Value` may be preceded by whitespace.
Value = ws (Record / Collection / Atom / Embedded / Machine)
Value = ws (Record / Collection / Atom / Embedded)
Collection = Sequence / Dictionary / Set
Atom = Boolean / Float / Double / SignedInteger /
String / ByteString / Symbol
Atom = Boolean / String / ByteString / QuotedSymbol / SymbolOrNumber
Each `Record` is an angle-bracket enclosed grouping of its
label-`Value` followed by its field-`Value`s.
@@ -73,55 +72,6 @@ false, respectively.
Boolean = %s"#t" / %s"#f"
Numeric data follow the
[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
the addition of a trailing “f” distinguishing `Float` from `Double`
values. `Float`s and `Double`s always have either a fractional part or
an exponent part, where `SignedInteger`s never have
either.[^reading-and-writing-floats-accurately]
[^arbitrary-precision-signedinteger]
Float = flt %i"f"
Double = flt
SignedInteger = int
digit1-9 = %x31-39
nat = %x30 / ( digit1-9 *DIGIT )
int = ["-"] nat
frac = "." 1*DIGIT
exp = %i"e" ["-"/"+"] 1*DIGIT
flt = int (frac exp / frac / exp)
[^reading-and-writing-floats-accurately]: **Implementation note.**
Your language's standard library likely has a good routine for
converting between decimal notation and IEEE 754 floating-point.
However, if not, or if you are interested in the challenges of
accurately reading and writing floating point numbers, see the
excellent matched pair of 1990 papers by Clinger and Steele &
White, and a recent follow-up by Jaffer:
Clinger, William D. How to Read Floating Point Numbers
Accurately. In Proc. PLDI. White Plains, New York, 1990.
<https://doi.org/10.1145/93542.93557>.
Steele, Guy L., Jr., and Jon L. White. How to Print
Floating-Point Numbers Accurately. In Proc. PLDI. White Plains,
New York, 1990. <https://doi.org/10.1145/93542.93559>.
Jaffer, Aubrey. Easy Accurate Reading and Writing of
Floating-Point Numbers. ArXiv:1310.8121 [Cs], 27 October 2013.
<http://arxiv.org/abs/1310.8121>.
[^arbitrary-precision-signedinteger]: **Implementation note.** Be
aware when implementing reading and writing of `SignedInteger`s
that the data model *requires* arbitrary-precision integers. Your
implementation may (but, ideally, should not) truncate precision
when reading or writing a `SignedInteger`; however, if it does so,
it should (a) signal its client that truncation has occurred, and
(b) make it clear to the client that comparing such truncated
values for equality or ordering will not yield results that match
the expected semantics of the data model.
`String`s are,
[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
escaped text surrounded by double quotes. The escaping rules are the
@@ -177,62 +127,109 @@ Base64 characters are allowed.
ByteString =/ "#[" *(ws / base64char) ws "]"
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
it conforms to certain restrictions on the characters appearing in the
symbol. Alternatively, it may be written in a quoted form. The quoted
form is much the same as the syntax for `String`s, including embedded
escape syntax, except using a bar or pipe character (`|`) instead of a
double quote mark.
A `Symbol` may be written in either of two forms.
Symbol = symstart *symcont / "|" *symchar "|"
symstart = ALPHA / sympunct / symustart
symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
"?" / "_" / "=" / "+" / "/" / "."
The first is a quoted form, much the same as the syntax for `String`s,
including embedded escape syntax, except using a bar or pipe character
(`|`) instead of a double quote mark.
QuotedSymbol = "|" *symchar "|"
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
symustart = <any code point greater than 127 whose Unicode
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
Pc, Po, Sc, Sm, Sk, So, or Co>
symucont = <any code point greater than 127 whose Unicode
category is Nd, Nl, No, or Pd>
Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
`SignedInteger`, then it must be interpreted as one of those, and otherwise
it must be interpreted as a bare `Symbol`.
SymbolOrNumber = *baresymchar
baresymchar = ALPHA / DIGIT / sympunct / symuchar
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
"?" / "_" / "=" / "+" / "-" / "/" / "."
symuchar = <any code point greater than 127 whose Unicode
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
definition of “token representation”, and with the
[R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
An `Embedded` is written as a `Value` chosen to represent the denoted
object, prefixed with `#!`.
Numeric data follow the [JSON
grammar](https://tools.ietf.org/html/rfc8259#section-6) except that leading
zeros are permitted and an optional leading `+` sign is allowed. The
addition of a trailing “f” distinguishes a `Float` from a `Double` value.
`Float`s and `Double`s always have either a fractional part or an exponent
part, where `SignedInteger`s never have
either.[^reading-and-writing-floats-accurately]
[^arbitrary-precision-signedinteger]
Float = flt %i"f"
Double = flt
SignedInteger = int
digit1-9 = %x31-39
nat = 1*DIGIT
int = ["-"/"+"] nat
frac = "." 1*DIGIT
exp = %i"e" ["-"/"+"] 1*DIGIT
flt = int (frac exp / frac / exp)
[^reading-and-writing-floats-accurately]: **Implementation note.**
Your language's standard library likely has a good routine for
converting between decimal notation and IEEE 754 floating-point.
However, if not, or if you are interested in the challenges of
accurately reading and writing floating point numbers, see the
excellent matched pair of 1990 papers by Clinger and Steele &
White, and a recent follow-up by Jaffer:
Clinger, William D. How to Read Floating Point Numbers
Accurately. In Proc. PLDI. White Plains, New York, 1990.
<https://doi.org/10.1145/93542.93557>.
Steele, Guy L., Jr., and Jon L. White. How to Print
Floating-Point Numbers Accurately. In Proc. PLDI. White Plains,
New York, 1990. <https://doi.org/10.1145/93542.93559>.
Jaffer, Aubrey. Easy Accurate Reading and Writing of
Floating-Point Numbers. ArXiv:1310.8121 [Cs], 27 October 2013.
<http://arxiv.org/abs/1310.8121>.
[^arbitrary-precision-signedinteger]: **Implementation note.** Be
aware when implementing reading and writing of `SignedInteger`s
that the data model *requires* arbitrary-precision integers. Your
implementation may (but, ideally, should not) truncate precision
when reading or writing a `SignedInteger`; however, if it does so,
it should (a) signal its client that truncation has occurred, and
(b) make it clear to the client that comparing such truncated
values for equality or ordering will not yield results that match
the expected semantics of the data model.
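The disambiguation rule for `SymbolOrNumber` can be sketched in Python as follows. This is an illustrative sketch, not part of the specification: the function name `classify` is hypothetical, the regular expressions are a hand translation of the `int`, `frac`, `exp` and `flt` rules above, and the input is assumed to have already matched `SymbolOrNumber`.

```python
import re

# Hand translation of the ABNF rules above (an assumption of this sketch).
# [0-9] is used instead of \d so that only ASCII digits match.
INT = r"[-+]?[0-9]+"                         # int  = ["-"/"+"] 1*DIGIT
FRAC = r"\.[0-9]+"                           # frac = "." 1*DIGIT
EXP = r"[eE][-+]?[0-9]+"                     # exp  = %i"e" ["-"/"+"] 1*DIGIT
FLT = rf"{INT}(?:{FRAC}{EXP}|{FRAC}|{EXP})"  # flt  = int (frac exp / frac / exp)

INT_RE = re.compile(INT)
FLOAT_RE = re.compile(rf"{FLT}[fF]")         # Float  = flt %i"f"
DOUBLE_RE = re.compile(FLT)                  # Double = flt


def classify(token: str) -> str:
    """Name the kind of Atom a bare token denotes.

    Assumes `token` has already matched SymbolOrNumber; this function only
    decides between the numeric readings and the bare Symbol reading.
    """
    if INT_RE.fullmatch(token):
        return "SignedInteger"
    if FLOAT_RE.fullmatch(token):
        return "Float"
    if DOUBLE_RE.fullmatch(token):
        return "Double"
    return "Symbol"
```

Under these assumptions `+007` and `1` classify as `SignedInteger`, `1.0f` as `Float` and `1e10` as `Double`, while `1f`, `1.` and `-` fall through to bare `Symbol`, because none of them matches a numeric rule in full.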
Some valid IEEE 754 `Float`s and `Double`s are not covered by the grammar
above, namely, the several million NaNs and the two infinities. These are
represented as raw hexadecimal strings similar to hexadecimal
`ByteString`s. Implementations are free to use hexadecimal floating-point
syntax wherever convenient, even for values representable using the
grammar above.[^rationale-no-general-machine-syntax]
Float =/ "#xf" %x22 8HEXDIG %x22
Double =/ "#xd" %x22 16HEXDIG %x22
[^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
of this specification included an escape to the [machine-oriented
binary syntax](preserves-binary.html) by prefixing a `ByteString`
containing the binary representation of the `Value` with `#=`. The only
true need for this feature was to represent otherwise-unrepresentable
floating-point values. Instead, this specification allows such
floating-point values to be written directly. Removing the `#=` syntax
simplifies implementations (there is no longer any need to support the
machine-oriented syntax) and avoids complications around treatment of
annotations potentially contained within machine-encoded values.
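As an illustration of the hexadecimal floating-point syntax, the following Python sketch decodes `#xf"…"` and `#xd"…"` literals. The helper name `parse_hex_float` is hypothetical. The sketch assumes the hex digits are the big-endian IEEE 754 bit pattern, which is consistent with the infinity examples in the binary encoding table, and it accepts only the lowercase `#xf` and `#xd` prefixes.

```python
import re
import struct

# Float =/ "#xf" %x22 8HEXDIG %x22 ; Double =/ "#xd" %x22 16HEXDIG %x22
HEX_FLOAT_RE = re.compile(r'#xf"([0-9A-Fa-f]{8})"')
HEX_DOUBLE_RE = re.compile(r'#xd"([0-9A-Fa-f]{16})"')


def parse_hex_float(text: str) -> tuple[str, float]:
    """Decode a hexadecimal Float or Double literal.

    Returns (kind, value). The digits are assumed to be the big-endian
    IEEE 754 bit pattern of the value.
    """
    m = HEX_FLOAT_RE.fullmatch(text)
    if m:
        return ("Float", struct.unpack(">f", bytes.fromhex(m.group(1)))[0])
    m = HEX_DOUBLE_RE.fullmatch(text)
    if m:
        return ("Double", struct.unpack(">d", bytes.fromhex(m.group(1)))[0])
    raise ValueError(f"not a hexadecimal float literal: {text!r}")
```

Converting to a native Python `float` may not preserve every NaN payload, particularly for 32-bit values, so an implementation that needs to round-trip NaNs exactly would probably keep the raw bytes alongside the decoded value.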
Finally, an `Embedded` is written as a `Value` chosen to represent the
denoted object, prefixed with `#!`.
Embedded = "#!" Value
Finally, any `Value` may be represented by escaping from the textual
syntax to the [machine-oriented binary syntax](preserves-binary.html)
by prefixing a `ByteString` containing the binary representation of the
`Value` with `#=`.[^rationale-switch-to-binary]
[^no-literal-binary-in-text] [^machine-value-annotations]
Machine = "#=" ws ByteString
[^rationale-switch-to-binary]: **Rationale.** The textual syntax
cannot express every `Value`: specifically, it cannot express the
several million floating-point NaNs, or the two floating-point
Infinities. Since the machine-oriented binary format for `Value`s
expresses each `Value` with precision, embedding binary `Value`s
solves the problem.
[^no-literal-binary-in-text]: Every text is ultimately physically
stored as bytes; therefore, it might seem possible to escape to the
raw form of binary encoding from within a piece of textual syntax.
However, while bytes must be involved in any *representation* of
text, the text *itself* is logically a sequence of *code points* and
is not *intrinsically* a binary structure at all. It would be
incoherent to expect to be able to access the representation of the
text from within the text itself.
[^machine-value-annotations]: Any text-syntax annotations preceding
the `#` are prepended to any binary-syntax annotations yielded by
decoding the `ByteString`.
## Annotations
When written down, a `Value` may have an associated sequence of

@@ -220,21 +220,23 @@ The total ordering specified [above](#total-order) means that the following stat
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
<!-- translated from various JSON blobs floating around the internet. -->
| Value | Encoded byte sequence |
|-----------------------------|---------------------------------------------------------------------------------|
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
| `[1 2 3 4]` | B5 91 92 93 94 84 |
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
| `-257` | A1 FE FF |
| `-1` | 9F |
| `0` | 90 |
| `1` | 91 |
| `255` | A1 00 FF |
| `1.0f` | 82 3F 80 00 00 |
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
| Value | Encoded byte sequence |
|-----------------------------------------------------|---------------------------------------------------------------------------------|
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
| `[1 2 3 4]` | B5 91 92 93 94 84 |
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
| `-257` | A1 FE FF |
| `-1` | 9F |
| `0` | 90 |
| `1` | 91 |
| `255` | A1 00 FF |
| `1.0f` | 82 3F 80 00 00 |
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
| `#xf"7f800000"`, positive `Float` infinity | 82 7F 80 00 00 |
| `#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00 |
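The floating-point rows of the table can be reproduced with a short Python sketch. The function names are hypothetical, and the tag bytes `82` and `83` are read off the table rows above rather than taken from a full statement of the binary syntax. The sketch assumes a tag byte followed by the big-endian IEEE 754 representation.

```python
import struct


def encode_float(value: float) -> bytes:
    """Tag byte 0x82, then the 4-byte big-endian IEEE 754 single."""
    return b"\x82" + struct.pack(">f", value)


def encode_double(value: float) -> bytes:
    """Tag byte 0x83, then the 8-byte big-endian IEEE 754 double."""
    return b"\x83" + struct.pack(">d", value)


if __name__ == "__main__":
    # Print the encodings as space-separated hex bytes, as in the table.
    print(encode_float(1.0).hex(" ").upper())
    print(encode_double(1.0).hex(" ").upper())
    print(encode_float(float("inf")).hex(" ").upper())
    print(encode_double(float("-inf")).hex(" ").upper())
```

The two infinity lines correspond to the `#xf"7f800000"` and `#xd"fff0000000000000"` rows, which shows that the hexadecimal text syntax carries the same bytes as the binary encoding minus the tag.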
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`