Update spec text for numbers/symbols
parent 351feba8d2
commit 2d3df32749
|
@ -40,10 +40,9 @@ Standalone documents may have trailing whitespace.
|
|||
|
||||
Any `Value` may be preceded by whitespace.
|
||||
|
||||
Value = ws (Record / Collection / Atom / Embedded / Machine)
|
||||
Value = ws (Record / Collection / Atom / Embedded)
|
||||
Collection = Sequence / Dictionary / Set
|
||||
Atom = Boolean / Float / Double / SignedInteger /
|
||||
String / ByteString / Symbol
|
||||
Atom = Boolean / String / ByteString / QuotedSymbol / SymbolOrNumber
|
||||
|
||||
Each `Record` is an angle-bracket enclosed grouping of its
|
||||
label-`Value` followed by its field-`Value`s.
|
||||
|
@ -73,55 +72,6 @@ false, respectively.
|
|||
|
||||
Boolean = %s"#t" / %s"#f"
|
||||
|
||||
Numeric data follow the
|
||||
[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
|
||||
the addition of a trailing “f” distinguishing `Float` from `Double`
|
||||
values. `Float`s and `Double`s always have either a fractional part or
|
||||
an exponent part, where `SignedInteger`s never have
|
||||
either.[^reading-and-writing-floats-accurately]
|
||||
[^arbitrary-precision-signedinteger]
|
||||
|
||||
Float = flt %i"f"
|
||||
Double = flt
|
||||
SignedInteger = int
|
||||
|
||||
digit1-9 = %x31-39
|
||||
nat = %x30 / ( digit1-9 *DIGIT )
|
||||
int = ["-"] nat
|
||||
frac = "." 1*DIGIT
|
||||
exp = %i"e" ["-"/"+"] 1*DIGIT
|
||||
flt = int (frac exp / frac / exp)
|
||||
|
||||
[^reading-and-writing-floats-accurately]: **Implementation note.**
|
||||
Your language's standard library likely has a good routine for
|
||||
converting between decimal notation and IEEE 754 floating-point.
|
||||
However, if not, or if you are interested in the challenges of
|
||||
accurately reading and writing floating point numbers, see the
|
||||
excellent matched pair of 1990 papers by Clinger and Steele &
|
||||
White, and a recent follow-up by Jaffer:
|
||||
|
||||
Clinger, William D. ‘How to Read Floating Point Numbers
|
||||
Accurately’. In Proc. PLDI. White Plains, New York, 1990.
|
||||
<https://doi.org/10.1145/93542.93557>.
|
||||
|
||||
Steele, Guy L., Jr., and Jon L. White. ‘How to Print
|
||||
Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
|
||||
New York, 1990. <https://doi.org/10.1145/93542.93559>.
|
||||
|
||||
Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
|
||||
Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
|
||||
<http://arxiv.org/abs/1310.8121>.
|
||||
|
||||
[^arbitrary-precision-signedinteger]: **Implementation note.** Be
|
||||
aware when implementing reading and writing of `SignedInteger`s
|
||||
that the data model *requires* arbitrary-precision integers. Your
|
||||
implementation may (but, ideally, should not) truncate precision
|
||||
when reading or writing a `SignedInteger`; however, if it does so,
|
||||
it should (a) signal its client that truncation has occurred, and
|
||||
(b) make it clear to the client that comparing such truncated
|
||||
values for equality or ordering will not yield results that match
|
||||
the expected semantics of the data model.
|
||||
|
||||
`String`s are,
|
||||
[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
|
||||
escaped text surrounded by double quotes. The escaping rules are the
|
||||
|
@ -177,62 +127,109 @@ Base64 characters are allowed.
|
|||
ByteString =/ "#[" *(ws / base64char) ws "]"
|
||||
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
|
||||
|
||||
A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
|
||||
it conforms to certain restrictions on the characters appearing in the
|
||||
symbol. Alternatively, it may be written in a quoted form. The quoted
|
||||
form is much the same as the syntax for `String`s, including embedded
|
||||
escape syntax, except using a bar or pipe character (`|`) instead of a
|
||||
double quote mark.
|
||||
A `Symbol` may be written in either of two forms.
|
||||
|
||||
Symbol = symstart *symcont / "|" *symchar "|"
|
||||
symstart = ALPHA / sympunct / symustart
|
||||
symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
|
||||
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
|
||||
"?" / "_" / "=" / "+" / "/" / "."
|
||||
The first is a quoted form, much the same as the syntax for `String`s,
|
||||
including embedded escape syntax, except using a bar or pipe character
|
||||
(`|`) instead of a double quote mark.
|
||||
|
||||
QuotedSymbol = "|" *symchar "|"
|
||||
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
|
||||
symustart = <any code point greater than 127 whose Unicode
|
||||
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
|
||||
Pc, Po, Sc, Sm, Sk, So, or Co>
|
||||
symucont = <any code point greater than 127 whose Unicode
|
||||
category is Nd, Nl, No, or Pd>
|
||||
|
||||
Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
|
||||
The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
|
||||
so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
|
||||
`SignedInteger`, then it must be interpreted as one of those, and otherwise
|
||||
it must be interpreted as a bare `Symbol`.
|
||||
|
||||
SymbolOrNumber = *baresymchar
|
||||
baresymchar = ALPHA / DIGIT / sympunct / symuchar
|
||||
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
|
||||
"?" / "_" / "=" / "+" / "-" / "/" / "."
|
||||
symuchar = <any code point greater than 127 whose Unicode
|
||||
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
|
||||
Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
|
||||
|
||||
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
|
||||
definition of “token representation”, and with the
|
||||
[R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
|
||||
|
||||
An `Embedded` is written as a `Value` chosen to represent the denoted
|
||||
object, prefixed with `#!`.
|
||||
Numeric data follow the [JSON
|
||||
grammar](https://tools.ietf.org/html/rfc8259#section-6) except that leading
|
||||
zeros are permitted and an optional leading `+` sign is allowed. The
|
||||
addition of a trailing “f” distinguishes a `Float` from a `Double` value.
|
||||
`Float`s and `Double`s always have either a fractional part or an exponent
|
||||
part, where `SignedInteger`s never have
|
||||
either.[^reading-and-writing-floats-accurately]
|
||||
[^arbitrary-precision-signedinteger]
|
||||
|
||||
Float = flt %i"f"
|
||||
Double = flt
|
||||
SignedInteger = int
|
||||
|
||||
digit1-9 = %x31-39
|
||||
nat = 1*DIGIT
|
||||
int = ["-"/"+"] nat
|
||||
frac = "." 1*DIGIT
|
||||
exp = %i"e" ["-"/"+"] 1*DIGIT
|
||||
flt = int (frac exp / frac / exp)
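
The disambiguation rule for `SymbolOrNumber` can be sketched as follows. This is a minimal illustration, not a reference implementation: the regular expressions are transcribed by hand from the ABNF above, the function name `classify` is hypothetical, and the sketch assumes the token has already been checked to consist only of `baresymchar` characters.

```python
import re

# Hand transcription of the numeric ABNF above:
#   nat = 1*DIGIT            int = ["-"/"+"] nat
#   frac = "." 1*DIGIT       exp = %i"e" ["-"/"+"] 1*DIGIT
#   flt = int (frac exp / frac / exp)
# [0-9] is used instead of \d so that non-ASCII digits never match.
INT = r"[-+]?[0-9]+"
FRAC = r"\.[0-9]+"
EXP = r"[eE][-+]?[0-9]+"
FLT = rf"{INT}(?:{FRAC}{EXP}|{FRAC}|{EXP})"

SIGNED_INTEGER = re.compile(INT)
DOUBLE = re.compile(FLT)
FLOAT = re.compile(rf"{FLT}[fF]")  # %i"f" is case-insensitive


def classify(token: str) -> str:
    """Name the kind of Atom a bare token denotes (hypothetical helper).

    Assumes `token` is already a valid SymbolOrNumber; anything that is
    not numeric falls through to Symbol.
    """
    if SIGNED_INTEGER.fullmatch(token):
        return "SignedInteger"
    if FLOAT.fullmatch(token):
        return "Float"
    if DOUBLE.fullmatch(token):
        return "Double"
    return "Symbol"
```

Under this reading, `+007` is a `SignedInteger` (leading zeros and a leading `+` are permitted), while `1.` and `-` match no numeric production and so are bare `Symbol`s.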
|
||||
|
||||
[^reading-and-writing-floats-accurately]: **Implementation note.**
|
||||
Your language's standard library likely has a good routine for
|
||||
converting between decimal notation and IEEE 754 floating-point.
|
||||
However, if not, or if you are interested in the challenges of
|
||||
accurately reading and writing floating point numbers, see the
|
||||
excellent matched pair of 1990 papers by Clinger and Steele &
|
||||
White, and a recent follow-up by Jaffer:
|
||||
|
||||
Clinger, William D. ‘How to Read Floating Point Numbers
|
||||
Accurately’. In Proc. PLDI. White Plains, New York, 1990.
|
||||
<https://doi.org/10.1145/93542.93557>.
|
||||
|
||||
Steele, Guy L., Jr., and Jon L. White. ‘How to Print
|
||||
Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
|
||||
New York, 1990. <https://doi.org/10.1145/93542.93559>.
|
||||
|
||||
Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
|
||||
Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
|
||||
<http://arxiv.org/abs/1310.8121>.
|
||||
|
||||
[^arbitrary-precision-signedinteger]: **Implementation note.** Be
|
||||
aware when implementing reading and writing of `SignedInteger`s
|
||||
that the data model *requires* arbitrary-precision integers. Your
|
||||
implementation may (but, ideally, should not) truncate precision
|
||||
when reading or writing a `SignedInteger`; however, if it does so,
|
||||
it should (a) signal its client that truncation has occurred, and
|
||||
(b) make it clear to the client that comparing such truncated
|
||||
values for equality or ordering will not yield results that match
|
||||
the expected semantics of the data model.
|
||||
|
||||
Some valid IEEE 754 `Float`s and `Double`s are not covered by the grammar
|
||||
above, namely, the several million NaNs and the two infinities. These are
|
||||
represented as raw hexadecimal strings similar to hexadecimal
|
||||
`ByteString`s. Implementations are free to use hexadecimal floating-point
|
||||
syntax wherever convenient, even for values representable using the
|
||||
grammar above.[^rationale-no-general-machine-syntax]
|
||||
|
||||
Float =/ "#xf" %x22 8HEXDIG %x22
|
||||
Double =/ "#xd" %x22 16HEXDIG %x22
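
As an illustration of the hexadecimal forms, the sketch below reads and writes them with Python's `struct` module. It assumes the hex digits are the big-endian IEEE 754 bit pattern, which is consistent with the `#xf"7f800000"` example for positive `Float` infinity; the helper names are hypothetical. Note that Python holds every float as a double, so the payload of a `Float` NaN may not survive a round trip through this code.

```python
import math
import re
import struct

# ABNF string literals are case-insensitive unless marked %s, so the
# "#xf" / "#xd" prefixes are matched case-insensitively here.
HEX_FLOAT = re.compile(r'#xf"([0-9a-f]{8})"', re.IGNORECASE)
HEX_DOUBLE = re.compile(r'#xd"([0-9a-f]{16})"', re.IGNORECASE)


def parse_hex_float(text: str) -> tuple[str, float]:
    """Decode a hexadecimal Float or Double literal (hypothetical helper)."""
    m = HEX_FLOAT.fullmatch(text)
    if m:
        # 4 bytes, big-endian IEEE 754 single precision
        return ("Float", struct.unpack(">f", bytes.fromhex(m.group(1)))[0])
    m = HEX_DOUBLE.fullmatch(text)
    if m:
        # 8 bytes, big-endian IEEE 754 double precision
        return ("Double", struct.unpack(">d", bytes.fromhex(m.group(1)))[0])
    raise ValueError(f"not a hexadecimal float literal: {text!r}")


def write_hex_double(x: float) -> str:
    """Encode a double as a hexadecimal Double literal (hypothetical helper)."""
    return '#xd"' + struct.pack(">d", x).hex() + '"'
```

A writer might use the decimal grammar for ordinary values and fall back to `write_hex_double` only when `math.isnan(x)` or `math.isinf(x)` holds, although the text above permits the hexadecimal form for any value.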
|
||||
|
||||
[^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
|
||||
of this specification included an escape to the [machine-oriented
|
||||
binary syntax](preserves-binary.html) by prefixing a `ByteString`
|
||||
containing the binary representation of the `Value` with `#=`. The only
|
||||
true need for this feature was to represent otherwise-unrepresentable
|
||||
floating-point values. Instead, this specification allows such
|
||||
floating-point values to be written directly. Removing the `#=` syntax
|
||||
simplifies implementations (there is no longer any need to support the
|
||||
machine-oriented syntax) and avoids complications around treatment of
|
||||
annotations potentially contained within machine-encoded values.
|
||||
|
||||
Finally, an `Embedded` is written as a `Value` chosen to represent the
|
||||
denoted object, prefixed with `#!`.
|
||||
|
||||
Embedded = "#!" Value
|
||||
|
||||
Finally, any `Value` may be represented by escaping from the textual
|
||||
syntax to the [machine-oriented binary syntax](preserves-binary.html)
|
||||
by prefixing a `ByteString` containing the binary representation of the
|
||||
`Value` with `#=`.[^rationale-switch-to-binary]
|
||||
[^no-literal-binary-in-text] [^machine-value-annotations]
|
||||
|
||||
Machine = "#=" ws ByteString
|
||||
|
||||
[^rationale-switch-to-binary]: **Rationale.** The textual syntax
|
||||
cannot express every `Value`: specifically, it cannot express the
|
||||
several million floating-point NaNs, or the two floating-point
|
||||
Infinities. Since the machine-oriented binary format for `Value`s
|
||||
expresses each `Value` with precision, embedding binary `Value`s
|
||||
solves the problem.
|
||||
|
||||
[^no-literal-binary-in-text]: Every text is ultimately physically
|
||||
stored as bytes; therefore, it might seem possible to escape to the
|
||||
raw form of binary encoding from within a piece of textual syntax.
|
||||
However, while bytes must be involved in any *representation* of
|
||||
text, the text *itself* is logically a sequence of *code points* and
|
||||
is not *intrinsically* a binary structure at all. It would be
|
||||
incoherent to expect to be able to access the representation of the
|
||||
text from within the text itself.
|
||||
|
||||
[^machine-value-annotations]: Any text-syntax annotations preceding
|
||||
the `#` are prepended to any binary-syntax annotations yielded by
|
||||
decoding the `ByteString`.
|
||||
|
||||
## Annotations
|
||||
|
||||
When written down, a `Value` may have an associated sequence of
|
||||
|
|
32
preserves.md
|
@ -220,21 +220,23 @@ The total ordering specified [above](#total-order) means that the following stat
|
|||
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
||||
<!-- translated from various JSON blobs floating around the internet. -->
|
||||
|
||||
| Value | Encoded byte sequence |
|
||||
|-----------------------------|---------------------------------------------------------------------------------|
|
||||
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
|
||||
| `[1 2 3 4]` | B5 91 92 93 94 84 |
|
||||
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
|
||||
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
|
||||
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
|
||||
| `-257` | A1 FE FF |
|
||||
| `-1` | 9F |
|
||||
| `0` | 90 |
|
||||
| `1` | 91 |
|
||||
| `255` | A1 00 FF |
|
||||
| `1.0f` | 82 3F 80 00 00 |
|
||||
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
|
||||
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
|
||||
| Value | Encoded byte sequence |
|
||||
|-----------------------------------------------------|---------------------------------------------------------------------------------|
|
||||
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
|
||||
| `[1 2 3 4]` | B5 91 92 93 94 84 |
|
||||
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
|
||||
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
|
||||
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
|
||||
| `-257` | A1 FE FF |
|
||||
| `-1` | 9F |
|
||||
| `0` | 90 |
|
||||
| `1` | 91 |
|
||||
| `255` | A1 00 FF |
|
||||
| `1.0f` | 82 3F 80 00 00 |
|
||||
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
|
||||
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
|
||||
| `#xf"7f800000"`, positive `Float` infinity | 82 7F 80 00 00 |
|
||||
| `#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00 |
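
The `SignedInteger` rows in the table can be reproduced with the sketch below. It is inferred from the table rows only, not taken from the binary specification: the small-integer range of -3 to 12 and the tag rule `0xA0 + length - 1` are assumptions that happen to agree with every row shown, and the function name is hypothetical.

```python
def encode_signed_integer(n: int) -> bytes:
    """Encode an integer the way the table rows suggest (hypothetical helper).

    ASSUMPTION: integers from -3 to 12 use a single tag byte 0x90..0x9F.
    The table only confirms this for -2 through 4.
    ASSUMPTION: other integers use tag 0xA0 + (length - 1) followed by the
    minimal big-endian two's complement representation.
    """
    if -3 <= n <= 12:
        # Negative small values wrap around into 0x9D..0x9F.
        return bytes([0x90 + (n if n >= 0 else n + 16)])
    # Minimal number of bytes that holds n including its sign bit.
    bits = (n if n >= 0 else ~n).bit_length()
    length = bits // 8 + 1
    if length > 16:
        # Longer integers are outside what this excerpt shows.
        raise NotImplementedError("length encoding beyond 16 bytes not sketched")
    return bytes([0xA0 + length - 1]) + n.to_bytes(length, "big", signed=True)
```

For example, `encode_signed_integer(255)` needs two bytes because a single byte `FF` would read back as -1 in two's complement, which is why the table shows `A1 00 FF`.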
|
||||
|
||||
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
||||
|
||||
|
|