Update spec text for numbers/symbols
parent
351feba8d2
commit
2d3df32749
|
@ -40,10 +40,9 @@ Standalone documents may have trailing whitespace.
|
||||||
|
|
||||||
Any `Value` may be preceded by whitespace.
|
Any `Value` may be preceded by whitespace.
|
||||||
|
|
||||||
Value = ws (Record / Collection / Atom / Embedded / Machine)
|
Value = ws (Record / Collection / Atom / Embedded)
|
||||||
Collection = Sequence / Dictionary / Set
|
Collection = Sequence / Dictionary / Set
|
||||||
Atom = Boolean / Float / Double / SignedInteger /
|
Atom = Boolean / String / ByteString / QuotedSymbol / SymbolOrNumber
|
||||||
String / ByteString / Symbol
|
|
||||||
|
|
||||||
Each `Record` is an angle-bracket enclosed grouping of its
|
Each `Record` is an angle-bracket enclosed grouping of its
|
||||||
label-`Value` followed by its field-`Value`s.
|
label-`Value` followed by its field-`Value`s.
|
||||||
|
@ -73,55 +72,6 @@ false, respectively.
|
||||||
|
|
||||||
Boolean = %s"#t" / %s"#f"
|
Boolean = %s"#t" / %s"#f"
|
||||||
|
|
||||||
Numeric data follow the
|
|
||||||
[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
|
|
||||||
the addition of a trailing “f” distinguishing `Float` from `Double`
|
|
||||||
values. `Float`s and `Double`s always have either a fractional part or
|
|
||||||
an exponent part, where `SignedInteger`s never have
|
|
||||||
either.[^reading-and-writing-floats-accurately]
|
|
||||||
[^arbitrary-precision-signedinteger]
|
|
||||||
|
|
||||||
Float = flt %i"f"
|
|
||||||
Double = flt
|
|
||||||
SignedInteger = int
|
|
||||||
|
|
||||||
digit1-9 = %x31-39
|
|
||||||
nat = %x30 / ( digit1-9 *DIGIT )
|
|
||||||
int = ["-"] nat
|
|
||||||
frac = "." 1*DIGIT
|
|
||||||
exp = %i"e" ["-"/"+"] 1*DIGIT
|
|
||||||
flt = int (frac exp / frac / exp)
|
|
||||||
|
|
||||||
[^reading-and-writing-floats-accurately]: **Implementation note.**
|
|
||||||
Your language's standard library likely has a good routine for
|
|
||||||
converting between decimal notation and IEEE 754 floating-point.
|
|
||||||
However, if not, or if you are interested in the challenges of
|
|
||||||
accurately reading and writing floating point numbers, see the
|
|
||||||
excellent matched pair of 1990 papers by Clinger and Steele &
|
|
||||||
White, and a recent follow-up by Jaffer:
|
|
||||||
|
|
||||||
Clinger, William D. ‘How to Read Floating Point Numbers
|
|
||||||
Accurately’. In Proc. PLDI. White Plains, New York, 1990.
|
|
||||||
<https://doi.org/10.1145/93542.93557>.
|
|
||||||
|
|
||||||
Steele, Guy L., Jr., and Jon L. White. ‘How to Print
|
|
||||||
Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
|
|
||||||
New York, 1990. <https://doi.org/10.1145/93542.93559>.
|
|
||||||
|
|
||||||
Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
|
|
||||||
Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
|
|
||||||
<http://arxiv.org/abs/1310.8121>.
|
|
||||||
|
|
||||||
[^arbitrary-precision-signedinteger]: **Implementation note.** Be
|
|
||||||
aware when implementing reading and writing of `SignedInteger`s
|
|
||||||
that the data model *requires* arbitrary-precision integers. Your
|
|
||||||
implementation may (but, ideally, should not) truncate precision
|
|
||||||
when reading or writing a `SignedInteger`; however, if it does so,
|
|
||||||
it should (a) signal its client that truncation has occurred, and
|
|
||||||
(b) make it clear to the client that comparing such truncated
|
|
||||||
values for equality or ordering will not yield results that match
|
|
||||||
the expected semantics of the data model.
|
|
||||||
|
|
||||||
`String`s are,
|
`String`s are,
|
||||||
[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
|
[as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
|
||||||
escaped text surrounded by double quotes. The escaping rules are the
|
escaped text surrounded by double quotes. The escaping rules are the
|
||||||
|
@ -177,62 +127,109 @@ Base64 characters are allowed.
|
||||||
ByteString =/ "#[" *(ws / base64char) ws "]"
|
ByteString =/ "#[" *(ws / base64char) ws "]"
|
||||||
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
|
base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
|
||||||
|
|
||||||
A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
|
A `Symbol` may be written in either of two forms.
|
||||||
it conforms to certain restrictions on the characters appearing in the
|
|
||||||
symbol. Alternatively, it may be written in a quoted form. The quoted
|
|
||||||
form is much the same as the syntax for `String`s, including embedded
|
|
||||||
escape syntax, except using a bar or pipe character (`|`) instead of a
|
|
||||||
double quote mark.
|
|
||||||
|
|
||||||
Symbol = symstart *symcont / "|" *symchar "|"
|
The first is a quoted form, much the same as the syntax for `String`s,
|
||||||
symstart = ALPHA / sympunct / symustart
|
including embedded escape syntax, except using a bar or pipe character
|
||||||
symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
|
(`|`) instead of a double quote mark.
|
||||||
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
|
|
||||||
"?" / "_" / "=" / "+" / "/" / "."
|
QuotedSymbol = "|" *symchar "|"
|
||||||
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
|
symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
|
||||||
symustart = <any code point greater than 127 whose Unicode
|
|
||||||
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
|
Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
|
||||||
Pc, Po, Sc, Sm, Sk, So, or Co>
|
The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
|
||||||
symucont = <any code point greater than 127 whose Unicode
|
so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
|
||||||
category is Nd, Nl, No, or Pd>
|
`SignedInteger`, then it must be interpreted as one of those, and otherwise
|
||||||
|
it must be interpreted as a bare `Symbol`.
|
||||||
|
|
||||||
|
SymbolOrNumber = *baresymchar
|
||||||
|
baresymchar = ALPHA / DIGIT / sympunct / symuchar
|
||||||
|
sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
|
||||||
|
"?" / "_" / "=" / "+" / "-" / "/" / "."
|
||||||
|
symuchar = <any code point greater than 127 whose Unicode
|
||||||
|
category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
|
||||||
|
Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
|
||||||
|
|
||||||
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
|
[^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
|
||||||
definition of “token representation”, and with the
|
definition of “token representation”, and with the
|
||||||
[R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
|
[R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
|
||||||
|
|
||||||
An `Embedded` is written as a `Value` chosen to represent the denoted
|
Numeric data follow the [JSON
|
||||||
object, prefixed with `#!`.
|
grammar](https://tools.ietf.org/html/rfc8259#section-6) except that leading
|
||||||
|
zeros are permitted and an optional leading `+` sign is allowed. The
|
||||||
|
addition of a trailing “f” distinguishes a `Float` from a `Double` value.
|
||||||
|
`Float`s and `Double`s always have either a fractional part or an exponent
|
||||||
|
part, where `SignedInteger`s never have
|
||||||
|
either.[^reading-and-writing-floats-accurately]
|
||||||
|
[^arbitrary-precision-signedinteger]
|
||||||
|
|
||||||
|
Float = flt %i"f"
|
||||||
|
Double = flt
|
||||||
|
SignedInteger = int
|
||||||
|
|
||||||
|
digit1-9 = %x31-39
|
||||||
|
nat = 1*DIGIT
|
||||||
|
int = ["-"/"+"] nat
|
||||||
|
frac = "." 1*DIGIT
|
||||||
|
exp = %i"e" ["-"/"+"] 1*DIGIT
|
||||||
|
flt = int (frac exp / frac / exp)
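
The disambiguation rule for `SymbolOrNumber` can be sketched in Python as follows. This is an illustrative sketch, not part of the specification: the function name `classify` and the regular expressions are assumptions, and the token is assumed to have already been lexed as a run of `baresymchar`s.

```python
import re

# Regular expressions transcribed from the ABNF above. DIGIT is ASCII-only,
# so [0-9] is used rather than \d (which also matches non-ASCII digits).
_INT = r"[+-]?[0-9]+"                       # int = ["-"/"+"] nat
_FRAC = r"\.[0-9]+"                         # frac = "." 1*DIGIT
_EXP = r"[eE][+-]?[0-9]+"                   # exp = %i"e" ["-"/"+"] 1*DIGIT
_FLT = _INT + r"(?:" + _FRAC + r"(?:" + _EXP + r")?|" + _EXP + r")"

_SIGNED_INTEGER = re.compile(_INT)
_DOUBLE = re.compile(_FLT)
_FLOAT = re.compile(_FLT + r"[fF]")         # Float = flt %i"f"


def classify(token: str) -> str:
    """Classify a bare token (hypothetical helper, assumes valid baresymchars)."""
    if _FLOAT.fullmatch(token):
        return "Float"
    if _DOUBLE.fullmatch(token):
        return "Double"
    if _SIGNED_INTEGER.fullmatch(token):
        return "SignedInteger"
    # Anything that matches none of the numeric grammars is a bare Symbol.
    return "Symbol"
```

Under this sketch, `+007` classifies as a `SignedInteger` because leading zeros and a leading `+` are now permitted, while `1f` and `1.` classify as bare `Symbol`s because neither matches `flt`.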
|
||||||
|
|
||||||
|
[^reading-and-writing-floats-accurately]: **Implementation note.**
|
||||||
|
Your language's standard library likely has a good routine for
|
||||||
|
converting between decimal notation and IEEE 754 floating-point.
|
||||||
|
However, if not, or if you are interested in the challenges of
|
||||||
|
accurately reading and writing floating point numbers, see the
|
||||||
|
excellent matched pair of 1990 papers by Clinger and Steele &
|
||||||
|
White, and a recent follow-up by Jaffer:
|
||||||
|
|
||||||
|
Clinger, William D. ‘How to Read Floating Point Numbers
|
||||||
|
Accurately’. In Proc. PLDI. White Plains, New York, 1990.
|
||||||
|
<https://doi.org/10.1145/93542.93557>.
|
||||||
|
|
||||||
|
Steele, Guy L., Jr., and Jon L. White. ‘How to Print
|
||||||
|
Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
|
||||||
|
New York, 1990. <https://doi.org/10.1145/93542.93559>.
|
||||||
|
|
||||||
|
Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
|
||||||
|
Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
|
||||||
|
<http://arxiv.org/abs/1310.8121>.
|
||||||
|
|
||||||
|
[^arbitrary-precision-signedinteger]: **Implementation note.** Be
|
||||||
|
aware when implementing reading and writing of `SignedInteger`s
|
||||||
|
that the data model *requires* arbitrary-precision integers. Your
|
||||||
|
implementation may (but, ideally, should not) truncate precision
|
||||||
|
when reading or writing a `SignedInteger`; however, if it does so,
|
||||||
|
it should (a) signal its client that truncation has occurred, and
|
||||||
|
(b) make it clear to the client that comparing such truncated
|
||||||
|
values for equality or ordering will not yield results that match
|
||||||
|
the expected semantics of the data model.
|
||||||
|
|
||||||
|
Some valid IEEE 754 `Float`s and `Double`s are not covered by the grammar
|
||||||
|
above, namely, the several million NaNs and the two infinities. These are
|
||||||
|
represented as raw hexadecimal strings similar to hexadecimal
|
||||||
|
`ByteString`s. Implementations are free to use hexadecimal floating-point
|
||||||
|
syntax wherever convenient, even for values representable using the
|
||||||
|
grammar above.[^rationale-no-general-machine-syntax]
|
||||||
|
|
||||||
|
Float =/ "#xf" %x22 8HEXDIG %x22
|
||||||
|
Double =/ "#xd" %x22 16HEXDIG %x22
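
A minimal Python sketch of reading the hexadecimal forms, assuming the hex digits give the IEEE 754 bit pattern in big-endian order (as the `#xf"7f800000"` example in `preserves.md` suggests). The helper name `parse_hex_float` is hypothetical. The prefix is matched case-insensitively because ABNF quoted literals are case-insensitive unless marked `%s`.

```python
import re
import struct

# "#xf" / "#xd", then a double-quoted run of hex digits.
_HEX_FLOAT = re.compile(r'#[xX]([fFdD])"([0-9A-Fa-f]*)"')


def parse_hex_float(text: str):
    """Return (kind, value) for a hexadecimal Float or Double literal."""
    m = _HEX_FLOAT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a hexadecimal float literal: {text!r}")
    kind, digits = m.group(1).lower(), m.group(2)
    # Float requires exactly 8 hex digits, Double exactly 16.
    expected, fmt, name = (8, ">f", "Float") if kind == "f" else (16, ">d", "Double")
    if len(digits) != expected:
        raise ValueError(f"{name} needs {expected} hex digits, got {len(digits)}")
    # Reinterpret the big-endian bytes as an IEEE 754 value.
    (value,) = struct.unpack(fmt, bytes.fromhex(digits))
    return name, value
```

Note that Python's `float` is always double precision, so a decoded `Float` NaN payload is not necessarily preserved when the value is later re-encoded; an implementation that needs bit-exact round-tripping would likely keep the raw bytes alongside the value.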
|
||||||
|
|
||||||
|
[^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
|
||||||
|
of this specification included an escape to the [machine-oriented
|
||||||
|
binary syntax](preserves-binary.html) by prefixing a `ByteString`
|
||||||
|
containing the binary representation of the `Value` with `#=`. The only
|
||||||
|
true need for this feature was to represent otherwise-unrepresentable
|
||||||
|
floating-point values. Instead, this specification allows such
|
||||||
|
floating-point values to be written directly. Removing the `#=` syntax
|
||||||
|
simplifies implementations (there is no longer any need to support the
|
||||||
|
machine-oriented syntax) and avoids complications around treatment of
|
||||||
|
annotations potentially contained within machine-encoded values.
|
||||||
|
|
||||||
|
Finally, an `Embedded` is written as a `Value` chosen to represent the
|
||||||
|
denoted object, prefixed with `#!`.
|
||||||
|
|
||||||
Embedded = "#!" Value
|
Embedded = "#!" Value
|
||||||
|
|
||||||
Finally, any `Value` may be represented by escaping from the textual
|
|
||||||
syntax to the [machine-oriented binary syntax](preserves-binary.html)
|
|
||||||
by prefixing a `ByteString` containing the binary representation of the
|
|
||||||
`Value` with `#=`.[^rationale-switch-to-binary]
|
|
||||||
[^no-literal-binary-in-text] [^machine-value-annotations]
|
|
||||||
|
|
||||||
Machine = "#=" ws ByteString
|
|
||||||
|
|
||||||
[^rationale-switch-to-binary]: **Rationale.** The textual syntax
|
|
||||||
cannot express every `Value`: specifically, it cannot express the
|
|
||||||
several million floating-point NaNs, or the two floating-point
|
|
||||||
Infinities. Since the machine-oriented binary format for `Value`s
|
|
||||||
expresses each `Value` with precision, embedding binary `Value`s
|
|
||||||
solves the problem.
|
|
||||||
|
|
||||||
[^no-literal-binary-in-text]: Every text is ultimately physically
|
|
||||||
stored as bytes; therefore, it might seem possible to escape to the
|
|
||||||
raw form of binary encoding from within a piece of textual syntax.
|
|
||||||
However, while bytes must be involved in any *representation* of
|
|
||||||
text, the text *itself* is logically a sequence of *code points* and
|
|
||||||
is not *intrinsically* a binary structure at all. It would be
|
|
||||||
incoherent to expect to be able to access the representation of the
|
|
||||||
text from within the text itself.
|
|
||||||
|
|
||||||
[^machine-value-annotations]: Any text-syntax annotations preceding
|
|
||||||
the `#` are prepended to any binary-syntax annotations yielded by
|
|
||||||
decoding the `ByteString`.
|
|
||||||
|
|
||||||
## Annotations
|
## Annotations
|
||||||
|
|
||||||
When written down, a `Value` may have an associated sequence of
|
When written down, a `Value` may have an associated sequence of
|
||||||
|
|
32
preserves.md
|
@ -220,21 +220,23 @@ The total ordering specified [above](#total-order) means that the following stat
|
||||||
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
<!-- TODO: Give some examples of large and small Preserves, perhaps -->
|
||||||
<!-- translated from various JSON blobs floating around the internet. -->
|
<!-- translated from various JSON blobs floating around the internet. -->
|
||||||
|
|
||||||
| Value | Encoded byte sequence |
|
| Value | Encoded byte sequence |
|
||||||
|-----------------------------|---------------------------------------------------------------------------------|
|
|-----------------------------------------------------|---------------------------------------------------------------------------------|
|
||||||
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
|
| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
|
||||||
| `[1 2 3 4]` | B5 91 92 93 94 84 |
|
| `[1 2 3 4]` | B5 91 92 93 94 84 |
|
||||||
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
|
| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
|
||||||
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
|
| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
|
||||||
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
|
| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
|
||||||
| `-257` | A1 FE FF |
|
| `-257` | A1 FE FF |
|
||||||
| `-1` | 9F |
|
| `-1` | 9F |
|
||||||
| `0` | 90 |
|
| `0` | 90 |
|
||||||
| `1` | 91 |
|
| `1` | 91 |
|
||||||
| `255` | A1 00 FF |
|
| `255` | A1 00 FF |
|
||||||
| `1.0f` | 82 3F 80 00 00 |
|
| `1.0f` | 82 3F 80 00 00 |
|
||||||
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
|
| `1.0` | 83 3F F0 00 00 00 00 00 00 |
|
||||||
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
|
| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
|
||||||
|
| `#xf"7f800000"`, positive `Float` infinity | 82 7F 80 00 00 |
|
||||||
|
| `#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00 |
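
The floating-point rows of the table can be reproduced with a short Python sketch. It assumes, based only on the rows shown, that a `Float` is encoded as the byte `82` followed by four big-endian IEEE 754 bytes and a `Double` as `83` followed by eight; the function names are hypothetical.

```python
import struct


def encode_float(x: float) -> bytes:
    """Tag 0x82 followed by the 4-byte big-endian single-precision value."""
    return b"\x82" + struct.pack(">f", x)


def encode_double(x: float) -> bytes:
    """Tag 0x83 followed by the 8-byte big-endian double-precision value."""
    return b"\x83" + struct.pack(">d", x)


# Print the encodings in the same style as the table's right-hand column.
for label, encoded in [
    ("1.0f", encode_float(1.0)),
    ("1.0", encode_double(1.0)),
    ("+inf (Float)", encode_float(float("inf"))),
    ("-inf (Double)", encode_double(float("-inf"))),
]:
    print(label, encoded.hex(" ").upper())
```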
|
||||||
|
|
||||||
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`
|
||||||
|
|
||||||
|
|