Update spec text for numbers/symbols

2022-11-06 22:27:01 +01:00 · 2d3df32749
parent 351feba8d2
commit 2d3df32749
2 changed files with 112 additions and 113 deletions
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -40,10 +40,9 @@ Standalone documents may have trailing whitespace.
 Any `Value` may be preceded by whitespace.
-             Value = ws (Record / Collection / Atom / Embedded / Machine)
+             Value = ws (Record / Collection / Atom / Embedded)
        Collection = Sequence / Dictionary / Set
-              Atom = Boolean / Float / Double / SignedInteger /
-                     String / ByteString / Symbol
+              Atom = Boolean / String / ByteString / QuotedSymbol / SymbolOrNumber
 Each `Record` is an angle-bracket enclosed grouping of its
 label-`Value` followed by its field-`Value`s.
@@ -73,55 +72,6 @@ false, respectively.
           Boolean = %s"#t" / %s"#f"
 Numeric data follow the
 [JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
 the addition of a trailing “f” distinguishing `Float` from `Double`
 values. `Float`s and `Double`s always have either a fractional part or
 an exponent part, where `SignedInteger`s never have
 either.[^reading-and-writing-floats-accurately]
 [^arbitrary-precision-signedinteger]
             Float = flt %i"f"
            Double = flt
     SignedInteger = int
          digit1-9 = %x31-39
               nat = %x30 / ( digit1-9 *DIGIT )
               int = ["-"] nat
              frac = "." 1*DIGIT
               exp = %i"e" ["-"/"+"] 1*DIGIT
               flt = int (frac exp / frac / exp)
  [^reading-and-writing-floats-accurately]: **Implementation note.**
    Your language's standard library likely has a good routine for
    converting between decimal notation and IEEE 754 floating-point.
    However, if not, or if you are interested in the challenges of
    accurately reading and writing floating point numbers, see the
    excellent matched pair of 1990 papers by Clinger and Steele &
    White, and a recent follow-up by Jaffer:
    Clinger, William D. ‘How to Read Floating Point Numbers
    Accurately’. In Proc. PLDI. White Plains, New York, 1990.
    <https://doi.org/10.1145/93542.93557>.
    Steele, Guy L., Jr., and Jon L. White. ‘How to Print
    Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
    New York, 1990. <https://doi.org/10.1145/93542.93559>.
    Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
    Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
    <http://arxiv.org/abs/1310.8121>.
  [^arbitrary-precision-signedinteger]: **Implementation note.** Be
    aware when implementing reading and writing of `SignedInteger`s
    that the data model *requires* arbitrary-precision integers. Your
    implementation may (but, ideally, should not) truncate precision
    when reading or writing a `SignedInteger`; however, if it does so,
    it should (a) signal its client that truncation has occurred, and
    (b) make it clear to the client that comparing such truncated
    values for equality or ordering will not yield results that match
    the expected semantics of the data model.
 `String`s are,
 [as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
 escaped text surrounded by double quotes. The escaping rules are the
@@ -177,62 +127,109 @@ Base64 characters are allowed.
       ByteString =/ "#[" *(ws / base64char) ws "]"
        base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
-A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
-it conforms to certain restrictions on the characters appearing in the
-symbol. Alternatively, it may be written in a quoted form. The quoted
-form is much the same as the syntax for `String`s, including embedded
-escape syntax, except using a bar or pipe character (`|`) instead of a
-double quote mark.
-            Symbol = symstart *symcont / "|" *symchar "|"
-          symstart = ALPHA / sympunct / symustart
-           symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
-          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
-                     "?" / "_" / "=" / "+" / "/" / "."
+A `Symbol` may be written in either of two forms.
+The first is a quoted form, much the same as the syntax for `String`s,
+including embedded escape syntax, except using a bar or pipe character
+(`|`) instead of a double quote mark.
+
+      QuotedSymbol = "|" *symchar "|"
           symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
-         symustart = <any code point greater than 127 whose Unicode
-                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
-                      Pc, Po, Sc, Sm, Sk, So, or Co>
-          symucont = <any code point greater than 127 whose Unicode
-                      category is Nd, Nl, No, or Pd>
+
+Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
+The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
+so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
+`SignedInteger`, then it must be interpreted as one of those, and otherwise
+it must be interpreted as a bare `Symbol`.
    SymbolOrNumber = *baresymchar
       baresymchar = ALPHA / DIGIT / sympunct / symuchar
          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
                     "?" / "_" / "=" / "+" / "-" / "/" / "."
          symuchar = <any code point greater than 127 whose Unicode
                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
                      Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
  [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
    definition of “token representation”, and with the
    [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
-An `Embedded` is written as a `Value` chosen to represent the denoted
-object, prefixed with `#!`.
+Numeric data follow the [JSON
+grammar](https://tools.ietf.org/html/rfc8259#section-6) except that leading
 zeros are permitted and an optional leading `+` sign is allowed. The
 addition of a trailing “f” distinguishes a `Float` from a `Double` value.
 `Float`s and `Double`s always have either a fractional part or an exponent
 part, where `SignedInteger`s never have
 either.[^reading-and-writing-floats-accurately]
 [^arbitrary-precision-signedinteger]
             Float = flt %i"f"
            Double = flt
     SignedInteger = int
          digit1-9 = %x31-39
               nat = 1*DIGIT
               int = ["-"/"+"] nat
              frac = "." 1*DIGIT
               exp = %i"e" ["-"/"+"] 1*DIGIT
               flt = int (frac exp / frac / exp)
  [^reading-and-writing-floats-accurately]: **Implementation note.**
    Your language's standard library likely has a good routine for
    converting between decimal notation and IEEE 754 floating-point.
    However, if not, or if you are interested in the challenges of
    accurately reading and writing floating point numbers, see the
    excellent matched pair of 1990 papers by Clinger and Steele &
    White, and a recent follow-up by Jaffer:
    Clinger, William D. ‘How to Read Floating Point Numbers
    Accurately’. In Proc. PLDI. White Plains, New York, 1990.
    <https://doi.org/10.1145/93542.93557>.
    Steele, Guy L., Jr., and Jon L. White. ‘How to Print
    Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
    New York, 1990. <https://doi.org/10.1145/93542.93559>.
    Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
    Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
    <http://arxiv.org/abs/1310.8121>.
  [^arbitrary-precision-signedinteger]: **Implementation note.** Be
    aware when implementing reading and writing of `SignedInteger`s
    that the data model *requires* arbitrary-precision integers. Your
    implementation may (but, ideally, should not) truncate precision
    when reading or writing a `SignedInteger`; however, if it does so,
    it should (a) signal its client that truncation has occurred, and
    (b) make it clear to the client that comparing such truncated
    values for equality or ordering will not yield results that match
    the expected semantics of the data model.
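
The rule that a `SymbolOrNumber` matching the numeric grammar is read as a number, and anything else as a bare `Symbol`, can be sketched in Python as below. This is only an illustration, not part of the specification: the names `classify`, `INT`, `FRAC`, `EXP` and `FLT` are hypothetical, the regular expressions are a hand transcription of the ABNF above, and the sketch assumes the token has already been split off as a run of `baresymchar`s (it does not check that).

```python
import re

# Hand transcription of the numeric ABNF above (hypothetical names).
# [0-9] is used rather than \d, which would also match non-ASCII digits.
INT = r"[-+]?[0-9]+"          # int  = ["-"/"+"] nat ; nat = 1*DIGIT
FRAC = r"\.[0-9]+"            # frac = "." 1*DIGIT
EXP = r"[eE][-+]?[0-9]+"      # exp  = %i"e" ["-"/"+"] 1*DIGIT
FLT = rf"{INT}(?:{FRAC}{EXP}|{FRAC}|{EXP})"

SIGNED_INTEGER = re.compile(INT)
DOUBLE = re.compile(FLT)
FLOAT = re.compile(FLT + r"[fF]")  # Float = flt %i"f"


def classify(token: str) -> str:
    """Return the kind of Atom a bare token denotes.

    Assumes `token` is already a valid SymbolOrNumber.
    """
    if SIGNED_INTEGER.fullmatch(token):
        return "SignedInteger"
    if FLOAT.fullmatch(token):
        return "Float"
    if DOUBLE.fullmatch(token):
        return "Double"
    # Anything that is not numeric is a bare Symbol.
    return "Symbol"
```

Under this reading, `+007` and `-1` are `SignedInteger`s, `1.0f` is a `Float`, `1e10` is a `Double`, and tokens such as `1.`, `1f` or `-` fall through to `Symbol` because they lack the digits, fraction or exponent the numeric rules require.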
 Some valid IEEE 754 `Float`s and `Double`s are not covered by the grammar
 above, namely, the several million NaNs and the two infinities. These are
 represented as raw hexadecimal strings similar to hexadecimal
 `ByteString`s. Implementations are free to use hexadecimal floating-point
 syntax wherever convenient, even for values representable using the
 grammar above.[^rationale-no-general-machine-syntax]
            Float =/ "#xf" %x22 8HEXDIG %x22
           Double =/ "#xd" %x22 16HEXDIG %x22
  [^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
    of this specification included an escape to the [machine-oriented
    binary syntax](preserves-binary.html) by prefixing a `ByteString`
    containing the binary representation of the `Value` with `#=`. The only
    true need for this feature was to represent otherwise-unrepresentable
    floating-point values. Instead, this specification allows such
    floating-point values to be written directly. Removing the `#=` syntax
    simplifies implementations (there is no longer any need to support the
    machine-oriented syntax) and avoids complications around treatment of
    annotations potentially contained within machine-encoded values.
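
As an illustration of the `#xf"…"` and `#xd"…"` forms, the following Python sketch decodes them by treating the hex digits as the big-endian IEEE 754 bit pattern, which is how the example encodings in `preserves.md` read. The function name `parse_hex_float` is hypothetical, and the sketch accepts only the exact forms in the two grammar rules (no embedded whitespace).

```python
import re
import struct

# ABNF quoted literals are case-insensitive unless marked %s, so "#xf"
# and the hex digits are matched without regard to case.
HEX_FLOAT = re.compile(r'#x([fd])"([0-9a-f]+)"', re.IGNORECASE)


def parse_hex_float(text: str) -> tuple[str, float]:
    """Decode a hexadecimal Float or Double literal (hypothetical helper)."""
    match = HEX_FLOAT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a hexadecimal float literal: {text!r}")
    kind = match.group(1).lower()
    digits = match.group(2)
    # Float takes exactly 8 hex digits (4 bytes), Double exactly 16 (8 bytes).
    expected = 8 if kind == "f" else 16
    if len(digits) != expected:
        raise ValueError(f"expected {expected} hex digits, got {len(digits)}")
    fmt = ">f" if kind == "f" else ">d"
    value = struct.unpack(fmt, bytes.fromhex(digits))[0]
    return ("Float" if kind == "f" else "Double", value)
```

Note that Python widens a decoded `Float` to a double-precision `float`, so an implementation that needs to round-trip specific NaN bit patterns would have to keep the raw bytes rather than the converted value.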
 Finally, an `Embedded` is written as a `Value` chosen to represent the
 denoted object, prefixed with `#!`.
           Embedded = "#!" Value
 Finally, any `Value` may be represented by escaping from the textual
 syntax to the [machine-oriented binary syntax](preserves-binary.html)
 by prefixing a `ByteString` containing the binary representation of the
 `Value` with `#=`.[^rationale-switch-to-binary]
 [^no-literal-binary-in-text] [^machine-value-annotations]
           Machine = "#=" ws ByteString
  [^rationale-switch-to-binary]: **Rationale.** The textual syntax
    cannot express every `Value`: specifically, it cannot express the
    several million floating-point NaNs, or the two floating-point
    Infinities. Since the machine-oriented binary format for `Value`s
    expresses each `Value` with precision, embedding binary `Value`s
    solves the problem.
  [^no-literal-binary-in-text]: Every text is ultimately physically
    stored as bytes; therefore, it might seem possible to escape to the
    raw form of binary encoding from within a piece of textual syntax.
    However, while bytes must be involved in any *representation* of
    text, the text *itself* is logically a sequence of *code points* and
    is not *intrinsically* a binary structure at all. It would be
    incoherent to expect to be able to access the representation of the
    text from within the text itself.
  [^machine-value-annotations]: Any text-syntax annotations preceding
    the `#` are prepended to any binary-syntax annotations yielded by
    decoding the `ByteString`.
 ## Annotations
 When written down, a `Value` may have an associated sequence of
--- a/preserves.md
+++ b/preserves.md
@@ -220,21 +220,23 @@ The total ordering specified [above](#total-order) means that the following stat
 <!-- TODO: Give some examples of large and small Preserves, perhaps -->
 <!-- translated from various JSON blobs floating around the internet. -->
-| Value                       | Encoded byte sequence                                                           |
+| Value                                               | Encoded byte sequence                                                           |
-|-----------------------------|---------------------------------------------------------------------------------|
+|-----------------------------------------------------|---------------------------------------------------------------------------------|
-| `<capture <discard>>`       | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
+| `<capture <discard>>`                               | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
-| `[1 2 3 4]`                 | B5 91 92 93 94 84                                                               |
+| `[1 2 3 4]`                                         | B5 91 92 93 94 84                                                               |
-| `[-2 -1 0 1]`               | B5 9E 9F 90 91 84                                                               |
+| `[-2 -1 0 1]`                                       | B5 9E 9F 90 91 84                                                               |
-| `"hello"` (format B)        | B1 05 'h' 'e' 'l' 'l' 'o'                                                       |
+| `"hello"` (format B)                                | B1 05 'h' 'e' 'l' 'l' 'o'                                                       |
-| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84                           |
+| `["a" b #"c" [] #{} #t #f]`                         | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84                           |
-| `-257`                      | A1 FE FF                                                                        |
+| `-257`                                              | A1 FE FF                                                                        |
-| `-1`                        | 9F                                                                              |
+| `-1`                                                | 9F                                                                              |
-| `0`                         | 90                                                                              |
+| `0`                                                 | 90                                                                              |
-| `1`                         | 91                                                                              |
+| `1`                                                 | 91                                                                              |
-| `255`                       | A1 00 FF                                                                        |
+| `255`                                               | A1 00 FF                                                                        |
-| `1.0f`                      | 82 3F 80 00 00                                                                  |
+| `1.0f`                                              | 82 3F 80 00 00                                                                  |
-| `1.0`                       | 83 3F F0 00 00 00 00 00 00                                                      |
+| `1.0`                                               | 83 3F F0 00 00 00 00 00 00                                                      |
-| `-1.202e300`                | 83 FE 3C B7 B7 59 BF 04 26                                                      |
+| `-1.202e300`                                        | 83 FE 3C B7 B7 59 BF 04 26                                                      |
 | `#xf"7f800000"`, positive `Float` infinity          | 82 7F 80 00 00                                                                  |
 | `#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00                                                      |
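
The floating-point rows of the table can be cross-checked with a short Python sketch. It assumes, based only on the rows above, that a `Float` is encoded as the byte `82` followed by the 4-byte big-endian IEEE 754 value and a `Double` as `83` followed by the 8-byte value; the function names are hypothetical.

```python
import struct


def encode_float(value: float) -> bytes:
    """Tag byte 0x82 plus big-endian single precision (inferred from the table)."""
    return b"\x82" + struct.pack(">f", value)


def encode_double(value: float) -> bytes:
    """Tag byte 0x83 plus big-endian double precision (inferred from the table)."""
    return b"\x83" + struct.pack(">d", value)


# Rows from the table: `1.0f`, `1.0`, and the two infinities.
print(encode_float(1.0).hex(" "))
print(encode_double(1.0).hex(" "))
print(encode_float(float("inf")).hex(" "))
print(encode_double(float("-inf")).hex(" "))
```

The two infinity rows show why the hexadecimal text syntax is useful: the bytes after the tag are exactly the hex digits written inside `#xf"…"` and `#xd"…"`.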
 The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`