From 2d3df32749da7c601fb7977c9f6214863864ef10 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones <tonyg@leastfixedpoint.com>
Date: Sun, 6 Nov 2022 22:27:01 +0100
Subject: [PATCH] Update spec text for numbers/symbols

---
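The interpretation rule this patch adds for `SymbolOrNumber`, and the new
hexadecimal float forms, can be sketched as follows. This is a minimal
Python sketch transcribed from the ABNF in the diff below, not code from
any Preserves implementation; `classify_bare_token` and `decode_hex_float`
are hypothetical helper names, and the sketch assumes the token has already
been read as a run of `baresymchar`s.

```python
import re
import struct

# Numeric grammar as given in the patched preserves-text.md:
#   nat = 1*DIGIT, int = ["-"/"+"] nat, frac = "." 1*DIGIT,
#   exp = %i"e" ["-"/"+"] 1*DIGIT, flt = int (frac exp / frac / exp)
_INT = r"[-+]?[0-9]+"
_FLT = _INT + r"(?:\.[0-9]+(?:[eE][-+]?[0-9]+)?|[eE][-+]?[0-9]+)"

SIGNED_INTEGER = re.compile(_INT)
DOUBLE = re.compile(_FLT)
FLOAT = re.compile(_FLT + r"[fF]")  # trailing "f" marks a Float

# Assumption: only the lowercase "#xf"/"#xd" prefixes are accepted here.
_HEX_FLOAT = re.compile(r'#x([fd])"([0-9A-Fa-f]+)"')


def classify_bare_token(token: str) -> str:
    """Hypothetical helper: name the Atom kind of a bare token.

    A token that matches a numeric production is a number; anything
    else is interpreted as a bare Symbol.
    """
    if SIGNED_INTEGER.fullmatch(token):
        return "SignedInteger"
    if FLOAT.fullmatch(token):
        return "Float"
    if DOUBLE.fullmatch(token):
        return "Double"
    return "Symbol"


def decode_hex_float(text: str) -> float:
    """Hypothetical helper: decode #xf"..." (8 hex digits) or #xd"..." (16).

    The digits are treated as big-endian IEEE 754 bits, matching the
    byte order shown in the preserves.md example table. The result is
    a Python float (always a double), so the Float/Double distinction
    that a real reader would have to keep is dropped in this sketch.
    """
    match = _HEX_FLOAT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a hexadecimal float: {text!r}")
    kind, digits = match.groups()
    if len(digits) != (8 if kind == "f" else 16):
        raise ValueError(f"wrong number of hex digits: {text!r}")
    fmt = ">f" if kind == "f" else ">d"
    return struct.unpack(fmt, bytes.fromhex(digits))[0]
```

Under this sketch, `+007` classifies as a `SignedInteger` and `1.0f` as a
`Float`, while `1.` and `1f` fall through to `Symbol` because neither has a
complete fractional or exponent part.
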
 preserves-text.md | 193 +++++++++++++++++++++++-----------------------
 preserves.md      |  32 ++++----
 2 files changed, 112 insertions(+), 113 deletions(-)

diff --git a/preserves-text.md b/preserves-text.md
index e6a77c0..1fe6fc7 100644
--- a/preserves-text.md
+++ b/preserves-text.md
@@ -40,10 +40,9 @@ Standalone documents may have trailing whitespace.
 
 Any `Value` may be preceded by whitespace.
 
-             Value = ws (Record / Collection / Atom / Embedded / Machine)
+             Value = ws (Record / Collection / Atom / Embedded)
         Collection = Sequence / Dictionary / Set
-              Atom = Boolean / Float / Double / SignedInteger /
-                     String / ByteString / Symbol
+              Atom = Boolean / String / ByteString / QuotedSymbol / SymbolOrNumber
 
 Each `Record` is an angle-bracket enclosed grouping of its
 label-`Value` followed by its field-`Value`s.
@@ -73,55 +72,6 @@ false, respectively.
 
            Boolean = %s"#t" / %s"#f"
 
-Numeric data follow the
-[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with
-the addition of a trailing “f” distinguishing `Float` from `Double`
-values. `Float`s and `Double`s always have either a fractional part or
-an exponent part, where `SignedInteger`s never have
-either.[^reading-and-writing-floats-accurately]
-[^arbitrary-precision-signedinteger]
-
-             Float = flt %i"f"
-            Double = flt
-     SignedInteger = int
-
-          digit1-9 = %x31-39
-               nat = %x30 / ( digit1-9 *DIGIT )
-               int = ["-"] nat
-              frac = "." 1*DIGIT
-               exp = %i"e" ["-"/"+"] 1*DIGIT
-               flt = int (frac exp / frac / exp)
-
-  [^reading-and-writing-floats-accurately]: **Implementation note.**
-    Your language's standard library likely has a good routine for
-    converting between decimal notation and IEEE 754 floating-point.
-    However, if not, or if you are interested in the challenges of
-    accurately reading and writing floating point numbers, see the
-    excellent matched pair of 1990 papers by Clinger and Steele &
-    White, and a recent follow-up by Jaffer:
-
-    Clinger, William D. ‘How to Read Floating Point Numbers
-    Accurately’. In Proc. PLDI. White Plains, New York, 1990.
-    <https://doi.org/10.1145/93542.93557>.
-
-    Steele, Guy L., Jr., and Jon L. White. ‘How to Print
-    Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
-    New York, 1990. <https://doi.org/10.1145/93542.93559>.
-
-    Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
-    Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
-    <http://arxiv.org/abs/1310.8121>.
-
-  [^arbitrary-precision-signedinteger]: **Implementation note.** Be
-    aware when implementing reading and writing of `SignedInteger`s
-    that the data model *requires* arbitrary-precision integers. Your
-    implementation may (but, ideally, should not) truncate precision
-    when reading or writing a `SignedInteger`; however, if it does so,
-    it should (a) signal its client that truncation has occurred, and
-    (b) make it clear to the client that comparing such truncated
-    values for equality or ordering will not yield results that match
-    the expected semantics of the data model.
-
 `String`s are,
 [as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly
 escaped text surrounded by double quotes. The escaping rules are the
@@ -177,62 +127,109 @@ Base64 characters are allowed.
        ByteString =/ "#[" *(ws / base64char) ws "]"
         base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "="
 
-A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as
-it conforms to certain restrictions on the characters appearing in the
-symbol. Alternatively, it may be written in a quoted form. The quoted
-form is much the same as the syntax for `String`s, including embedded
-escape syntax, except using a bar or pipe character (`|`) instead of a
-double quote mark.
+A `Symbol` may be written in either of two forms.
 
-            Symbol = symstart *symcont / "|" *symchar "|"
-          symstart = ALPHA / sympunct / symustart
-           symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-"
-          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
-                     "?" / "_" / "=" / "+" / "/" / "."
+The first is a quoted form, much the same as the syntax for `String`s,
+including embedded escape syntax, except using a bar or pipe character
+(`|`) instead of a double quote mark.
+
+      QuotedSymbol = "|" *symchar "|"
            symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG)
-         symustart = <any code point greater than 127 whose Unicode
-                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me,
-                      Pc, Po, Sc, Sm, Sk, So, or Co>
-          symucont = <any code point greater than 127 whose Unicode
-                      category is Nd, Nl, No, or Pd>
+
+Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token].
+The grammar for numeric data is a subset of the grammar for bare `Symbol`s,
+so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or
+`SignedInteger`, then it must be interpreted as one of those, and otherwise
+it must be interpreted as a bare `Symbol`.
+
+    SymbolOrNumber = 1*baresymchar
+       baresymchar = ALPHA / DIGIT / sympunct / symuchar
+          sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" /
+                     "?" / "_" / "=" / "+" / "-" / "/" / "."
+          symuchar = <any code point greater than 127 whose Unicode
+                      category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
+                      Nl, No, Pc, Pd, Po, Sc, Sm, Sk, So, or Co>
 
   [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt]
     definition of “token representation”, and with the
     [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4).
 
-An `Embedded` is written as a `Value` chosen to represent the denoted
-object, prefixed with `#!`.
+Numeric data follow the [JSON
+grammar](https://tools.ietf.org/html/rfc8259#section-6) except that leading
+zeros are permitted and an optional leading `+` sign is allowed. The
+addition of a trailing “f” distinguishes a `Float` from a `Double` value.
+`Float`s and `Double`s always have either a fractional part or an exponent
+part, whereas `SignedInteger`s never have
+either.[^reading-and-writing-floats-accurately]
+[^arbitrary-precision-signedinteger]
+
+             Float = flt %i"f"
+            Double = flt
+     SignedInteger = int
+
+          digit1-9 = %x31-39
+               nat = 1*DIGIT
+               int = ["-"/"+"] nat
+              frac = "." 1*DIGIT
+               exp = %i"e" ["-"/"+"] 1*DIGIT
+               flt = int (frac exp / frac / exp)
+
+  [^reading-and-writing-floats-accurately]: **Implementation note.**
+    Your language's standard library likely has a good routine for
+    converting between decimal notation and IEEE 754 floating-point.
+    However, if not, or if you are interested in the challenges of
+    accurately reading and writing floating point numbers, see the
+    excellent matched pair of 1990 papers by Clinger and Steele &
+    White, and a recent follow-up by Jaffer:
+
+    Clinger, William D. ‘How to Read Floating Point Numbers
+    Accurately’. In Proc. PLDI. White Plains, New York, 1990.
+    <https://doi.org/10.1145/93542.93557>.
+
+    Steele, Guy L., Jr., and Jon L. White. ‘How to Print
+    Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains,
+    New York, 1990. <https://doi.org/10.1145/93542.93559>.
+
+    Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of
+    Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013.
+    <http://arxiv.org/abs/1310.8121>.
+
+  [^arbitrary-precision-signedinteger]: **Implementation note.** Be
+    aware when implementing reading and writing of `SignedInteger`s
+    that the data model *requires* arbitrary-precision integers. Your
+    implementation may (but, ideally, should not) truncate precision
+    when reading or writing a `SignedInteger`; however, if it does so,
+    it should (a) signal its client that truncation has occurred, and
+    (b) make it clear to the client that comparing such truncated
+    values for equality or ordering will not yield results that match
+    the expected semantics of the data model.
+
+Some valid IEEE 754 `Float`s and `Double`s are not covered by the grammar
+above, namely, the several million NaNs and the two infinities. These are
+represented as raw hexadecimal strings similar to hexadecimal
+`ByteString`s. Implementations are free to use hexadecimal floating-point
+syntax wherever convenient, even for values representable using the
+grammar above.[^rationale-no-general-machine-syntax]
+
+            Float =/ "#xf" %x22 8HEXDIG %x22
+           Double =/ "#xd" %x22 16HEXDIG %x22
+
+  [^rationale-no-general-machine-syntax]: **Rationale.** Previous versions
+    of this specification included an escape to the [machine-oriented
+    binary syntax](preserves-binary.html) by prefixing a `ByteString`
+    containing the binary representation of the `Value` with `#=`. The only
+    true need for this feature was to represent otherwise-unrepresentable
+    floating-point values. Instead, this specification allows such
+    floating-point values to be written directly. Removing the `#=` syntax
+    simplifies implementations (there is no longer any need to support the
+    machine-oriented syntax) and avoids complications around treatment of
+    annotations potentially contained within machine-encoded values.
+
+Finally, an `Embedded` is written as a `Value` chosen to represent the
+denoted object, prefixed with `#!`.
 
            Embedded = "#!" Value
 
-Finally, any `Value` may be represented by escaping from the textual
-syntax to the [machine-oriented binary syntax](preserves-binary.html)
-by prefixing a `ByteString` containing the binary representation of the
-`Value` with `#=`.[^rationale-switch-to-binary]
-[^no-literal-binary-in-text] [^machine-value-annotations]
-
-           Machine = "#=" ws ByteString
-
-  [^rationale-switch-to-binary]: **Rationale.** The textual syntax
-    cannot express every `Value`: specifically, it cannot express the
-    several million floating-point NaNs, or the two floating-point
-    Infinities. Since the machine-oriented binary format for `Value`s
-    expresses each `Value` with precision, embedding binary `Value`s
-    solves the problem.
-
-  [^no-literal-binary-in-text]: Every text is ultimately physically
-    stored as bytes; therefore, it might seem possible to escape to the
-    raw form of binary encoding from within a piece of textual syntax.
-    However, while bytes must be involved in any *representation* of
-    text, the text *itself* is logically a sequence of *code points* and
-    is not *intrinsically* a binary structure at all. It would be
-    incoherent to expect to be able to access the representation of the
-    text from within the text itself.
-
-  [^machine-value-annotations]: Any text-syntax annotations preceding
-    the `#` are prepended to any binary-syntax annotations yielded by
-    decoding the `ByteString`.
-
 ## Annotations
 
 When written down, a `Value` may have an associated sequence of
diff --git a/preserves.md b/preserves.md
index 411d0b7..8e66ab7 100644
--- a/preserves.md
+++ b/preserves.md
@@ -220,21 +220,23 @@ The total ordering specified [above](#total-order) means that the following stat
 <!-- TODO: Give some examples of large and small Preserves, perhaps -->
 <!-- translated from various JSON blobs floating around the internet. -->
 
-| Value                       | Encoded byte sequence                                                           |
-|-----------------------------|---------------------------------------------------------------------------------|
-| `<capture <discard>>`       | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
-| `[1 2 3 4]`                 | B5 91 92 93 94 84                                                               |
-| `[-2 -1 0 1]`               | B5 9E 9F 90 91 84                                                               |
-| `"hello"` (format B)        | B1 05 'h' 'e' 'l' 'l' 'o'                                                       |
-| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84                           |
-| `-257`                      | A1 FE FF                                                                        |
-| `-1`                        | 9F                                                                              |
-| `0`                         | 90                                                                              |
-| `1`                         | 91                                                                              |
-| `255`                       | A1 00 FF                                                                        |
-| `1.0f`                      | 82 3F 80 00 00                                                                  |
-| `1.0`                       | 83 3F F0 00 00 00 00 00 00                                                      |
-| `-1.202e300`                | 83 FE 3C B7 B7 59 BF 04 26                                                      |
+| Value                                               | Encoded byte sequence                                                           |
+|-----------------------------------------------------|---------------------------------------------------------------------------------|
+| `<capture <discard>>`                               | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
+| `[1 2 3 4]`                                         | B5 91 92 93 94 84                                                               |
+| `[-2 -1 0 1]`                                       | B5 9E 9F 90 91 84                                                               |
+| `"hello"` (format B)                                | B1 05 'h' 'e' 'l' 'l' 'o'                                                       |
+| `["a" b #"c" [] #{} #t #f]`                         | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84                           |
+| `-257`                                              | A1 FE FF                                                                        |
+| `-1`                                                | 9F                                                                              |
+| `0`                                                 | 90                                                                              |
+| `1`                                                 | 91                                                                              |
+| `255`                                               | A1 00 FF                                                                        |
+| `1.0f`                                              | 82 3F 80 00 00                                                                  |
+| `1.0`                                               | 83 3F F0 00 00 00 00 00 00                                                      |
+| `-1.202e300`                                        | 83 FE 3C B7 B7 59 BF 04 26                                                      |
+| `#xf"7f800000"`, positive `Float` infinity          | 82 7F 80 00 00                                                                  |
+| `#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00                                                      |
 
 The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`