From 2d3df32749da7c601fb7977c9f6214863864ef10 Mon Sep 17 00:00:00 2001 From: Tony Garnock-Jones Date: Sun, 6 Nov 2022 22:27:01 +0100 Subject: [PATCH] Update spec text for numbers/symbols --- preserves-text.md | 193 +++++++++++++++++++++++----------------------- preserves.md | 32 ++++---- 2 files changed, 112 insertions(+), 113 deletions(-) diff --git a/preserves-text.md b/preserves-text.md index e6a77c0..1fe6fc7 100644 --- a/preserves-text.md +++ b/preserves-text.md @@ -40,10 +40,9 @@ Standalone documents may have trailing whitespace. Any `Value` may be preceded by whitespace. - Value = ws (Record / Collection / Atom / Embedded / Machine) + Value = ws (Record / Collection / Atom / Embedded) Collection = Sequence / Dictionary / Set - Atom = Boolean / Float / Double / SignedInteger / - String / ByteString / Symbol + Atom = Boolean / String / ByteString / QuotedSymbol / SymbolOrNumber Each `Record` is an angle-bracket enclosed grouping of its label-`Value` followed by its field-`Value`s. @@ -73,55 +72,6 @@ false, respectively. Boolean = %s"#t" / %s"#f" -Numeric data follow the -[JSON grammar](https://tools.ietf.org/html/rfc8259#section-6), with -the addition of a trailing “f” distinguishing `Float` from `Double` -values. `Float`s and `Double`s always have either a fractional part or -an exponent part, where `SignedInteger`s never have -either.[^reading-and-writing-floats-accurately] -[^arbitrary-precision-signedinteger] - - Float = flt %i"f" - Double = flt - SignedInteger = int - - digit1-9 = %x31-39 - nat = %x30 / ( digit1-9 *DIGIT ) - int = ["-"] nat - frac = "." 1*DIGIT - exp = %i"e" ["-"/"+"] 1*DIGIT - flt = int (frac exp / frac / exp) - - [^reading-and-writing-floats-accurately]: **Implementation note.** - Your language's standard library likely has a good routine for - converting between decimal notation and IEEE 754 floating-point. 
- However, if not, or if you are interested in the challenges of - accurately reading and writing floating point numbers, see the - excellent matched pair of 1990 papers by Clinger and Steele & - White, and a recent follow-up by Jaffer: - - Clinger, William D. ‘How to Read Floating Point Numbers - Accurately’. In Proc. PLDI. White Plains, New York, 1990. - . - - Steele, Guy L., Jr., and Jon L. White. ‘How to Print - Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains, - New York, 1990. . - - Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of - Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013. - . - - [^arbitrary-precision-signedinteger]: **Implementation note.** Be - aware when implementing reading and writing of `SignedInteger`s - that the data model *requires* arbitrary-precision integers. Your - implementation may (but, ideally, should not) truncate precision - when reading or writing a `SignedInteger`; however, if it does so, - it should (a) signal its client that truncation has occurred, and - (b) make it clear to the client that comparing such truncated - values for equality or ordering will not yield results that match - the expected semantics of the data model. - `String`s are, [as in JSON](https://tools.ietf.org/html/rfc8259#section-7), possibly escaped text surrounded by double quotes. The escaping rules are the @@ -177,62 +127,109 @@ Base64 characters are allowed. ByteString =/ "#[" *(ws / base64char) ws "]" base64char = %x41-5A / %x61-7A / %x30-39 / "+" / "/" / "-" / "_" / "=" -A `Symbol` may be written in a “bare” form[^cf-sexp-token] so long as -it conforms to certain restrictions on the characters appearing in the -symbol. Alternatively, it may be written in a quoted form. The quoted -form is much the same as the syntax for `String`s, including embedded -escape syntax, except using a bar or pipe character (`|`) instead of a -double quote mark. +A `Symbol` may be written in either of two forms. 
- Symbol = symstart *symcont / "|" *symchar "|" - symstart = ALPHA / sympunct / symustart - symcont = ALPHA / sympunct / symustart / symucont / DIGIT / "-" - sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" / - "?" / "_" / "=" / "+" / "/" / "." +The first is a quoted form, much the same as the syntax for `String`s, +including embedded escape syntax, except using a bar or pipe character +(`|`) instead of a double quote mark. + + QuotedSymbol = "|" *symchar "|" symchar = unescaped / %x22 / escape (escaped / %x7C / %s"u" 4HEXDIG) - symustart = - symucont = + +Alternatively, a `Symbol` may be written in a “bare” form[^cf-sexp-token]. +The grammar for numeric data is a subset of the grammar for bare `Symbol`s, +so if a `SymbolOrNumber` also matches the grammar for `Float`, `Double` or +`SignedInteger`, then it must be interpreted as one of those, and otherwise +it must be interpreted as a bare `Symbol`. + + SymbolOrNumber = *baresymchar + baresymchar = ALPHA / DIGIT / sympunct / symuchar + sympunct = "~" / "!" / "$" / "%" / "^" / "&" / "*" / + "?" / "_" / "=" / "+" / "-" / "/" / "." + symuchar = [^cf-sexp-token]: Compare with the [SPKI S-expression][sexp.txt] definition of “token representation”, and with the [R6RS definition of identifiers](http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4). -An `Embedded` is written as a `Value` chosen to represent the denoted -object, prefixed with `#!`. +Numeric data follow the [JSON +grammar](https://tools.ietf.org/html/rfc8259#section-6) except that leading +zeros are permitted and an optional leading `+` sign is allowed. The +addition of a trailing “f” distinguishes a `Float` from a `Double` value. 
+`Float`s and `Double`s always have either a fractional part or an exponent +part, where `SignedInteger`s never have +either.[^reading-and-writing-floats-accurately] +[^arbitrary-precision-signedinteger] + + Float = flt %i"f" + Double = flt + SignedInteger = int + + digit1-9 = %x31-39 + nat = 1*DIGIT + int = ["-"/"+"] nat + frac = "." 1*DIGIT + exp = %i"e" ["-"/"+"] 1*DIGIT + flt = int (frac exp / frac / exp) + + [^reading-and-writing-floats-accurately]: **Implementation note.** + Your language's standard library likely has a good routine for + converting between decimal notation and IEEE 754 floating-point. + However, if not, or if you are interested in the challenges of + accurately reading and writing floating point numbers, see the + excellent matched pair of 1990 papers by Clinger and Steele & + White, and a recent follow-up by Jaffer: + + Clinger, William D. ‘How to Read Floating Point Numbers + Accurately’. In Proc. PLDI. White Plains, New York, 1990. + . + + Steele, Guy L., Jr., and Jon L. White. ‘How to Print + Floating-Point Numbers Accurately’. In Proc. PLDI. White Plains, + New York, 1990. . + + Jaffer, Aubrey. ‘Easy Accurate Reading and Writing of + Floating-Point Numbers’. ArXiv:1310.8121 [Cs], 27 October 2013. + . + + [^arbitrary-precision-signedinteger]: **Implementation note.** Be + aware when implementing reading and writing of `SignedInteger`s + that the data model *requires* arbitrary-precision integers. Your + implementation may (but, ideally, should not) truncate precision + when reading or writing a `SignedInteger`; however, if it does so, + it should (a) signal its client that truncation has occurred, and + (b) make it clear to the client that comparing such truncated + values for equality or ordering will not yield results that match + the expected semantics of the data model. + +Some valid IEEE 754 `Float`s and `Double`s are not covered by the grammar +above, namely, the several million NaNs and the two infinities. 
These are +represented as raw hexadecimal strings similar to hexadecimal +`ByteString`s. Implementations are free to use hexadecimal floating-point +syntax wherever convenient, even for values representable using the +grammar above.[^rationale-no-general-machine-syntax] + + Float =/ "#xf" %x22 8HEXDIG %x22 + Double =/ "#xd" %x22 16HEXDIG %x22 + + [^rationale-no-general-machine-syntax]: **Rationale.** Previous versions + of this specification included an escape to the [machine-oriented + binary syntax](preserves-binary.html) by prefixing a `ByteString` + containing the binary representation of the `Value` with `#=`. The only + true need for this feature was to represent otherwise-unrepresentable + floating-point values. Instead, this specification allows such + floating-point values to be written directly. Removing the `#=` syntax + simplifies implementations (there is no longer any need to support the + machine-oriented syntax) and avoids complications around treatment of + annotations potentially contained within machine-encoded values. + +Finally, an `Embedded` is written as a `Value` chosen to represent the +denoted object, prefixed with `#!`. Embedded = "#!" Value -Finally, any `Value` may be represented by escaping from the textual -syntax to the [machine-oriented binary syntax](preserves-binary.html) -by prefixing a `ByteString` containing the binary representation of the -`Value` with `#=`.[^rationale-switch-to-binary] -[^no-literal-binary-in-text] [^machine-value-annotations] - - Machine = "#=" ws ByteString - - [^rationale-switch-to-binary]: **Rationale.** The textual syntax - cannot express every `Value`: specifically, it cannot express the - several million floating-point NaNs, or the two floating-point - Infinities. Since the machine-oriented binary format for `Value`s - expresses each `Value` with precision, embedding binary `Value`s - solves the problem. 
- - [^no-literal-binary-in-text]: Every text is ultimately physically - stored as bytes; therefore, it might seem possible to escape to the - raw form of binary encoding from within a piece of textual syntax. - However, while bytes must be involved in any *representation* of - text, the text *itself* is logically a sequence of *code points* and - is not *intrinsically* a binary structure at all. It would be - incoherent to expect to be able to access the representation of the - text from within the text itself. - - [^machine-value-annotations]: Any text-syntax annotations preceding - the `#` are prepended to any binary-syntax annotations yielded by - decoding the `ByteString`. - ## Annotations When written down, a `Value` may have an associated sequence of diff --git a/preserves.md b/preserves.md index 411d0b7..8e66ab7 100644 --- a/preserves.md +++ b/preserves.md @@ -220,21 +220,23 @@ The total ordering specified [above](#total-order) means that the following stat -| Value | Encoded byte sequence | -|-----------------------------|---------------------------------------------------------------------------------| -| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 | -| `[1 2 3 4]` | B5 91 92 93 94 84 | -| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 | -| `"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' | -| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 | -| `-257` | A1 FE FF | -| `-1` | 9F | -| `0` | 90 | -| `1` | 91 | -| `255` | A1 00 FF | -| `1.0f` | 82 3F 80 00 00 | -| `1.0` | 83 3F F0 00 00 00 00 00 00 | -| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 | +| Value | Encoded byte sequence | +|-----------------------------------------------------|---------------------------------------------------------------------------------| +| `<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 | +| `[1 2 3 4]` | B5 91 92 93 94 84 | +| `[-2 -1 0 1]` | B5 9E 9F 90 91 84 | +| `"hello"` 
(format B) | B1 05 'h' 'e' 'l' 'l' 'o' | +| `["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 | +| `-257` | A1 FE FF | +| `-1` | 9F | +| `0` | 90 | +| `1` | 91 | +| `255` | A1 00 FF | +| `1.0f` | 82 3F 80 00 00 | +| `1.0` | 83 3F F0 00 00 00 00 00 00 | +| `-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 | +| `#xf"7f800000"`, positive `Float` infinity | 82 7F 80 00 00 | +| `#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00 | The next example uses a non-`Symbol` label for a record.[^extensibility2] The `Record`