Delete misleading, incorrect, or unnecessary text

2018-11-08 12:35:50 +00:00 · 10d8ce5c09
parent cf250b9245
commit 10d8ce5c09
1 changed file with 32 additions and 112 deletions
--- a/preserves.md
+++ b/preserves.md
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-September 2018. Version 0.0.3.
+November 2018. Version 0.0.4.
  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
  [spki]: http://world.std.com/~cme/html/spki.html
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
 This document proposes a data model and serialization format called
 *Preserves*.
-Preserves supports *records* with user-defined *labels*. This makes it
+Preserves supports *records* with user-defined *labels*. This relieves
-more expressive[^macro-expressiveness] than most data languages in use
+the confusion caused by encoding records as dictionaries, seen in most
-on the web and allows it to easily represent the *labelled sums of
+data languages in use on the web. It also allows Preserves to easily
-products* as seen in many functional programming languages.
+represent the *labelled sums of products* as seen in many functional
 programming languages.
 Preserves also supports the usual suite of atomic and compound data
 types, in particular including *binary* data as a distinct type from
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
 Comparison is based on the data model, not on syntax or on data
 structures of any particular implementation language.
  [^macro-expressiveness]: By "expressive" I mean *macro-expressive*
    in the sense of Felleisen's 1991 paper, "On the Expressive Power
    of Programming Languages".
    Roughly speaking, there's no way in a JSON document to introduce a
    new kind of information (such as binary data, or a date-stamp, or
    a "person" object) in an *unambiguous way* without *global
    agreement* from every potential consumer of the document. With an
    extensible labelled record type, there is.
    Felleisen, Matthias. “On the Expressive Power of Programming
    Languages.” Science of Computer Programming 17, no. 1--3 (1991):
    35–75.
 ## Starting with Semantics
 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. When we write examples of values,
+meaning independent of their syntax.
 we will do so using the [textual syntax](#textual-syntax) defined
 later in this document.
 Our `Value`s fall into two broad categories: *atomic* and *compound*
 data.
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
 A `SignedInteger` is a signed integer of arbitrary width.
 `SignedInteger`s are compared as mathematical integers.
 **Examples.** 10; -6; 0.
 **Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
 (not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
 ### Unicode strings.
 A `String` is a sequence of Unicode
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
    gives the same result as a lexicographic byte-by-byte comparison
    of the UTF-8 encoding of a string!
 **Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
 the string containing the three Unicode code-points `z` (0x7A), `水`
 (0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
 ### Binary data.
-A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
+A `ByteString` is a sequence of octets. `ByteString`s are compared
-`ByteString`s are compared lexicographically.
+lexicographically.
 **Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
 `ByteString` containing the integers 65, 66 and 67 (corresponding to
 ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
 these are *binary* data.
 ### Symbols.
@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
 kind. `Symbol`s are also compared lexicographically by code-point.
 **Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
 ### Booleans.
-There are exactly two `Boolean` values, “false” and “true”. The
+There are two `Boolean`s, “false” and “true”. The “false” value is
-“false” value compares less-than the “true” value. We write `#false`
+less-than the “true” value.
 for “false”, and `#true` for “true”.
 ### IEEE floating-point values.
-A `Float` is a single-precision IEEE 754 floating-point value; a
+`Float`s and `Double`s are single- and double-precision IEEE 754
-`Double` is a double-precision IEEE 754 floating-point value.
+floating-point values, respectively. `Float`s, `Double`s and
-`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
+`SignedInteger`s are disjoint; by the rules [above](#total-order),
-so by the rules [above](#total-order), every `Float` is less than
+every `Float` is less than every `Double`, and every `SignedInteger`
-every `Double`, and every `SignedInteger` is greater than both. Two
+is greater than both. Two `Float`s or two `Double`s are to be ordered
-`Float`s or two `Double`s are to be ordered by the `totalOrder`
+by the `totalOrder` predicate defined in section 5.10 of
 predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
 We write examples using a fractional part and/or an exponent to
 distinguish them from `SignedInteger`s. An additional suffix `f`
 distinguishes `Float`s from `Double`s.
 **Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
 **Non-examples.** 10, -6, and 0, because writing them this way
 indicates `SignedInteger`s, not `Float`s or `Double`s.
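
As a rough illustration of the `totalOrder`-based comparison described above, the following Python sketch maps a `Double` to an integer key whose ordinary integer order is intended to agree with `totalOrder`. The name `total_order_key` is hypothetical and not part of Preserves, and the bit-flipping approach is a common technique assumed here, not something the specification prescribes.

```python
import struct


def total_order_key(x: float) -> int:
    """Map a double to an unsigned integer that sorts like IEEE 754 totalOrder.

    Assumption: this is an illustrative helper, not part of Preserves.
    """
    # Reinterpret the 64-bit IEEE 754 encoding as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):
        # Sign bit set: flip every bit so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set the top bit so these sort above all negatives.
    return bits | (1 << 63)


values = [0.5, float("inf"), -0.0, 0.0, -1.0, float("-inf")]
ordered = sorted(values, key=total_order_key)
```

Unlike Python's built-in float comparison, this key places `-0.0` strictly below `0.0`, which is what `totalOrder` requires. The example does not exercise NaNs.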
 ### Records.
-A `Record` is a *labelled* tuple of zero or more `Value`s, called the
+A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
-record's *fields*. A record's label is itself a `Value`, though it
+label can be any `Value`, but is usually a `Symbol`.[^extensibility]
-will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
+[^iri-labels] `Record`s are compared lexicographically: first by
-are compared lexicographically as if they were just tuples; that is,
+label, then by field sequence.
 first by their labels, and then by the remainder of their fields.
  [^extensibility]: The [Racket](https://racket-lang.org/) programming
    language defines
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
    it cannot be read as an IRI at all, and so the label simply stands
    for itself—for its own `Value`.
 **Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
 2 and 3; `void()`, a `Record` with label `void` and no fields.
 **Non-examples.** `()`, because it lacks a label; `void`, because it
 lacks even an empty tuple of fields.
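
The record ordering described above (label first, then field sequence) can be sketched in Python as follows. This is a simplified illustration, not the specification's algorithm: the `Record` class and `compare_records` function are hypothetical names, labels are restricted to strings standing in for `Symbol`s, and fields are restricted to integers.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    label: str               # assumption: a str stands in for a Symbol label
    fields: Tuple[int, ...]  # assumption: fields limited to SignedIntegers


def compare_records(a: Record, b: Record) -> int:
    """Return -1, 0 or 1, comparing labels first and then field sequences."""
    key_a = (a.label, a.fields)
    key_b = (b.label, b.fields)
    # Python compares tuples lexicographically and strings by code point.
    return (key_a > key_b) - (key_a < key_b)


foo123 = Record("foo", (1, 2, 3))
foo124 = Record("foo", (1, 2, 4))
void = Record("void", ())
```

With these definitions, `foo(1 2 3)` orders before `foo(1 2 4)` because the labels are equal and the field sequences differ at the last position, while `void()` orders after both because its label is greater.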
 ### Sequences.
-A `Sequence` is a general-purpose, variable-length ordered sequence of
+A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
-zero or more `Value`s. `Sequence`s are compared lexicographically.
+lexicographically.
 **Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
 `SignedInteger`s 1, 2 and 3.
 ### Sets.
@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
 and comparing the resulting `Sequence`s.
 **Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
 containing only the empty set; `{4 "hello" (void) 9.0f}`, the set
 containing 4, the string `"hello"`, the record with label `void` and
 no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
 set containing a `SignedInteger` and a `Float`; `{mime(application/xml
 #"<x/>") mime(application/xml #"<x />")}`, a set containing two
 different `mime` records.[^mime-xml-difference]
  [^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
    differ by bytewise comparison, and thus yield different record
    values, even though under the semantics of XML they denote
    identical XML infoset.
 **Non-examples.** `{1 1}`, because it contains multiple equivalent
 `Value`s; `{}`, because without the `#set` marker, it denotes the
 empty dictionary.
 ### Dictionaries.
 A `Dictionary` is an unordered finite collection of pairs of `Value`s.
-Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
+Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
-be pairwise distinct. Instances of `Dictionary` are compared by
+pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
 `Dictionary`'s pairs in ascending order by key.
 **Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
 mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
 mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
 `String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
 values.
 **Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
 keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
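
To make the comparison rules for `Set`s and `Dictionary`s more concrete, here is a minimal Python sketch. It assumes elements, keys and values are all integers, so that Python's built-in ordering can stand in for the total order on `Value`s. The function names are hypothetical, and comparing each pair as key then value is an assumption.

```python
def set_sort_key(s):
    """Sort the elements ascending; two sets compare as these sequences do."""
    return sorted(s)


def dict_sort_key(d):
    """Order the (key, value) pairs ascending by key.

    Keys are pairwise distinct, so sorting the pairs sorts them by key.
    Assumption: each pair is then compared as key first, value second.
    """
    return sorted(d.items())


# Python lists compare lexicographically, matching the sequence rule.
smaller_set = set_sort_key({3, 1, 2}) < set_sort_key({1, 2, 4})
smaller_dict = dict_sort_key({2: 0, 1: 5}) < dict_sort_key({1: 5, 2: 1})
```

Because both keys are plain sorted lists, an empty collection orders before any non-empty one of the same kind.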
 ## Textual Syntax
 Now we have discussed `Value`s and their meanings, we may turn to
@@ -282,7 +205,7 @@ or line feed.
 ### Grammar
-Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+Standalone documents may have trailing whitespace.
          Document = Value ws
@@ -301,9 +224,9 @@ the label and the open-parenthesis.
 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
-written either as a simple curly-brace-enclosed non-empty sequence of
+written either as one or more values enclosed in curly braces, or zero
-values, or as a possibly-empty sequence of values enclosed by the
+or more values enclosed by the tokens `#set{` and
-tokens `#set{` and `}`.[^printing-collections]
+`}`.[^printing-collections]
          Sequence = "[" *Value ws "]"
        Dictionary = "{" *(Value ws ":" Value) ws "}"
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
 represented as strings with an implicit internal structure.
 There is no convention for *labelling* a value as belonging to a
-particular category. This makes it difficult to extract, say, all
+particular category. Instead, JSON-encoded data are often labelled in
-email addresses, or all URLs, from an arbitrary JSON document.
+an ad-hoc way. Multiple incompatible approaches exist. For example, a
-
+"money" structure containing a `currency` field and an `amount` may be
 Instead, JSON-encoded data are often labelled in an ad-hoc way.
 Multiple incompatible approaches exist. For example, a "money"
 structure containing a `currency` field and an `amount` may be
 represented in any number of ways:
    { "_type": "money", "currency": "EUR", "amount": 10 }