Delete misleading, incorrect, or unnecessary text

2018-11-08 12:35:50 +00:00 · 10d8ce5c09
parent cf250b9245
commit 10d8ce5c09
1 changed file with 32 additions and 112 deletions
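
Note on the floating-point hunk below: the reworded paragraph keeps the rule that every `Float` sorts below every `Double`, every `SignedInteger` sorts above both, and values of the same floating-point kind follow IEEE 754 `totalOrder`. A minimal Python sketch of that rule follows. The wrapper classes `Float` and `Double`, the helper `_total_order_key`, and the function `sort_key` are hypothetical names chosen for illustration; the bit-pattern trick is one common way to realize `totalOrder` and is an assumption here, not something the document prescribes.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Float:
    """Hypothetical wrapper marking a single-precision value."""
    value: float


@dataclass(frozen=True)
class Double:
    """Hypothetical wrapper marking a double-precision value."""
    value: float


def _total_order_key(x: float, float_fmt: str, int_fmt: str, width: int) -> int:
    # Reinterpret the IEEE 754 bit pattern as an unsigned integer.
    bits = struct.unpack(int_fmt, struct.pack(float_fmt, x))[0]
    sign = 1 << (width - 1)
    mask = (1 << width) - 1
    # Negative values: flip all bits so larger magnitudes sort lower.
    # Non-negative values: set the top bit so they sort above all negatives.
    return (bits ^ mask) if bits & sign else (bits | sign)


def sort_key(v):
    """Sort key covering only the three numeric kinds discussed here."""
    if isinstance(v, Float):
        return (0, _total_order_key(v.value, ">f", ">I", 32))
    if isinstance(v, Double):
        return (1, _total_order_key(v.value, ">d", ">Q", 64))
    if isinstance(v, int) and not isinstance(v, bool):
        return (2, v)  # SignedInteger: compared as a mathematical integer
    raise TypeError(f"unsupported value: {v!r}")


values = [3, Double(0.5), Float(10.0), -6, Float(-1.5), Double(-1.202e300)]
ordered = sorted(values, key=sort_key)
```

With this key, `ordered` lists the two `Float`s first, then the two `Double`s, then the integers, regardless of numeric magnitude across kinds. Negative zero also sorts below positive zero within a kind, as `totalOrder` requires.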
--- a/preserves.md
+++ b/preserves.md
@@ -6,7 +6,7 @@
 # Preserves: an Expressive Data Language

 Tony Garnock-Jones <tonyg@leastfixedpoint.com>  
-September 2018. Version 0.0.3.
+November 2018. Version 0.0.4.

  [sexp.txt]: http://people.csail.mit.edu/rivest/Sexp.txt
  [spki]: http://world.std.com/~cme/html/spki.html
@@ -17,10 +17,11 @@ September 2018. Version 0.0.3.
 This document proposes a data model and serialization format called
 *Preserves*.

-Preserves supports *records* with user-defined *labels*. This makes it
-more expressive[^macro-expressiveness] than most data languages in use
-on the web and allows it to easily represent the *labelled sums of
-products* as seen in many functional programming languages.
+Preserves supports *records* with user-defined *labels*. This relieves
+the confusion caused by encoding records as dictionaries, seen in most
+data languages in use on the web. It also allows Preserves to easily
+represent the *labelled sums of products* as seen in many functional
+programming languages.

 Preserves also supports the usual suite of atomic and compound data
 types, in particular including *binary* data as a distinct type from
@@ -30,27 +31,11 @@ Finally, Preserves defines precisely how to *compare* two values.
 Comparison is based on the data model, not on syntax or on data
 structures of any particular implementation language.

-  [^macro-expressiveness]: By "expressive" I mean *macro-expressive*
-    in the sense of Felleisen's 1991 paper, "On the Expressive Power
-    of Programming Languages".
-
-    Roughly speaking, there's no way in a JSON document to introduce a
-    new kind of information (such as binary data, or a date-stamp, or
-    a "person" object) in an *unambiguous way* without *global
-    agreement* from every potential consumer of the document. With an
-    extensible labelled record type, there is.
-
-    Felleisen, Matthias. “On the Expressive Power of Programming
-    Languages.” Science of Computer Programming 17, no. 1--3 (1991):
-    35–75.
-
 ## Starting with Semantics

 Taking inspiration from functional programming, we start with a
 definition of the *values* that we want to work with and give them
-meaning independent of their syntax. When we write examples of values,
-we will do so using the [textual syntax](#textual-syntax) defined
-later in this document.
+meaning independent of their syntax.

 Our `Value`s fall into two broad categories: *atomic* and *compound*
 data.
@@ -98,11 +83,6 @@ neither is less than the other according to the total order.
 A `SignedInteger` is a signed integer of arbitrary width.
 `SignedInteger`s are compared as mathematical integers.

-**Examples.** 10; -6; 0.
-
-**Non-examples.** NaN (the clue is in the name!); ∞ (not finite); 0.2
-(not an integer); 1/7 (likewise); 2+*i*3 (likewise); √2 (likewise).
-
 ### Unicode strings.

 A `String` is a sequence of Unicode
@@ -114,19 +94,10 @@ code-point.[^utf8-is-awesome]
    gives the same result as a lexicographic byte-by-byte comparison
    of the UTF-8 encoding of a string!

-**Examples.** `"Hello world"`, an eleven-code-point string; `"z水𝄞"`,
-the string containing the three Unicode code-points `z` (0x7A), `水`
-(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
-
 ### Binary data.

-A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
-`ByteString`s are compared lexicographically.
-
-**Examples.** `#""`, the empty `ByteString`; `#"ABC"`, the
-`ByteString` containing the integers 65, 66 and 67 (corresponding to
-ASCII characters `A`, `B` and `C`). **N.B.** Despite appearances,
-these are *binary* data.
+A `ByteString` is a sequence of octets. `ByteString`s are compared
+lexicographically.

 ### Symbols.

@@ -135,40 +106,27 @@ values called *symbols*. Here, a `Symbol` is, like a `String`, a
 sequence of Unicode code-points representing an identifier of some
 kind. `Symbol`s are also compared lexicographically by code-point.

-**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
-
 ### Booleans.

-There are exactly two `Boolean` values, “false” and “true”. The
-“false” value compares less-than the “true” value. We write `#false`
-for “false”, and `#true` for “true”.
+There are two `Boolean`s, “false” and “true”. The “false” value is
+less-than the “true” value.

 ### IEEE floating-point values.

-A `Float` is a single-precision IEEE 754 floating-point value; a
-`Double` is a double-precision IEEE 754 floating-point value.
-`Float`s, `Double`s and `SignedInteger`s are considered disjoint, and
-so by the rules [above](#total-order), every `Float` is less than
-every `Double`, and every `SignedInteger` is greater than both. Two
-`Float`s or two `Double`s are to be ordered by the `totalOrder`
-predicate defined in section 5.10 of
+`Float`s and `Double`s are single- and double-precision IEEE 754
+floating-point values, respectively. `Float`s, `Double`s and
+`SignedInteger`s are disjoint; by the rules [above](#total-order),
+every `Float` is less than every `Double`, and every `SignedInteger`
+is greater than both. Two `Float`s or two `Double`s are to be ordered
+by the `totalOrder` predicate defined in section 5.10 of
 [IEEE Std 754-2008](https://dx.doi.org/10.1109/IEEESTD.2008.4610935).
-We write examples using a fractional part and/or an exponent to
-distinguish them from `SignedInteger`s. An additional suffix `f`
-distinguishes `Float`s from `Double`s.
-
-**Examples.** 10.0f; -6.0; 0.0f; 0.5; -1.202e300.
-
-**Non-examples.** 10, -6, and 0, because writing them this way
-indicates `SignedInteger`s, not `Float`s or `Double`s.

 ### Records.

-A `Record` is a *labelled* tuple of zero or more `Value`s, called the
-record's *fields*. A record's label is itself a `Value`, though it
-will usually be a `Symbol`.[^extensibility] [^iri-labels] `Record`s
-are compared lexicographically as if they were just tuples; that is,
-first by their labels, and then by the remainder of their fields.
+A `Record` is a *labelled* tuple of `Value`s, the record's *fields*. A
+label can be any `Value`, but is usually a `Symbol`.[^extensibility]
+[^iri-labels] `Record`s are compared lexicographically: first by
+label, then by field sequence.

  [^extensibility]: The [Racket](https://racket-lang.org/) programming
    language defines
@@ -186,19 +144,10 @@ first by their labels, and then by the remainder of their fields.
    it cannot be read as an IRI at all, and so the label simply stands
    for itself—for its own `Value`.

-**Examples.** `foo(1 2 3)`, a `Record` with label `foo` and fields 1,
-2 and 3; `void()`, a `Record` with label `void` and no fields.
-
-**Non-examples.** `()`, because it lacks a label; `void`, because it
-lacks even an empty tuple of fields.
-
 ### Sequences.

-A `Sequence` is a general-purpose, variable-length ordered sequence of
-zero or more `Value`s. `Sequence`s are compared lexicographically.
-
-**Examples.** `[]`, the empty sequence; `[1 2 3]`, the sequence of
-`SignedInteger`s 1, 2 and 3.
+A `Sequence` is a sequence of `Value`s. `Sequence`s are compared
+lexicographically.

 ### Sets.

@@ -208,40 +157,14 @@ induced by the total order on `Value`s. Two `Set`s are compared by
 sorting their elements ascending using the [total order](#total-order)
 and comparing the resulting `Sequence`s.

-**Examples.** `#set{}`, the empty set; `#set{#set{}}`, the set
-containing only the empty set; `{4 "hello" (void) 9.0f}`, the set
-containing 4, the string `"hello"`, the record with label `void` and
-no fields, and the `Float` denoting the number 9.0; `{1 1.0f}`, the
-set containing a `SignedInteger` and a `Float`; `{mime(application/xml
-#"<x/>") mime(application/xml #"<x />")}`, a set containing two
-different `mime` records.[^mime-xml-difference]
-
-  [^mime-xml-difference]: The two XML documents `<x/>` and `<x />`
-    differ by bytewise comparison, and thus yield different record
-    values, even though under the semantics of XML they denote
-    identical XML infoset.
-
-**Non-examples.** `{1 1}`, because it contains multiple equivalent
-`Value`s; `{}`, because without the `#set` marker, it denotes the
-empty dictionary.
-
 ### Dictionaries.

 A `Dictionary` is an unordered finite collection of pairs of `Value`s.
-Each pair comprises a *key* and a *value*. Keys in a `Dictionary` must
-be pairwise distinct. Instances of `Dictionary` are compared by
+Each pair comprises a *key* and a *value*. Keys in a `Dictionary` are
+pairwise distinct. Instances of `Dictionary` are compared by
 lexicographic comparison of the sequences resulting from ordering each
 `Dictionary`'s pairs in ascending order by key.

-**Examples.** `{}`, the empty dictionary; `{a: 1}`, the dictionary
-mapping the `Symbol` `a` to the `SignedInteger` 1; `{[1 2 3]: a}`,
-mapping `[1 2 3]` to `a`; `{"hi": 0, hi: 0, there: []}`, having a
-`String` and two `Symbol` keys, and `SignedInteger` and `Sequence`
-values.
-
-**Non-examples.** `{a:1 b:2 a:3}`, because it contains duplicate
-keys; `{[7 8]:[] [7 8]:99}`, for the same reason.
-
 ## Textual Syntax

 Now we have discussed `Value`s and their meanings, we may turn to
@@ -282,7 +205,7 @@ or line feed.

 ### Grammar

-Standalone documents containing textual representations of `Value`s may have trailing whitespace.
+Standalone documents may have trailing whitespace.

          Document = Value ws

@@ -301,9 +224,9 @@ the label and the open-parenthesis.

 `Sequence`s are enclosed in square brackets. `Dictionary` values are
 curly-brace-enclosed colon-separated pairs of values. `Set`s are
-written either as a simple curly-brace-enclosed non-empty sequence of
-values, or as a possibly-empty sequence of values enclosed by the
-tokens `#set{` and `}`.[^printing-collections]
+written either as one or more values enclosed in curly braces, or zero
+or more values enclosed by the tokens `#set{` and
+`}`.[^printing-collections]

          Sequence = "[" *Value ws "]"
        Dictionary = "{" *(Value ws ":" Value) ws "}"
@@ -1325,12 +1248,9 @@ into these types. For example, dates and email addresses are often
 represented as strings with an implicit internal structure.

 There is no convention for *labelling* a value as belonging to a
-particular category. This makes it difficult to extract, say, all
-email addresses, or all URLs, from an arbitrary JSON document.
-
-Instead, JSON-encoded data are often labelled in an ad-hoc way.
-Multiple incompatible approaches exist. For example, a "money"
-structure containing a `currency` field and an `amount` may be
+particular category. Instead, JSON-encoded data are often labelled in
+an ad-hoc way. Multiple incompatible approaches exist. For example, a
+"money" structure containing a `currency` field and an `amount` may be
 represented in any number of ways:

    { "_type": "money", "currency": "EUR", "amount": 10 }