Minor print layout tweaks, and minor content fixes

This commit is contained in:
Tony Garnock-Jones 2018-09-24 16:08:48 +01:00
parent b656df8632
commit 4716b4f4e9
1 changed files with 35 additions and 43 deletions

View File

@ -5,12 +5,15 @@
body { font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; }
@media screen {
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
hr { display: none; }
}
@media print {
@page { margin: 1.5cm; }
@page { margin: 4rem 4rem 4.333rem 3rem; }
body { margin-left: 2rem; margin-right 2rem; }
h1, h2 { page-break-before: always }
h1, h2 { page-break-before: always; margin-top: 0; }
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
hr+* { page-break-before: always; margin-top: 0; }
hr { display: none; }
}
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; color: #4f81bd; }
h2 { border-bottom: solid #4f81bd 1px; }
@ -41,9 +44,9 @@ Preserves also supports the usual suite of atomic and compound data
types, in particular including *binary* data as a distinct type from
text strings.
Finally, Preserves defines precisely how to compare two values with
each other in terms of the data model, not in terms of syntax or of
the data structures of any particular implementation language.
Finally, Preserves defines precisely how to *compare* two values.
Comparison is based on the data model, not on syntax or on data
structures of any particular implementation language.
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
in the sense of Felleisen's 1991 paper, "On the Expressive Power
@ -66,6 +69,9 @@ definition of the *values* that we want to work with and give them
meaning independent of their syntax. We will treat syntax separately,
later in this document.
Our `Value`s fall into two broad categories: *atomic* and *compound*
data.
Value = Atom
| Compound
@ -82,14 +88,6 @@ later in this document.
| Set
| Dictionary
Our `Value`s fall into two broad categories: *atomic* and *compound*
data.[^inspiration]
[^inspiration]: This design was loosely inspired by S-expressions,
as seen in Lisp, Scheme, [SPKI/SDSI][sexp.txt], and many others,
as well as by the ML type system, as seen in languages such as
SML, OCaml, Haskell, Rust, and many others.
**Total order.**<a name="total-order"></a> As we go, we will
incrementally specify a total order over `Value`s. Two values of the
same kind are compared using kind-specific rules. The ordering among
@ -126,10 +124,10 @@ examples of `SignedInteger`s using standard mathematical notation.
### Unicode strings.
A `String` is a sequence of Unicode
[code-point](http://www.unicode.org/glossary/#code_point)s. Two
`String`s are compared lexicographically, code-point by
[code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
are compared lexicographically, code-point by
code-point.[^utf8-is-awesome] We will write examples of `String`s as
text surrounded by double-quotes “`"`” using a monospace font.
text surrounded by quotes “`"`”.
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
gives the same result as a lexicographic byte-by-byte comparison
@ -139,33 +137,27 @@ text surrounded by double-quotes “`"`” using a monospace font.
the string containing the three Unicode code-points `z` (0x7A), `水`
(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
**Normalization forms.** Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text. No
particular normalization form is required for `String`s;
[see below](#normalization-forms).
### Binary data.
A `ByteString` is an ordered sequence of zero or more integers in the
inclusive range [0..255]. `ByteString`s are compared
lexicographically, byte by byte. We will only write examples of
`ByteString`s that contain bytes mapping to printable ASCII
characters, using “`#"`” as an opening quote mark and “`"`” as a
closing quote mark.
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
`ByteString`s are compared lexicographically. We will only write
examples of `ByteString`s that contain bytes denoting printable ASCII
characters, using “`#"`” as an open-quote and “`"`” as a close-quote
mark.
**Examples.** The `ByteString` containing the integers 65, 66 and 67
(corresponding to ASCII characters `A`, `B` and `C`) is written as
`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
appearances, these are *binary* data.
### Symbols or identifiers.
### Symbols.
Programming languages like Lisp and Prolog frequently use string-like
values called *symbols*. Here, a `Symbol` is, like a `String`, a
sequence of Unicode code-points, intended to represent an identifier
of some kind. `Symbol`s are also compared lexicographically by
code-point. We will write examples including only non-empty sequences
of non-whitespace characters, using a monospace font without quotation
sequence of Unicode code-points representing an identifier of some
kind. `Symbol`s are also compared lexicographically by code-point. We
will write examples including only non-empty sequences of
non-whitespace characters, using a monospace font without quotation
marks.
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
@ -176,8 +168,6 @@ There are exactly two `Boolean` values, “false” and “true”. The
“false” value compares less-than the “true” value. We write `#f` for
“false”, and `#t` for “true”.
**Examples.** `#f`; `#t`.
### IEEE floating-point values.
A `Float` is a single-precision IEEE 754 floating-point value; a
@ -345,6 +335,8 @@ representation:[^some-encodings-unused]
Each specific type of data defines its own rules for this format.
---
#### Encoding data of known length (format B)
A `Repr` where the length of the `Value` to be encoded is variable but
@ -416,7 +408,7 @@ Applications *SHOULD* prefer the known-length format for encoding
#### Application-specific short form for labels
Any given protocol using Preserves may additionally define an
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
interpretation for `n`∈{0,1,2}, mapping each *short form label
number* `n` to a specific record label. When encoding `m` fields with
short form label number `n`, format B becomes
@ -583,7 +575,7 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
| `[-2 -1 0 1]` | C4 1E 1F 40 11 |
| `[-2 -1 0 1]` | C4 1E 1F 10 11 |
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
| `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
| `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
@ -708,20 +700,20 @@ form label number 1 were chosen, the second example above, `(mime
text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
69 6D 65".
### Text
### Unicode normalization forms
#### Normalization forms
In order for users to unambiguously signal or require a particular
[normalization form](http://unicode.org/reports/tr15/), we define a
`NormalizedString`, which is a `Record` labelled with
Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text.
While no particular normalization form is required for `String`s,
users may need to unambiguously signal or require a particular
normalization form. A `NormalizedString` is a `Record` labelled with
`unicode-normalization` and having two fields, the first of which is a
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
`nfkc`, `nfkd`), and the second of which is a `String` whose
underlying code point representation *MUST* be normalized according to
the named normalization form.
#### IRIs (URIs, URLs, URNs, etc.)
### IRIs (URIs, URLs, URNs, etc.)
An `IRI` is a `Record` labelled with `iri` and having one field, a
`String` which is the IRI itself and which *MUST* be a valid absolute