forked from syndicate-lang/preserves
Minor print layout tweaks, and minor content fixes
This commit is contained in:
parent
80fb72f782
commit
f38aac1e19
78
preserve.md
78
preserve.md
|
@ -5,12 +5,15 @@
|
||||||
body { font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; }
|
body { font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; }
|
||||||
@media screen {
|
@media screen {
|
||||||
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
|
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
|
||||||
|
hr { display: none; }
|
||||||
}
|
}
|
||||||
@media print {
|
@media print {
|
||||||
@page { margin: 1.5cm; }
|
@page { margin: 4rem 4rem 4.333rem 3rem; }
|
||||||
body { margin-left: 2rem; margin-right 2rem; }
|
body { margin-left: 2rem; margin-right 2rem; }
|
||||||
h1, h2 { page-break-before: always }
|
h1, h2 { page-break-before: always; margin-top: 0; }
|
||||||
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
|
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
|
||||||
|
hr+* { page-break-before: always; margin-top: 0; }
|
||||||
|
hr { display: none; }
|
||||||
}
|
}
|
||||||
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; color: #4f81bd; }
|
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; color: #4f81bd; }
|
||||||
h2 { border-bottom: solid #4f81bd 1px; }
|
h2 { border-bottom: solid #4f81bd 1px; }
|
||||||
|
@ -41,9 +44,9 @@ Preserves also supports the usual suite of atomic and compound data
|
||||||
types, in particular including *binary* data as a distinct type from
|
types, in particular including *binary* data as a distinct type from
|
||||||
text strings.
|
text strings.
|
||||||
|
|
||||||
Finally, Preserves defines precisely how to compare two values with
|
Finally, Preserves defines precisely how to *compare* two values.
|
||||||
each other in terms of the data model, not in terms of syntax or of
|
Comparison is based on the data model, not on syntax or on data
|
||||||
the data structures of any particular implementation language.
|
structures of any particular implementation language.
|
||||||
|
|
||||||
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
|
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
|
||||||
in the sense of Felleisen's 1991 paper, "On the Expressive Power
|
in the sense of Felleisen's 1991 paper, "On the Expressive Power
|
||||||
|
@ -66,6 +69,9 @@ definition of the *values* that we want to work with and give them
|
||||||
meaning independent of their syntax. We will treat syntax separately,
|
meaning independent of their syntax. We will treat syntax separately,
|
||||||
later in this document.
|
later in this document.
|
||||||
|
|
||||||
|
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
||||||
|
data.
|
||||||
|
|
||||||
Value = Atom
|
Value = Atom
|
||||||
| Compound
|
| Compound
|
||||||
|
|
||||||
|
@ -82,14 +88,6 @@ later in this document.
|
||||||
| Set
|
| Set
|
||||||
| Dictionary
|
| Dictionary
|
||||||
|
|
||||||
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
|
||||||
data.[^inspiration]
|
|
||||||
|
|
||||||
[^inspiration]: This design was loosely inspired by S-expressions,
|
|
||||||
as seen in Lisp, Scheme, [SPKI/SDSI][sexp.txt], and many others,
|
|
||||||
as well as by the ML type system, as seen in languages such as
|
|
||||||
SML, OCaml, Haskell, Rust, and many others.
|
|
||||||
|
|
||||||
**Total order.**<a name="total-order"></a> As we go, we will
|
**Total order.**<a name="total-order"></a> As we go, we will
|
||||||
incrementally specify a total order over `Value`s. Two values of the
|
incrementally specify a total order over `Value`s. Two values of the
|
||||||
same kind are compared using kind-specific rules. The ordering among
|
same kind are compared using kind-specific rules. The ordering among
|
||||||
|
@ -126,10 +124,10 @@ examples of `SignedInteger`s using standard mathematical notation.
|
||||||
### Unicode strings.
|
### Unicode strings.
|
||||||
|
|
||||||
A `String` is a sequence of Unicode
|
A `String` is a sequence of Unicode
|
||||||
[code-point](http://www.unicode.org/glossary/#code_point)s. Two
|
[code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
|
||||||
`String`s are compared lexicographically, code-point by
|
are compared lexicographically, code-point by
|
||||||
code-point.[^utf8-is-awesome] We will write examples of `String`s as
|
code-point.[^utf8-is-awesome] We will write examples of `String`s as
|
||||||
text surrounded by double-quotes “`"`” using a monospace font.
|
text surrounded by quotes “`"`”.
|
||||||
|
|
||||||
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
||||||
gives the same result as a lexicographic byte-by-byte comparison
|
gives the same result as a lexicographic byte-by-byte comparison
|
||||||
|
@ -139,33 +137,27 @@ text surrounded by double-quotes “`"`” using a monospace font.
|
||||||
the string containing the three Unicode code-points `z` (0x7A), `水`
|
the string containing the three Unicode code-points `z` (0x7A), `水`
|
||||||
(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
|
(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
|
||||||
|
|
||||||
**Normalization forms.** Unicode defines multiple
|
|
||||||
[normalization forms](http://unicode.org/reports/tr15/) for text. No
|
|
||||||
particular normalization form is required for `String`s;
|
|
||||||
[see below](#normalization-forms).
|
|
||||||
|
|
||||||
### Binary data.
|
### Binary data.
|
||||||
|
|
||||||
A `ByteString` is an ordered sequence of zero or more integers in the
|
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
|
||||||
inclusive range [0..255]. `ByteString`s are compared
|
`ByteString`s are compared lexicographically. We will only write
|
||||||
lexicographically, byte by byte. We will only write examples of
|
examples of `ByteString`s that contain bytes denoting printable ASCII
|
||||||
`ByteString`s that contain bytes mapping to printable ASCII
|
characters, using “`#"`” as an open-quote and “`"`” as a close-quote
|
||||||
characters, using “`#"`” as an opening quote mark and “`"`” as a
|
mark.
|
||||||
closing quote mark.
|
|
||||||
|
|
||||||
**Examples.** The `ByteString` containing the integers 65, 66 and 67
|
**Examples.** The `ByteString` containing the integers 65, 66 and 67
|
||||||
(corresponding to ASCII characters `A`, `B` and `C`) is written as
|
(corresponding to ASCII characters `A`, `B` and `C`) is written as
|
||||||
`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
|
`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
|
||||||
appearances, these are *binary* data.
|
appearances, these are *binary* data.
|
||||||
|
|
||||||
### Symbols or identifiers.
|
### Symbols.
|
||||||
|
|
||||||
Programming languages like Lisp and Prolog frequently use string-like
|
Programming languages like Lisp and Prolog frequently use string-like
|
||||||
values called *symbols*. Here, a `Symbol` is, like a `String`, a
|
values called *symbols*. Here, a `Symbol` is, like a `String`, a
|
||||||
sequence of Unicode code-points, intended to represent an identifier
|
sequence of Unicode code-points representing an identifier of some
|
||||||
of some kind. `Symbol`s are also compared lexicographically by
|
kind. `Symbol`s are also compared lexicographically by code-point. We
|
||||||
code-point. We will write examples including only non-empty sequences
|
will write examples including only non-empty sequences of
|
||||||
of non-whitespace characters, using a monospace font without quotation
|
non-whitespace characters, using a monospace font without quotation
|
||||||
marks.
|
marks.
|
||||||
|
|
||||||
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
|
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
|
||||||
|
@ -176,8 +168,6 @@ There are exactly two `Boolean` values, “false” and “true”. The
|
||||||
“false” value compares less-than the “true” value. We write `#f` for
|
“false” value compares less-than the “true” value. We write `#f` for
|
||||||
“false”, and `#t` for “true”.
|
“false”, and `#t` for “true”.
|
||||||
|
|
||||||
**Examples.** `#f`; `#t`.
|
|
||||||
|
|
||||||
### IEEE floating-point values.
|
### IEEE floating-point values.
|
||||||
|
|
||||||
A `Float` is a single-precision IEEE 754 floating-point value; a
|
A `Float` is a single-precision IEEE 754 floating-point value; a
|
||||||
|
@ -345,6 +335,8 @@ representation:[^some-encodings-unused]
|
||||||
|
|
||||||
Each specific type of data defines its own rules for this format.
|
Each specific type of data defines its own rules for this format.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
#### Encoding data of known length (format B)
|
#### Encoding data of known length (format B)
|
||||||
|
|
||||||
A `Repr` where the length of the `Value` to be encoded is variable but
|
A `Repr` where the length of the `Value` to be encoded is variable but
|
||||||
|
@ -416,7 +408,7 @@ Applications *SHOULD* prefer the known-length format for encoding
|
||||||
#### Application-specific short form for labels
|
#### Application-specific short form for labels
|
||||||
|
|
||||||
Any given protocol using Preserves may additionally define an
|
Any given protocol using Preserves may additionally define an
|
||||||
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
|
interpretation for `n`∈{0,1,2}, mapping each *short form label
|
||||||
number* `n` to a specific record label. When encoding `m` fields with
|
number* `n` to a specific record label. When encoding `m` fields with
|
||||||
short form label number `n`, format B becomes
|
short form label number `n`, format B becomes
|
||||||
|
|
||||||
|
@ -583,7 +575,7 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
|
||||||
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
|
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
|
||||||
| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
|
| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
|
||||||
| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
|
| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
|
||||||
| `[-2 -1 0 1]` | C4 1E 1F 40 11 |
|
| `[-2 -1 0 1]` | C4 1E 1F 10 11 |
|
||||||
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
|
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
|
||||||
| `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
|
| `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
|
||||||
| `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
|
| `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
|
||||||
|
@ -708,20 +700,20 @@ form label number 1 were chosen, the second example above, `(mime
|
||||||
text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
|
text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
|
||||||
69 6D 65".
|
69 6D 65".
|
||||||
|
|
||||||
### Text
|
### Unicode normalization forms
|
||||||
|
|
||||||
#### Normalization forms
|
Unicode defines multiple
|
||||||
|
[normalization forms](http://unicode.org/reports/tr15/) for text.
|
||||||
In order for users to unambiguously signal or require a particular
|
While no particular normalization form is required for `String`s,
|
||||||
[normalization form](http://unicode.org/reports/tr15/), we define a
|
users may need to unambiguously signal or require a particular
|
||||||
`NormalizedString`, which is a `Record` labelled with
|
normalization form. A `NormalizedString` is a `Record` labelled with
|
||||||
`unicode-normalization` and having two fields, the first of which is a
|
`unicode-normalization` and having two fields, the first of which is a
|
||||||
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
|
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
|
||||||
`nfkc`, `nfkd`), and the second of which is a `String` whose
|
`nfkc`, `nfkd`), and the second of which is a `String` whose
|
||||||
underlying code point representation *MUST* be normalized according to
|
underlying code point representation *MUST* be normalized according to
|
||||||
the named normalization form.
|
the named normalization form.
|
||||||
|
|
||||||
#### IRIs (URIs, URLs, URNs, etc.)
|
### IRIs (URIs, URLs, URNs, etc.)
|
||||||
|
|
||||||
An `IRI` is a `Record` labelled with `iri` and having one field, a
|
An `IRI` is a `Record` labelled with `iri` and having one field, a
|
||||||
`String` which is the IRI itself and which *MUST* be a valid absolute
|
`String` which is the IRI itself and which *MUST* be a valid absolute
|
||||||
|
|
Loading…
Reference in New Issue