forked from syndicate-lang/preserves
Minor print layout tweaks, and minor content fixes
parent
80fb72f782
commit
f38aac1e19
78
preserve.md
|
@ -5,12 +5,15 @@
|
|||
body { font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; }
|
||||
@media screen {
|
||||
body { padding-top: 2rem; max-width: 40em; margin: auto; font-size: 120%; }
|
||||
hr { display: none; }
|
||||
}
|
||||
@media print {
|
||||
@page { margin: 1.5cm; }
|
||||
@page { margin: 4rem 4rem 4.333rem 3rem; }
|
||||
body { margin-left: 2rem; margin-right: 2rem; }
|
||||
h1, h2 { page-break-before: always }
|
||||
h1, h2 { page-break-before: always; margin-top: 0; }
|
||||
h1:first-of-type, h2:first-of-type { page-break-before: auto; }
|
||||
hr+* { page-break-before: always; margin-top: 0; }
|
||||
hr { display: none; }
|
||||
}
|
||||
h1, h2, h3, h4, h5, h6 { margin-left: -1rem; color: #4f81bd; }
|
||||
h2 { border-bottom: solid #4f81bd 1px; }
|
||||
|
@ -41,9 +44,9 @@ Preserves also supports the usual suite of atomic and compound data
|
|||
types, in particular including *binary* data as a distinct type from
|
||||
text strings.
|
||||
|
||||
Finally, Preserves defines precisely how to compare two values with
|
||||
each other in terms of the data model, not in terms of syntax or of
|
||||
the data structures of any particular implementation language.
|
||||
Finally, Preserves defines precisely how to *compare* two values.
|
||||
Comparison is based on the data model, not on syntax or on data
|
||||
structures of any particular implementation language.
|
||||
|
||||
[^macro-expressiveness]: By "expressive" I mean *macro-expressive*
|
||||
in the sense of Felleisen's 1991 paper, "On the Expressive Power
|
||||
|
@ -66,6 +69,9 @@ definition of the *values* that we want to work with and give them
|
|||
meaning independent of their syntax. We will treat syntax separately,
|
||||
later in this document.
|
||||
|
||||
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
||||
data.
|
||||
|
||||
Value = Atom
|
||||
| Compound
|
||||
|
||||
|
@ -82,14 +88,6 @@ later in this document.
|
|||
| Set
|
||||
| Dictionary
|
||||
|
||||
Our `Value`s fall into two broad categories: *atomic* and *compound*
|
||||
data.[^inspiration]
|
||||
|
||||
[^inspiration]: This design was loosely inspired by S-expressions,
|
||||
as seen in Lisp, Scheme, [SPKI/SDSI][sexp.txt], and many others,
|
||||
as well as by the ML type system, as seen in languages such as
|
||||
SML, OCaml, Haskell, Rust, and many others.
|
||||
|
||||
**Total order.**<a name="total-order"></a> As we go, we will
|
||||
incrementally specify a total order over `Value`s. Two values of the
|
||||
same kind are compared using kind-specific rules. The ordering among
|
||||
|
@ -126,10 +124,10 @@ examples of `SignedInteger`s using standard mathematical notation.
|
|||
### Unicode strings.
|
||||
|
||||
A `String` is a sequence of Unicode
|
||||
[code-point](http://www.unicode.org/glossary/#code_point)s. Two
|
||||
`String`s are compared lexicographically, code-point by
|
||||
[code-point](http://www.unicode.org/glossary/#code_point)s. `String`s
|
||||
are compared lexicographically, code-point by
|
||||
code-point.[^utf8-is-awesome] We will write examples of `String`s as
|
||||
text surrounded by double-quotes “`"`” using a monospace font.
|
||||
text surrounded by quotes “`"`”.
|
||||
|
||||
[^utf8-is-awesome]: Happily, the design of UTF-8 is such that this
|
||||
gives the same result as a lexicographic byte-by-byte comparison
|
||||
|
@ -139,33 +137,27 @@ text surrounded by double-quotes “`"`” using a monospace font.
|
|||
the string containing the three Unicode code-points `z` (0x7A), `水`
|
||||
(0x6C34) and `𝄞` (0x1D11E); `""`, the empty string.
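The code-point ordering of `String`s, and the footnote's remark that UTF-8 preserves it, can be checked with a small sketch. The helper names `compare_strings` and `compare_utf8` are hypothetical and are not part of Preserves; the sample values are the example code-points given above.

```python
def compare_strings(a: str, b: str) -> int:
    """Compare lexicographically, code-point by code-point: -1, 0 or 1."""
    ka = [ord(c) for c in a]  # list comparison in Python is lexicographic
    kb = [ord(c) for c in b]
    return (ka > kb) - (ka < kb)


def compare_utf8(a: str, b: str) -> int:
    """Compare the UTF-8 encodings lexicographically, byte by byte."""
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


# z (0x7A), 水 (0x6C34) and 𝄞 (0x1D11E), plus a few combinations.
samples = ["", "z", "水", "𝄞", "z水𝄞", "zz"]
for x in samples:
    for y in samples:
        # Both orderings agree on every pair of samples.
        assert compare_strings(x, y) == compare_utf8(x, y)
```

An implementation that stores text as UTF-8 could therefore, under this reading, compare the encoded bytes directly without decoding them first.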
|
||||
|
||||
**Normalization forms.** Unicode defines multiple
|
||||
[normalization forms](http://unicode.org/reports/tr15/) for text. No
|
||||
particular normalization form is required for `String`s;
|
||||
[see below](#normalization-forms).
|
||||
|
||||
### Binary data.
|
||||
|
||||
A `ByteString` is an ordered sequence of zero or more integers in the
|
||||
inclusive range [0..255]. `ByteString`s are compared
|
||||
lexicographically, byte by byte. We will only write examples of
|
||||
`ByteString`s that contain bytes mapping to printable ASCII
|
||||
characters, using “`#"`” as an opening quote mark and “`"`” as a
|
||||
closing quote mark.
|
||||
A `ByteString` is an ordered sequence of zero or more eight-bit bytes.
|
||||
`ByteString`s are compared lexicographically. We will only write
|
||||
examples of `ByteString`s that contain bytes denoting printable ASCII
|
||||
characters, using “`#"`” as an open-quote and “`"`” as a close-quote
|
||||
mark.
|
||||
|
||||
**Examples.** The `ByteString` containing the integers 65, 66 and 67
|
||||
(corresponding to ASCII characters `A`, `B` and `C`) is written as
|
||||
`#"ABC"`. The empty `ByteString` is written as `#""`. **N.B.** Despite
|
||||
appearances, these are *binary* data.
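As a rough illustration of the notation, the sketch below builds the `ByteString` from the example and renders it in the `#"..."` style. The function names are hypothetical, and rejecting the quote and backslash bytes is an assumption made here to keep the output unambiguous; the text above does not define any escaping.

```python
def compare_bytestrings(a: bytes, b: bytes) -> int:
    """Lexicographic comparison, byte by byte: -1, 0 or 1."""
    return (a > b) - (a < b)


def show_bytestring(bs: bytes) -> str:
    """Render a ByteString in the #"..." example notation.

    Only printable ASCII is accepted; the quote and backslash bytes are
    rejected too (an assumption of this sketch, not a rule from the text).
    """
    for byte in bs:
        if not 0x20 <= byte <= 0x7E or byte in (0x22, 0x5C):
            raise ValueError(f"byte {byte:#04x} has no plain rendering")
    return '#"' + bs.decode("ascii") + '"'


abc = bytes([65, 66, 67])  # the integers 65, 66 and 67
print(show_bytestring(abc))       # the three-byte example
print(show_bytestring(bytes()))   # the empty ByteString
```

Although the rendering looks like text, `abc` remains a sequence of bytes and is a different kind of value from the `String` `"ABC"`.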
|
||||
|
||||
### Symbols or identifiers.
|
||||
### Symbols.
|
||||
|
||||
Programming languages like Lisp and Prolog frequently use string-like
|
||||
values called *symbols*. Here, a `Symbol` is, like a `String`, a
|
||||
sequence of Unicode code-points, intended to represent an identifier
|
||||
of some kind. `Symbol`s are also compared lexicographically by
|
||||
code-point. We will write examples including only non-empty sequences
|
||||
of non-whitespace characters, using a monospace font without quotation
|
||||
sequence of Unicode code-points representing an identifier of some
|
||||
kind. `Symbol`s are also compared lexicographically by code-point. We
|
||||
will write examples including only non-empty sequences of
|
||||
non-whitespace characters, using a monospace font without quotation
|
||||
marks.
|
||||
|
||||
**Examples.** `hello-world`; `utf8-string`; `exact-integer?`.
|
||||
|
@ -176,8 +168,6 @@ There are exactly two `Boolean` values, “false” and “true”. The
|
|||
“false” value compares less-than the “true” value. We write `#f` for
|
||||
“false”, and `#t` for “true”.
|
||||
|
||||
**Examples.** `#f`; `#t`.
|
||||
|
||||
### IEEE floating-point values.
|
||||
|
||||
A `Float` is a single-precision IEEE 754 floating-point value; a
|
||||
|
@ -345,6 +335,8 @@ representation:[^some-encodings-unused]
|
|||
|
||||
Each specific type of data defines its own rules for this format.
|
||||
|
||||
---
|
||||
|
||||
#### Encoding data of known length (format B)
|
||||
|
||||
A `Repr` where the length of the `Value` to be encoded is variable but
|
||||
|
@ -416,7 +408,7 @@ Applications *SHOULD* prefer the known-length format for encoding
|
|||
#### Application-specific short form for labels
|
||||
|
||||
Any given protocol using Preserves may additionally define an
|
||||
interpretation for `n ∈ {0,1,2}`, mapping each *short form label
|
||||
interpretation for `n`∈{0,1,2}, mapping each *short form label
|
||||
number* `n` to a specific record label. When encoding `m` fields with
|
||||
short form label number `n`, format B becomes
|
||||
|
||||
|
@ -583,7 +575,7 @@ short form label number 0 to label `discard`, 1 to `capture`, and 2 to
|
|||
| `(observe (speak (discard) (capture (discard))))` | A1 B3 75 73 70 65 61 6B 80 91 80 |
|
||||
| `[1 2 3 4]` (format B) | C4 11 12 13 14 |
|
||||
| `[1 2 3 4]` (format C) | 2C 11 12 13 14 3C |
|
||||
| `[-2 -1 0 1]` | C4 1E 1F 40 11 |
|
||||
| `[-2 -1 0 1]` | C4 1E 1F 10 11 |
|
||||
| `"hello"` (format B) | 55 68 65 6C 6C 6F |
|
||||
| `"hello"` (format C, 2 chunks) | 25 52 68 65 53 6C 6C 6F 35 |
|
||||
| `"hello"` (format C, 5 chunks) | 25 52 68 65 52 6C 6C 50 50 51 6F 35 |
|
||||
|
@ -708,20 +700,20 @@ form label number 1 were chosen, the second example above, `(mime
|
|||
text/plain "ABC")`, would be encoded with "92" in place of "B3 74 6D
|
||||
69 6D 65".
|
||||
|
||||
### Text
|
||||
### Unicode normalization forms
|
||||
|
||||
#### Normalization forms
|
||||
|
||||
In order for users to unambiguously signal or require a particular
|
||||
[normalization form](http://unicode.org/reports/tr15/), we define a
|
||||
`NormalizedString`, which is a `Record` labelled with
|
||||
Unicode defines multiple
|
||||
[normalization forms](http://unicode.org/reports/tr15/) for text.
|
||||
While no particular normalization form is required for `String`s,
|
||||
users may need to unambiguously signal or require a particular
|
||||
normalization form. A `NormalizedString` is a `Record` labelled with
|
||||
`unicode-normalization` and having two fields, the first of which is a
|
||||
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
|
||||
`nfkc`, `nfkd`), and the second of which is a `String` whose
|
||||
underlying code point representation *MUST* be normalized according to
|
||||
the named normalization form.
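A minimal sketch of how an application might construct and check a `NormalizedString`, using Python's standard `unicodedata` module. The `Record` class here is a hypothetical stand-in for a Preserves record, with plain Python strings standing in for both `Symbol`s and `String`s; only the four forms named above are handled, which is an assumption of the sketch.

```python
import unicodedata
from dataclasses import dataclass

# Maps the Symbol used in the record to the name unicodedata expects.
FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Preserves Record: a label plus fields."""
    label: str
    fields: tuple


def normalized_string(form: str, text: str) -> Record:
    """Build a NormalizedString, normalizing `text` to the named form."""
    normalized = unicodedata.normalize(FORMS[form], text)
    return Record("unicode-normalization", (form, normalized))


def is_valid_normalized_string(rec: Record) -> bool:
    """Check the label, the field count, and the MUST on the second field."""
    if rec.label != "unicode-normalization" or len(rec.fields) != 2:
        return False
    form, text = rec.fields
    if not isinstance(form, str) or form not in FORMS:
        return False
    if not isinstance(text, str):
        return False
    # The text is valid only if normalizing it again changes nothing.
    return unicodedata.normalize(FORMS[form], text) == text


rec = normalized_string("nfc", "e\u0301")  # "e" followed by a combining acute
assert rec.fields[1] == "\u00e9"           # composed into a single code-point
assert is_valid_normalized_string(rec)
```

A receiver that requires a particular form could run a check like `is_valid_normalized_string` on incoming values rather than trusting the label alone.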
|
||||
|
||||
#### IRIs (URIs, URLs, URNs, etc.)
|
||||
### IRIs (URIs, URLs, URNs, etc.)
|
||||
|
||||
An `IRI` is a `Record` labelled with `iri` and having one field, a
|
||||
`String` which is the IRI itself and which *MUST* be a valid absolute
|
||||
|
|