From 786134195135bdf4c2e17905bcdea49f198a2efe Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Wed, 3 Jul 2019 19:35:56 -0400
Subject: [PATCH] Cosmetic.

---
 preserves.md | 62 ++++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/preserves.md b/preserves.md
index 83d3bc0..2234e55 100644
--- a/preserves.md
+++ b/preserves.md
@@ -182,7 +182,7 @@ equivalent compact machine-readable syntax.
 `null` are all read as `Symbol`s, and that `SignedInteger`s are never
 read as `Double`s.
 
-### Character set
+### Character set.
 
 [ABNF][abnf] allows easy definition of US-ASCII-based languages.
 However, Preserves is a Unicode-based language. Therefore, we
@@ -192,7 +192,7 @@ code points.
 Textual syntax for a `Value` *SHOULD* be encoded using UTF-8 where
 possible.
 
-### Whitespace
+### Whitespace.
 
 Whitespace is defined as any number of spaces, tabs, carriage returns,
 line feeds, or commas.
@@ -200,7 +200,7 @@ line feeds, or commas.
     ws = *(%x20 / %x09 / newline / ",")
     newline = CR / LF
 
-### Grammar
+### Grammar.
 
 Standalone documents may have trailing whitespace.
 
@@ -427,7 +427,7 @@ encoded details of the `Value` itself.
 
 For a value `v`, we write `[[v]]` for the `Repr` of v.
 
-### Type and Length representation
+### Type and Length representation.
 
 Each `Repr` takes one of three possible forms:
 
@@ -448,7 +448,7 @@ Each `Repr` takes one of three possible forms:
 Applications may choose between formats B and C depending on their
 needs at serialization time.
 
-#### The lead byte
+#### The lead byte.
 
 Every `Repr` starts with a *lead byte*, constructed by
 `leadbyte(t,n,m)`, where `t`,`n`∈{0,1,2,3} and 0≤`m`<16:
@@ -469,11 +469,11 @@ representation:[^some-encodings-unused]
 - `t`=2 (format B) represents a `Record`.
 - `t`=3 (format B) represents a `Sequence`, `Set` or `Dictionary`.
 
-#### Encoding data of fixed length (format A)
+#### Encoding data of fixed length (format A).
 
 Each specific type of data defines its own rules for this format.
 
-#### Encoding data of known length (format B)
+#### Encoding data of known length (format B).
 
 A `Repr` where the length of the `Value` to be encoded is variable but
 known uses the value of `m` in `leadbyte` to encode its length. The
@@ -509,7 +509,7 @@ definition,
 - 300 (binary, grouped into 7-bit chunks, `10 0101100`) varint-encodes to the two bytes 172 and 2.
 - 1000000000 (binary `11 1011100 1101011 0010100 0000000`) varint-encodes to bytes 128, 148, 235, 220, and 3.
 
-#### Streaming data of unknown length (format C)
+#### Streaming data of unknown length (format C).
 
 A `Repr` where the length of the `Value` to be encoded is variable and
 not known at the time serialization of the `Value` starts is encoded
@@ -526,7 +526,7 @@ a format B `Repr` of a `ByteString`, no matter the type of the overall
 For a `Repr` of a `Value` containing other `Value`s, each chunk is to
 be a single `Repr`.
 
-### Records
+### Records.
 
 Format B (known length):
 
@@ -542,7 +542,7 @@ Format C (streaming):
 Applications *SHOULD* prefer the known-length format for encoding
 `Record`s.
 
-#### Application-specific short form for labels
+#### Application-specific short form for labels.
 
 Any given protocol using Preserves may additionally define an
 interpretation for `n`∈{0,1,2}, mapping each *short form label
@@ -574,7 +574,7 @@ for format B, or
 
 for format C.
 
-### Sequences, Sets and Dictionaries
+### Sequences, Sets and Dictionaries.
 
 Format B (known length):
 
@@ -618,7 +618,7 @@ order.
 Note that `header(3,3,m)` and `open(3,3)`/`close(3,3)` are unused and
 reserved.
 
-### SignedIntegers
+### SignedIntegers.
 
 Format B/A (known length/fixed-size):
 
@@ -653,7 +653,7 @@ For example,
     [[ -127 ]] = 41 81    [[  13 ]] = 41 0D    [[  65536 ]] = 43 01 00 00
     [[   -4 ]] = 41 FC    [[ 127 ]] = 41 7F    [[ 131072 ]] = 43 02 00 00
 
-### Strings, ByteStrings and Symbols
+### Strings, ByteStrings and Symbols.
 
 Syntax for these three types varies only in the value of `n` supplied
 to `header`, `open`, and `close`. In each case, the payload following
@@ -676,19 +676,19 @@ then a sequence of zero or more format B chunks, followed by
 While the overall content of a streamed `String` or `Symbol` must be
 valid UTF-8, individual chunks do not have to conform to UTF-8.
 
-### Fixed-length Atoms
+### Fixed-length Atoms.
 
 Fixed-length atoms all use format A, and do not have a length
 representation. They repurpose the bits that format B `Repr`s use to
 specify lengths. Applications *MUST NOT* use format C with
 `open(0,n)` or `close(0,n)` for any `n`.
 
-#### Booleans
+#### Booleans.
 
     [[ #false ]] = header(0,0,0) = [0x00]
     [[  #true ]] = header(0,0,1) = [0x01]
 
-#### Floats and Doubles
+#### Floats and Doubles.
 
     [[ F ]] when F ∈ Float  = header(0,0,2) ++ binary32(F)
     [[ D ]] when D ∈ Double = header(0,0,3) ++ binary64(D)
@@ -698,7 +698,7 @@ The functions `binary32(F)` and `binary64(D)` yield big-endian 4- and
 
 ## Examples
 
-### Simple examples
+### Simple examples.
 
@@ -763,7 +763,7 @@ encodes to
 
 ---
 
-### JSON examples
+### JSON examples.
 
 The examples from
 [RFC 8259](https://tools.ietf.org/html/rfc8259#section-13) read as
@@ -899,7 +899,7 @@ treat them specially.
 and one which enforces validity (i.e. side-conditions) when reading,
 writing, or constructing `Value`s.
 
-### MIME-type tagged binary data
+### MIME-type tagged binary data.
 
 Many internet protocols use
 [media types](https://tools.ietf.org/html/rfc6838) (a.k.a MIME types)
@@ -928,7 +928,7 @@ form label number 1 were chosen, the second example above,
 `mime(text/plain "ABC")`, would be encoded with "92" in place of "B3
 74 6D 69 6D 65".
 
-### Unicode normalization forms
+### Unicode normalization forms.
 
 Unicode defines multiple
 [normalization forms](http://unicode.org/reports/tr15/) for text.
@@ -941,13 +941,13 @@ normalization form. A `NormalizedString` is a `Record` labelled with
 underlying code point representation *MUST* be normalized according to
 the named normalization form.
 
-### IRIs (URIs, URLs, URNs, etc.)
+### IRIs (URIs, URLs, URNs, etc.).
 
 An `IRI` is a `Record` labelled with `iri` and having one field, a
 `String` which is the IRI itself and which *MUST* be a valid absolute
 or relative IRI.
 
-### Machine words
+### Machine words.
 
 The definition of `SignedInteger` captures all integers. However, in
 certain circumstances it can be valuable to assert that a number
@@ -962,7 +962,7 @@ which *MUST* fall within the appropriate range. That is, to be valid,
 - in `i16(`*x*`)`, -32768 <= *x* <= 32767.
 - etc.
 
-### Anonymous Tuples and Unit
+### Anonymous Tuples and Unit.
 
 A `Tuple` is a `Record` with label `tuple` and zero or more fields,
 denoting an anonymous tuple of values.
@@ -970,14 +970,14 @@ denoting an anonymous tuple of values.
 The 0-ary tuple, `tuple()`, denotes the empty tuple, sometimes called
 "unit" or "void" (but *not* e.g. JavaScript's "undefined" value).
 
-### Null and Undefined
+### Null and Undefined.
 
 Tony Hoare's
 "[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)"
 can be represented with the 0-ary `Record` `null()`. An "undefined"
 value can be represented as `undefined()`.
 
-### Dates and Times
+### Dates and Times.
 
 Dates, times, moments, and timestamps can be represented with a
 `Record` with label `rfc3339` having a single field, a `String`, which
@@ -1078,7 +1078,7 @@ When designing a language mapping, an important consideration is
 roundtripping: serialization after deserialization, and vice versa,
 should both be identities.
 
-### JavaScript
+### JavaScript.
 
 - `Boolean` ↔ `Boolean`
 - `Float` and `Double` ↔ numbers
@@ -1093,7 +1093,7 @@ should both be identities.
 - `Set` ↔ `{ "_set": M }` where `M` is a `Map` from the elements of the set to `true`
 - `Dictionary` ↔ a `Map`
 
-### Scheme/Racket
+### Scheme/Racket.
 
 - `Boolean` ↔ booleans
 - `Float` and `Double` ↔ inexact numbers (Racket: single- and double-precision floats)
@@ -1106,7 +1106,7 @@ should both be identities.
 - `Set` ↔ Racket: sets
 - `Dictionary` ↔ Racket: hash-table
 
-### Java
+### Java.
 
 - `Boolean` ↔ `Boolean`
 - `Float` and `Double` ↔ `Float` and `Double`
@@ -1120,7 +1120,7 @@ should both be identities.
 - `Set` ↔ an implementation of `java.util.Set`
 - `Dictionary` ↔ an implementation of `java.util.Map`
 
-### Erlang
+### Erlang.
 
 - `Boolean` ↔ `true` and `false`
 - `Float` and `Double` ↔ floats (unsure how Erlang deals with single-precision)
@@ -1143,7 +1143,7 @@ Erlang has no distinct string type, making for a trilemma where
 `String`s are in danger of clashing with `ByteString`s, `Sequence`s,
 or `Record`s.
 
-### Python
+### Python.
 
 - `Boolean` ↔ `True` and `False`
 - `Float` ↔ a `Float` wrapper-class for a double-precision value
@@ -1157,7 +1157,7 @@ or `Record`s.
 - `Set` ↔ `frozenset` (but accept `set` during encoding)
 - `Dictionary` ↔ a hashable (immutable) dictionary-like thing (but accept `dict` during encoding)
 
-### Squeak Smalltalk
+### Squeak Smalltalk.
 
 - `Boolean` ↔ `true` and `false`
 - `Float` ↔ perhaps a subclass of `Float`?