---
title: "Conventions for Common Data Types"
---

The `Value` data type is essentially an S-Expression, able to
represent semi-structured data over `ByteString`, `String`,
`SignedInteger` atoms and so on.[^why-not-spki-sexps]

[^why-not-spki-sexps]: Rivest's S-Expressions are in many ways
    similar to Preserves. However, while they include binary data and
    sequences, and an obvious equivalence for them exists, they lack
    numbers *per se* as well as any kind of unordered structure such
    as sets or maps. In addition, while “display hints” allow
    labelling of binary data with an intended interpretation, they
    cannot be attached to any other kind of structure, and the “hint”
    itself can only be a binary blob.

However, users need a wide variety of data types for representing
domain-specific values such as various kinds of encoded and normalized
text, calendrical values, machine words, and so on.

Appropriately-labelled `Record`s denote these domain-specific data
types.[^why-dictionaries]

[^why-dictionaries]: Given `Record`'s existence, it may seem odd
    that `Dictionary`, `Set`, `Float`, etc. are given special
    treatment. Preserves aims to offer a useful basic equivalence
    predicate to programmers, and so if a data type demands a special
    equivalence predicate, as `Dictionary`, `Set` and `Float` all do,
    then the type should be included in the base language. Otherwise,
    it can be represented as a `Record` and treated separately.
    `Boolean`, `String` and `Symbol` are seeming exceptions. The first
    two merit inclusion because of their cultural importance, while
    `Symbol`s are included to allow their use as `Record` labels.
    Primitive `Symbol` support avoids a bootstrapping issue.

All of these conventions are optional. They form a layer atop the core
`Value` structure. Non-domain-specific tools do not in general need to
treat them specially.

**Validity.** Many of the labels we will describe in this section come
with side-conditions on the contents of labelled `Record`s. It is
possible to construct an instance of `Value` that violates these
side-conditions without ceasing to be a `Value` or becoming
unrepresentable. However, we say that such a `Value` is *invalid*
because it fails to honour the necessary side-conditions.
Implementations *SHOULD* allow two modes of working: one which
treats all `Value`s identically, without regard for side-conditions,
and one which enforces validity (i.e. side-conditions) when reading,
writing, or constructing `Value`s.

## IOLists.

Inspired by Erlang's notions of
[`iolist()` and `iodata()`](http://erlang.org/doc/reference_manual/typespec.html),
an `IOList` is any tree constructed from `ByteString`s and
`Sequence`s. Formally, an `IOList` is either a `ByteString` or a
`Sequence` of `IOList`s.

`IOList`s can be useful for
[vectored I/O](https://en.wikipedia.org/wiki/Vectored_I/O).
Additionally, the flexibility of `IOList` trees allows annotation of
interior portions of a tree.

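The recursive definition of `IOList` can be illustrated with a minimal Python sketch. The mapping of `ByteString` to `bytes` and of `Sequence` to `list`, and the helper names `iter_chunks` and `flatten`, are assumptions made for this illustration only; they are not part of Preserves.

```python
from typing import Iterator, List, Union

# Assumed mapping for this sketch: bytes stands in for ByteString,
# list stands in for Sequence.
IOList = Union[bytes, List["IOList"]]


def iter_chunks(iolist: IOList) -> Iterator[bytes]:
    """Yield the ByteString leaves of an IOList in left-to-right order."""
    if isinstance(iolist, bytes):
        yield iolist
    elif isinstance(iolist, list):
        for child in iolist:
            yield from iter_chunks(child)
    else:
        raise TypeError(f"not an IOList: {iolist!r}")


def flatten(iolist: IOList) -> bytes:
    """Concatenate all leaves into one contiguous ByteString."""
    return b"".join(iter_chunks(iolist))


# A tree of four-byte chunks, one of them nested.
example = [
    bytes.fromhex("00010203"),
    [bytes.fromhex("04050607"), bytes.fromhex("08090A0B")],
    bytes.fromhex("0C0D0E0F"),
]
```

A vectored write could pass the chunks from `iter_chunks(example)` to the operating system without copying them, whereas `flatten(example)` produces a single buffer.
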
## Comments.

`String` values used as annotations are conventionally interpreted as
comments.

    @"I am a comment for the Dictionary"
    {
      @"I am a comment for the key"
      key: @"I am a comment for the value"
           value
    }

    @"I am a comment for this entire IOList"
    [
      #hex{00010203}
      @"I am a comment for the middle half of the IOList"
      @"A second comment for the same portion of the IOList"
      [
        @"I am a comment for the following ByteString"
        #hex{04050607}
        #hex{08090A0B}
      ]
      #hex{0C0D0E0F}
    ]

## MIME-type tagged binary data.

Many internet protocols use
[media types](https://tools.ietf.org/html/rfc6838) (a.k.a. MIME types)
to indicate the format of some associated binary data. For this
purpose, we define `MIMEData` to be a record labelled `mime` with two
fields, the first being a `Symbol`, the media type, and the second
being a `ByteString`, the binary data.

While each media type may define its own rules for comparing
documents, we define ordering among `MIMEData` *representations* of
such media types following the general rules for ordering of
`Record`s.

**Examples.**

| Value                                      | Encoded hexadecimal byte sequence                                                                                 |
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
| `<mime text/plain #"ABC">`                 | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
| `<mime application/xml #"<xhtml/>">`       | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
| `<mime text/csv #"123,234,345">`           | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |

Applications making heavy use of `mime` records may choose to use a
placeholder number for the symbol `mime` as well as the symbols for
individual media types. For example, if placeholder number 1 were
chosen for `mime`, and placeholder number 7 for `text/plain`, the
second example above, `<mime text/plain #"ABC">`, would be encoded as
`83 11 17 63 41 42 43`.

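The byte sequences for `mime` records can be reproduced with a short Python sketch. The byte layout used here is inferred only from the examples in this section and is not a statement of the encoding's definition: a lead byte whose high nibble is `8` for a `Record`, `7` for a `Symbol`, `6` for a `ByteString` and `1` for a placeholder, with the count in the low nibble and the value 15 signalling a following length. The base-128 form used for lengths of 128 or more is a further assumption, and all function names are hypothetical.

```python
from typing import Dict, Optional


def _varint(n: int) -> bytes:
    """Base-128 varint, low group first (an assumption for n >= 128)."""
    out = bytearray()
    while True:
        group, n = n & 0x7F, n >> 7
        out.append(group | (0x80 if n else 0))
        if not n:
            return bytes(out)


def _header(tag: int, count: int) -> bytes:
    """Lead byte: tag in the high nibble, count in the low nibble.

    A low nibble of 15 means the real count follows as a varint.
    """
    if count < 15:
        return bytes([(tag << 4) | count])
    return bytes([(tag << 4) | 15]) + _varint(count)


def _symbol(name: str, placeholders: Dict[str, int]) -> bytes:
    """Encode a Symbol, or its placeholder number when one is registered."""
    if name in placeholders:
        number = placeholders[name]
        if not 0 <= number < 15:
            raise ValueError("this sketch only handles placeholder numbers below 15")
        return bytes([0x10 | number])
    raw = name.encode("utf-8")
    return _header(0x7, len(raw)) + raw


def encode_mime(media_type: str, data: bytes,
                placeholders: Optional[Dict[str, int]] = None) -> bytes:
    """Encode <mime media_type data> following the inferred layout."""
    table = placeholders or {}
    return (
        _header(0x8, 3)  # three items: the label plus two fields
        + _symbol("mime", table)
        + _symbol(media_type, table)
        + _header(0x6, len(data)) + data
    )
```

Under these assumptions, `encode_mime("text/plain", b"ABC")` yields the bytes of the second table row, and passing `{"mime": 1, "text/plain": 7}` as `placeholders` yields `83 11 17 63 41 42 43`.
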
## Unicode normalization forms.

Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text.
While no particular normalization form is required for `String`s,
users may need to unambiguously signal or require a particular
normalization form. A `NormalizedString` is a `Record` labelled with
`unicode-normalization` and having two fields, the first of which is a
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
`nfkc`, `nfkd`), and the second of which is a `String` whose
underlying code point representation *MUST* be normalized according to
the named normalization form.

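The side-condition on `NormalizedString` can be checked with Python's standard `unicodedata` module. The following is a minimal sketch: the function names, the tuple used to stand in for a `Record`, and the decision to treat any form other than the four listed as invalid are all assumptions of this example.

```python
import unicodedata

# Maps the Symbols named above to the form names unicodedata expects.
FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


def is_valid_normalized_string(form: str, text: str) -> bool:
    """True if text is already normalized according to the named form."""
    py_form = FORMS.get(form)
    if py_form is None:
        # Assumption: forms unknown to this sketch are reported as invalid.
        return False
    return unicodedata.normalize(py_form, text) == text


def make_normalized_string(form: str, text: str) -> tuple:
    """Build a (label, form, string) triple standing in for the Record."""
    return ("unicode-normalization", form,
            unicodedata.normalize(FORMS[form], text))
```

For instance, the precomposed string `"caf\u00e9"` passes the check for `nfc` but not for `nfd`, while the decomposed string `"cafe\u0301"` behaves the other way round.
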
## IRIs (URIs, URLs, URNs, etc.).

An `IRI` is a `Record` labelled with `iri` and having one field, a
`String` which is the IRI itself and which *MUST* be a valid absolute
or relative IRI.

## Machine words.

The definition of `SignedInteger` captures all integers. However, in
certain circumstances it can be valuable to assert that a number
inhabits a particular range, such as a fixed-width machine word.

A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote
*n*-bit-wide signed and unsigned range restrictions, respectively.
Records with these labels *MUST* have one field, a `SignedInteger`,
which *MUST* fall within the appropriate range. That is, to be valid,

 - in `<i8 `*x*`>`, -128 <= *x* <= 127.
 - in `<u8 `*x*`>`, 0 <= *x* <= 255.
 - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
 - etc.

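The range for each label follows from its width: `i`*n* covers -2^(*n*-1) to 2^(*n*-1) - 1 and `u`*n* covers 0 to 2^*n* - 1. A minimal Python sketch of the validity check, with hypothetical function names, might look like this:

```python
import re
from typing import Tuple

# Labels of the form i8, u8, i16, ..., u64.
_LABEL = re.compile(r"([iu])(8|16|32|64)")


def word_range(label: str) -> Tuple[int, int]:
    """Return the inclusive (low, high) range denoted by a machine-word label."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a machine-word label: {label!r}")
    signed = match.group(1) == "i"
    bits = int(match.group(2))
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def is_valid_word(label: str, value: int) -> bool:
    """True if value is an integer within the range for label."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    low, high = word_range(label)
    return low <= value <= high
```

An implementation working in the validity-enforcing mode could call a check like `is_valid_word` when reading, writing, or constructing such a record, and skip it in the mode that treats all `Value`s identically.
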
## Anonymous Tuples and Unit.

A `Tuple` is a `Record` with label `tuple` and zero or more fields,
denoting an anonymous tuple of values.

The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
“unit” or “void” (but *not* e.g. JavaScript's “undefined” value).

## Null and Undefined.

Tony Hoare's
“[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)”
can be represented with the 0-ary `Record` `<null>`. An “undefined”
value can be represented as `<undefined>`.

## Dates and Times.

Dates, times, moments, and timestamps can be represented with a
`Record` with label `rfc3339` having a single field, a `String`, which
*MUST* conform to one of the `full-date`, `partial-time`, `full-time`,
or `date-time` productions of
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).

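The four productions can be approximated with regular expressions. The Python sketch below checks syntax only: it does not reject out-of-range fields such as month 13, and it accepts only upper-case `T` and `Z`. The names `PRODUCTIONS` and `matching_production` are hypothetical.

```python
import re
from typing import Optional

# Building blocks transcribed from the grammar in RFC 3339, section 5.6.
_DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
_PARTIAL_TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?"
_OFFSET = r"(?:Z|[+-][0-9]{2}:[0-9]{2})"
_FULL_TIME = _PARTIAL_TIME + _OFFSET

PRODUCTIONS = {
    "full-date": re.compile(_DATE),
    "partial-time": re.compile(_PARTIAL_TIME),
    "full-time": re.compile(_FULL_TIME),
    "date-time": re.compile(_DATE + "T" + _FULL_TIME),
}


def matching_production(text: str) -> Optional[str]:
    """Return the name of the production that text matches, or None."""
    for name, pattern in PRODUCTIONS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

A string for which `matching_production` returns `None` would make the enclosing `rfc3339` record invalid; a stricter validator would additionally check field ranges and calendar rules.
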
<!-- Heading to visually offset the footnotes from the main document: -->
## Notes