---
title: "Conventions for Common Data Types"
---

The `Value` data type is essentially an S-Expression, able to
represent semi-structured data over `ByteString`, `String`,
`SignedInteger` atoms and so on.[^why-not-spki-sexps]

[^why-not-spki-sexps]: Rivest's S-Expressions are in many ways
  similar to Preserves. However, while they include binary data and
  sequences, and an obvious equivalence for them exists, they lack
  numbers *per se* as well as any kind of unordered structure such
  as sets or maps. In addition, while “display hints” allow
  labelling of binary data with an intended interpretation, they
  cannot be attached to any other kind of structure, and the “hint”
  itself can only be a binary blob.

However, users need a wide variety of data types for representing
domain-specific values such as various kinds of encoded and normalized
text, calendrical values, machine words, and so on.

Appropriately-labelled `Record`s denote these domain-specific data
types.[^why-dictionaries]

[^why-dictionaries]: Given `Record`'s existence, it may seem odd
  that `Dictionary`, `Set`, `Float`, etc. are given special
  treatment. Preserves aims to offer a useful basic equivalence
  predicate to programmers, and so if a data type demands a special
  equivalence predicate, as `Dictionary`, `Set` and `Float` all do,
  then the type should be included in the base language. Otherwise,
  it can be represented as a `Record` and treated separately.
  `Boolean`, `String` and `Symbol` are seeming exceptions. The first
  two merit inclusion because of their cultural importance, while
  `Symbol`s are included to allow their use as `Record` labels.
  Primitive `Symbol` support avoids a bootstrapping issue.

All of these conventions are optional. They form a layer atop the core
`Value` structure. Non-domain-specific tools do not in general need to
treat them specially.

**Validity.** Many of the labels we will describe in this section come
with side-conditions on the contents of labelled `Record`s. It is
possible to construct an instance of `Value` that violates these
side-conditions without ceasing to be a `Value` or becoming
unrepresentable. However, we say that such a `Value` is *invalid*
because it fails to honour the necessary side-conditions.
Implementations *SHOULD* allow two modes of working: one which
treats all `Value`s identically, without regard for side-conditions,
and one which enforces validity (i.e. side-conditions) when reading,
writing, or constructing `Value`s.

## IOLists.

Inspired by Erlang's notions of
[`iolist()` and `iodata()`](http://erlang.org/doc/reference_manual/typespec.html),
an `IOList` is any tree constructed from `ByteString`s and
`Sequence`s. Formally, an `IOList` is either a `ByteString` or a
`Sequence` of `IOList`s.

`IOList`s can be useful for
[vectored I/O](https://en.wikipedia.org/wiki/Vectored_I/O).
Additionally, the flexibility of `IOList` trees allows annotation of
interior portions of a tree.

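The recursive definition of `IOList` can be sketched in Python. This is a minimal illustration, not part of the conventions: it assumes a `ByteString` is modelled as `bytes` and a `Sequence` as `list`, and the names `is_iolist`, `iter_chunks` and `flatten_iolist` are hypothetical helpers introduced here.

```python
from typing import Iterator, List, Union

# Modelling assumption: ByteString -> bytes, Sequence -> list.
IOList = Union[bytes, List["IOList"]]


def is_iolist(value: object) -> bool:
    """Return True if value is a ByteString or a Sequence of IOLists."""
    if isinstance(value, bytes):
        return True
    if isinstance(value, list):
        return all(is_iolist(item) for item in value)
    return False


def iter_chunks(value: IOList) -> Iterator[bytes]:
    """Yield the ByteString leaves of an IOList in left-to-right order."""
    if isinstance(value, bytes):
        yield value
    else:
        for item in value:
            yield from iter_chunks(item)


def flatten_iolist(value: IOList) -> bytes:
    """Concatenate all leaves into one contiguous ByteString."""
    return b"".join(iter_chunks(value))


example = [
    b"\x00\x01\x02\x03",
    [b"\x04\x05\x06\x07", b"\x08\x09\x0a\x0b"],
    b"\x0c\x0d\x0e\x0f",
]
```

The leaves yielded by `iter_chunks` could be handed to a gather-style write call without first copying them into one buffer, which is the vectored I/O use mentioned above.
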
## Comments.

`String` values used as annotations are conventionally interpreted as
comments. Special syntax exists for such string annotations, though
the usual `@`-prefixed annotation notation can also be used.

    ;I am a comment for the Dictionary
    {
      ;I am a comment for the key
      key: ;I am a comment for the value
           value
    }

    ;I am a comment for this entire IOList
    [
      #x"00010203"
      ;I am a comment for the middle half of the IOList
      ;A second comment for the same portion of the IOList
      @ ;I am the first and only comment for the following comment
        "A third (itself commented!) comment for the same part of the IOList"
      [
        ;"I am a comment for the following ByteString"
        #x"04050607"
        #x"08090A0B"
      ]
      #x"0C0D0E0F"
    ]

## MIME-type tagged binary data.

Many internet protocols use
[media types](https://tools.ietf.org/html/rfc6838) (a.k.a. MIME types)
to indicate the format of some associated binary data. For this
purpose, we define `MIMEData` to be a `Record` labelled `mime` with two
fields, the first being a `Symbol`, the media type, and the second
being a `ByteString`, the binary data.

While each media type may define its own rules for comparing
documents, we define ordering among `MIMEData` *representations* of
such media types following the general rules for ordering of
`Record`s.

**Examples.**

    «<mime application/octet-stream #"abcde">»
      = B4 B3 04 "mime" B3 18 "application/octet-stream" B2 05 "abcde" 84

    «<mime text/plain #"ABC">»
      = B4 B3 04 "mime" B3 0A "text/plain" B2 03 "ABC" 84

    «<mime application/xml #"<xhtml/>">»
      = B4 B3 04 "mime" B3 0F "application/xml" B2 08 "<xhtml/>" 84

    «<mime text/csv #"123,234,345">»
      = B4 B3 04 "mime" B3 08 "text/csv" B2 0B "123,234,345" 84

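The byte layout of these examples can be reproduced with a short Python sketch. The tag bytes are read off the examples only (`B4` opens a record, `B3` introduces a `Symbol`, `B2` introduces a `ByteString`, `84` closes the record). The function name `encode_mime_data` is hypothetical, and the sketch assumes every length fits in a single byte below 128, as in all of the examples; longer payloads are rejected rather than guessed at.

```python
def encode_mime_data(media_type: str, data: bytes) -> bytes:
    """Encode <mime media_type data> following the layout of the examples.

    Assumption: each length is written as one byte, so this sketch only
    accepts payloads shorter than 128 bytes.
    """

    def atom(tag: int, payload: bytes) -> bytes:
        if len(payload) >= 128:
            raise ValueError("sketch only handles payloads shorter than 128 bytes")
        return bytes([tag, len(payload)]) + payload

    return (
        b"\xb4"                                   # record start
        + atom(0xB3, b"mime")                     # label: Symbol `mime`
        + atom(0xB3, media_type.encode("utf-8"))  # field 1: media type Symbol
        + atom(0xB2, data)                        # field 2: ByteString payload
        + b"\x84"                                 # record end
    )


encoded = encode_mime_data("text/plain", b"ABC")
```

For the `text/plain` example, `encoded.hex(" ")` can be compared byte for byte with the second line of that example.
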
## Unicode normalization forms.

Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text.
While no particular normalization form is required for `String`s,
users may need to unambiguously signal or require a particular
normalization form. A `NormalizedString` is a `Record` labelled with
`unicode-normalization` and having two fields, the first of which is a
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
`nfkc`, `nfkd`), and the second of which is a `String` whose
underlying code point representation *MUST* be normalized according to
the named normalization form.

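A validity check for `NormalizedString` might look like the following Python sketch. It relies on the standard library's `unicodedata.normalize`; the function name `is_valid_normalized_string` and the decision to raise on an unrecognised form are assumptions of this sketch, since the text above lists the four forms only as examples.

```python
import unicodedata

# Map the Symbol used in the Record to the name Python's unicodedata expects.
_FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


def is_valid_normalized_string(form: str, text: str) -> bool:
    """Check the side-condition of <unicode-normalization form text>."""
    try:
        python_form = _FORMS[form]
    except KeyError:
        # Assumption: forms outside the four listed cannot be checked here.
        raise ValueError(f"unsupported normalization form: {form!r}")
    # A string is in a given form exactly when normalizing it changes nothing.
    return unicodedata.normalize(python_form, text) == text


precomposed = "\u00e9"   # U+00E9, a single code point
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
```

Here `precomposed` satisfies the `nfc` side-condition but not `nfd`, while `decomposed` satisfies `nfd` but not `nfc`.
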
## IRIs (URIs, URLs, URNs, etc.).

An `IRI` is a `Record` labelled with `iri` and having one field, a
`String` which is the IRI itself and which *MUST* be a valid absolute
or relative IRI.

## Machine words.

The definition of `SignedInteger` captures all integers. However, in
certain circumstances it can be valuable to assert that a number
inhabits a particular range, such as a fixed-width machine word.

A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64,128} denotes
*n*-bit-wide signed and unsigned range restrictions, respectively.
Records with these labels *MUST* have one field, a `SignedInteger`,
which *MUST* fall within the appropriate range. That is, to be valid,

 - in `<i8 `*x*`>`, -128 <= *x* <= 127.
 - in `<u8 `*x*`>`, 0 <= *x* <= 255.
 - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
 - etc.

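The ranges in the list above follow the usual two's-complement pattern: `i`*n* spans -2^(*n*-1) to 2^(*n*-1) - 1, and `u`*n* spans 0 to 2^*n* - 1. A minimal Python sketch of the check, using the hypothetical helper names `word_range` and `is_valid_word`:

```python
import re
from typing import Tuple

# Labels i8, u8, ..., i128, u128, matching the widths listed in the text.
_LABEL = re.compile(r"([iu])(8|16|32|64|128)")


def word_range(label: str) -> Tuple[int, int]:
    """Return the inclusive (low, high) range permitted by a label."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a machine-word label: {label!r}")
    bits = int(match.group(2))
    if match.group(1) == "i":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def is_valid_word(label: str, x: object) -> bool:
    """Check the side-condition of a Record such as <i8 x>."""
    # Python's bool is a subclass of int; exclude it explicitly.
    if isinstance(x, bool) or not isinstance(x, int):
        return False
    low, high = word_range(label)
    return low <= x <= high
```

For instance, `is_valid_word("u8", 255)` holds while `is_valid_word("u8", 256)` does not.
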
## Anonymous Tuples and Unit.

A `Tuple` is a `Record` with label `tuple` and zero or more fields,
denoting an anonymous tuple of values.

The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
“unit” or “void” (but *not* e.g. JavaScript's “undefined” value).

## Null and Undefined.

Tony Hoare's
“[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)”
can be represented with the 0-ary `Record` `<null>`. An “undefined”
value can be represented as `<undefined>`.

## Dates and Times.

Dates, times, moments, and timestamps can be represented with a
`Record` with label `rfc3339` having a single field, a `String`, which
*MUST* conform to one of the `full-date`, `partial-time`, `full-time`,
or `date-time` productions of
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).

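The four permitted productions can be recognised with regular expressions, as in the Python sketch below. It checks syntax only: the field-range restrictions that RFC 3339 states alongside the grammar (for example, months 01 to 12) are not enforced. The function name `rfc3339_production` is a hypothetical helper introduced for illustration.

```python
import re
from typing import Optional

_DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
_PARTIAL_TIME = r"[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?"
# RFC 3339 allows "T" and "Z" to appear in lower case as well.
_OFFSET = r"(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"
_FULL_TIME = _PARTIAL_TIME + _OFFSET

_PRODUCTIONS = {
    "full-date": re.compile(_DATE),
    "partial-time": re.compile(_PARTIAL_TIME),
    "full-time": re.compile(_FULL_TIME),
    "date-time": re.compile(_DATE + r"[Tt]" + _FULL_TIME),
}


def rfc3339_production(text: str) -> Optional[str]:
    """Return the name of the production that text matches, or None.

    Syntax check only; out-of-range fields such as month 13 are accepted.
    """
    for name, pattern in _PRODUCTIONS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

A validating implementation could treat `<rfc3339 s>` as invalid whenever `rfc3339_production(s)` returns `None`, and then apply the range checks separately.
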
<!-- Heading to visually offset the footnotes from the main document: -->
## Notes