---
title: "Conventions for Common Data Types"
---

The `Value` data type is essentially an S-Expression, able to
represent semi-structured data over `ByteString`, `String`,
`SignedInteger` atoms and so on.[^why-not-spki-sexps]

  [^why-not-spki-sexps]: Rivest's S-Expressions are in many ways
    similar to Preserves. However, while they include binary data and
    sequences, and an obvious equivalence for them exists, they lack
    numbers *per se* as well as any kind of unordered structure such
    as sets or maps. In addition, while “display hints” allow
    labelling of binary data with an intended interpretation, they
    cannot be attached to any other kind of structure, and the “hint”
    itself can only be a binary blob.

However, users need a wide variety of data types for representing
domain-specific values such as various kinds of encoded and normalized
text, calendrical values, machine words, and so on.

Appropriately-labelled `Record`s denote these domain-specific data
types.[^why-dictionaries]

  [^why-dictionaries]: Given `Record`'s existence, it may seem odd
    that `Dictionary`, `Set`, `Float`, etc. are given special
    treatment. Preserves aims to offer a useful basic equivalence
    predicate to programmers, and so if a data type demands a special
    equivalence predicate, as `Dictionary`, `Set` and `Float` all do,
    then the type should be included in the base language. Otherwise,
    it can be represented as a `Record` and treated separately.
    `Boolean`, `String` and `Symbol` are seeming exceptions. The first
    two merit inclusion because of their cultural importance, while
    `Symbol`s are included to allow their use as `Record` labels.
    Primitive `Symbol` support avoids a bootstrapping issue.

All of these conventions are optional. They form a layer atop the core
`Value` structure. Non-domain-specific tools do not in general need to
treat them specially.

**Validity.** Many of the labels described in this document come
  with side-conditions on the contents of labelled `Record`s. It is
  possible to construct an instance of `Value` that violates these
  side-conditions without ceasing to be a `Value` or becoming
  unrepresentable. However, we say that such a `Value` is *invalid*
  because it fails to honour the necessary side-conditions.
  Implementations *SHOULD* allow two modes of working: one which
  treats all `Value`s identically, without regard for side-conditions,
  and one which enforces validity (i.e. side-conditions) when reading,
  writing, or constructing `Value`s.

## IOLists.

Inspired by Erlang's notions of
[`iolist()` and `iodata()`](http://erlang.org/doc/reference_manual/typespec.html),
an `IOList` is any tree constructed from `ByteString`s and
`Sequence`s. Formally, an `IOList` is either a `ByteString` or a
`Sequence` of `IOList`s.

`IOList`s can be useful for
[vectored I/O](https://en.wikipedia.org/wiki/Vectored_I/O).
Additionally, the flexibility of `IOList` trees allows annotation of
interior portions of a tree.
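
The recursive definition above can be sketched in Python. This is only an illustration: it assumes a `ByteString` is modelled as Python `bytes` and a `Sequence` as a Python `list`, and the names `is_iolist`, `iter_chunks` and `flatten` are hypothetical rather than part of any Preserves library.

```python
from typing import Any, Iterator


def is_iolist(value: Any) -> bool:
    """True if value is a ByteString (bytes) or a Sequence (list) of IOLists."""
    if isinstance(value, bytes):
        return True
    if isinstance(value, list):
        return all(is_iolist(item) for item in value)
    return False


def iter_chunks(value: Any) -> Iterator[bytes]:
    """Yield the ByteString leaves of an IOList in left-to-right order."""
    if isinstance(value, bytes):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_chunks(item)
    else:
        raise TypeError("not an IOList: %r" % (value,))


def flatten(value: Any) -> bytes:
    """Concatenate every leaf of an IOList into a single ByteString."""
    return b"".join(iter_chunks(value))
```

A writer doing vectored I/O could pass the chunks from `iter_chunks` to the operating system one by one instead of calling `flatten`, which copies every leaf into a new buffer.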

## Comments.

`String` values used as annotations are conventionally interpreted as
comments.

    @"I am a comment for the Dictionary"
    {
      @"I am a comment for the key"
      key: @"I am a comment for the value"
           value
    }

    @"I am a comment for this entire IOList"
    [
      #hex{00010203}
      @"I am a comment for the middle half of the IOList"
      @"A second comment for the same portion of the IOList"
      [
        @"I am a comment for the following ByteString"
        #hex{04050607}
        #hex{08090A0B}
      ]
      #hex{0C0D0E0F}
    ]

## MIME-type tagged binary data.

Many internet protocols use
[media types](https://tools.ietf.org/html/rfc6838) (a.k.a. MIME types)
to indicate the format of some associated binary data. For this
purpose, we define `MIMEData` to be a `Record` labelled `mime` with two
fields, the first being a `Symbol`, the media type, and the second
being a `ByteString`, the binary data.

While each media type may define its own rules for comparing
documents, we define ordering among `MIMEData` *representations* of
such media types following the general rules for ordering of
`Record`s.

**Examples.**

| Value                                      | Encoded hexadecimal byte sequence                                                                                 |
|--------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
| `<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
| `<mime text/plain #"ABC">`                 | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43                                                    |
| `<mime application/xml #"<xhtml/>">`       | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E                   |
| `<mime text/csv #"123,234,345">`           | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35                                  |

Applications making heavy use of `mime` records may choose to use a
placeholder number for the symbol `mime` as well as the symbols for
individual media types. For example, if placeholder number 1 were
chosen for `mime`, and placeholder number 7 for `text/plain`, the
second example above, `<mime text/plain #"ABC">`, would be encoded as
`83 11 17 63 41 42 43`.
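
The rows of the table can be reproduced with a small encoder. The sketch below is inferred from the example bytes alone and is not the normative encoding. It assumes that the high nibble of a lead byte selects the kind of value (`8` for a record, `7` for a `Symbol`, `6` for a `ByteString`), that the low nibble carries a count or length, and that a low nibble of 15 means an explicit length follows. The multi-byte form of that length (base-128, least significant group first) is a further assumption, because the table only shows lengths below 128. All function names are hypothetical.

```python
def _varint(n: int) -> bytes:
    """Base-128 length, low group first (assumed; the table shows one byte only)."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)
            return bytes(out)


def _header(kind: int, length: int) -> bytes:
    """Lead byte: kind in the high nibble, length in the low nibble."""
    if length < 15:
        return bytes([(kind << 4) | length])
    # Low nibble 15 signals that the real length follows separately.
    return bytes([(kind << 4) | 0x0F]) + _varint(length)


def encode_symbol(name: str) -> bytes:
    raw = name.encode("utf-8")
    return _header(0x7, len(raw)) + raw


def encode_bytestring(data: bytes) -> bytes:
    return _header(0x6, len(data)) + data


def encode_mime(media_type: str, data: bytes) -> bytes:
    """Encode <mime media_type data>; the count 3 covers the label and two fields."""
    return (
        _header(0x8, 3)
        + encode_symbol("mime")
        + encode_symbol(media_type)
        + encode_bytestring(data)
    )
```

Under these assumptions, `encode_mime("text/plain", b"ABC")` yields the bytes of the second row of the table.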

## Unicode normalization forms.

Unicode defines multiple
[normalization forms](http://unicode.org/reports/tr15/) for text.
While no particular normalization form is required for `String`s,
users may need to unambiguously signal or require a particular
normalization form. A `NormalizedString` is a `Record` labelled with
`unicode-normalization` and having two fields, the first of which is a
`Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`,
`nfkc`, `nfkd`), and the second of which is a `String` whose
underlying code point representation *MUST* be normalized according to
the named normalization form.
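
The side-condition on a `NormalizedString` can be checked with Python's standard `unicodedata` module. In this sketch the record is modelled as a plain tuple of label, form and text, and the helper names are hypothetical. It also treats any form other than the four listed above as invalid, which is an assumption, since the list above is introduced with "e.g.".

```python
import unicodedata

# Maps the Symbol used in the record to the form name unicodedata expects.
FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


def is_valid_normalized_string(form: str, text: str) -> bool:
    """True if text is already normalized according to the named form."""
    py_form = FORMS.get(form)
    if py_form is None:
        return False  # assumption: unknown forms are rejected
    return unicodedata.normalize(py_form, text) == text


def make_normalized_string(form: str, text: str) -> tuple:
    """Build a valid record by normalizing text before storing it."""
    normalized = unicodedata.normalize(FORMS[form], text)
    return ("unicode-normalization", form, normalized)
```

For instance, the precomposed character U+00E9 is valid under `nfc` but not under `nfd`, while the two-code-point sequence U+0065 U+0301 is the reverse.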

## IRIs (URIs, URLs, URNs, etc.).

An `IRI` is a `Record` labelled with `iri` and having one field, a
`String` which is the IRI itself and which *MUST* be a valid absolute
or relative IRI.

## Machine words.

The definition of `SignedInteger` captures all integers. However, in
certain circumstances it can be valuable to assert that a number
inhabits a particular range, such as a fixed-width machine word.

A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denotes
*n*-bit-wide signed and unsigned range restrictions, respectively.
Records with these labels *MUST* have one field, a `SignedInteger`,
which *MUST* fall within the appropriate range. That is, to be valid,
 - in `<i8 `*x*`>`, -128 <= *x* <= 127.
 - in `<u8 `*x*`>`, 0 <= *x* <= 255.
 - in `<i16 `*x*`>`, -32768 <= *x* <= 32767.
 - etc.
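
The ranges listed above follow from the usual two's-complement bounds, and a validity check can be derived from the label itself. The following is a minimal sketch with hypothetical names, in which the label is passed as a string and the record's fields as a list.

```python
import re

_WORD_LABEL = re.compile(r"([iu])(8|16|32|64)")


def word_range(label: str) -> tuple:
    """Return the inclusive (low, high) bounds for a label such as 'i8' or 'u32'."""
    match = _WORD_LABEL.fullmatch(label)
    if match is None:
        raise ValueError("not a machine-word label: %r" % (label,))
    bits = int(match.group(2))
    if match.group(1) == "i":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def is_valid_word(label: str, fields: list) -> bool:
    """True if fields holds exactly one integer within the label's range."""
    if len(fields) != 1:
        return False
    x = fields[0]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(x, bool) or not isinstance(x, int):
        return False
    low, high = word_range(label)
    return low <= x <= high
```

Here `word_range("i8")` gives `(-128, 127)` and `word_range("u8")` gives `(0, 255)`, matching the first two items of the list.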

## Anonymous Tuples and Unit.

A `Tuple` is a `Record` with label `tuple` and zero or more fields,
denoting an anonymous tuple of values.

The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called
“unit” or “void” (but *not* e.g. JavaScript's “undefined” value).

## Null and Undefined.

Tony Hoare's
“[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)”
can be represented with the 0-ary `Record` `<null>`. An “undefined”
value can be represented as `<undefined>`.

## Dates and Times.

Dates, times, moments, and timestamps can be represented with a
`Record` with label `rfc3339` having a single field, a `String`, which
*MUST* conform to one of the `full-date`, `partial-time`, `full-time`,
or `date-time` productions of
[section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6).
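
A syntactic check against the four productions can be sketched with regular expressions. The patterns below follow the shape of the RFC 3339 grammar and apply only coarse range limits (months 01 to 12, days 01 to 31, seconds up to 60 to allow a leap second). They do not verify that a day exists in a given month, so this is a partial check, and the function name `classify_rfc3339` is hypothetical.

```python
import re
from typing import Optional

# Explicit [0-9] is used because \d also matches non-ASCII digits in Python.
_DATE = r"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"
_PARTIAL_TIME = r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:(?:[0-5][0-9]|60)(?:\.[0-9]+)?"
_OFFSET = r"(?:[Zz]|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])"

PRODUCTIONS = {
    "full-date": re.compile(_DATE),
    "partial-time": re.compile(_PARTIAL_TIME),
    "full-time": re.compile(_PARTIAL_TIME + _OFFSET),
    "date-time": re.compile(_DATE + "[Tt]" + _PARTIAL_TIME + _OFFSET),
}


def classify_rfc3339(text: str) -> Optional[str]:
    """Return the name of the production that text matches, or None."""
    for name, pattern in PRODUCTIONS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

An implementation enforcing validity could treat an `rfc3339` record as invalid whenever `classify_rfc3339` returns `None` for its field.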

<!-- Heading to visually offset the footnotes from the main document: -->
## Notes