---
title: "Conventions for Common Data Types"
---

The `Value` data type is essentially an S-Expression, able to represent semi-structured data over `ByteString`, `String`, `SignedInteger` atoms and so on.[^why-not-spki-sexps]

  [^why-not-spki-sexps]: Rivest's S-Expressions are in many ways similar to Preserves. However, while they include binary data and sequences, and an obvious equivalence for them exists, they lack numbers *per se* as well as any kind of unordered structure such as sets or maps. In addition, while “display hints” allow labelling of binary data with an intended interpretation, they cannot be attached to any other kind of structure, and the “hint” itself can only be a binary blob.

However, users need a wide variety of data types for representing domain-specific values such as various kinds of encoded and normalized text, calendrical values, machine words, and so on.

Appropriately-labelled `Record`s denote these domain-specific data types.[^why-dictionaries]

  [^why-dictionaries]: Given `Record`'s existence, it may seem odd that `Dictionary`, `Set`, `Float`, etc. are given special treatment. Preserves aims to offer a useful basic equivalence predicate to programmers, and so if a data type demands a special equivalence predicate, as `Dictionary`, `Set` and `Float` all do, then the type should be included in the base language. Otherwise, it can be represented as a `Record` and treated separately. `Boolean`, `String` and `Symbol` are seeming exceptions. The first two merit inclusion because of their cultural importance, while `Symbol`s are included to allow their use as `Record` labels. Primitive `Symbol` support avoids a bootstrapping issue.

All of these conventions are optional. They form a layer atop the core `Value` structure. Non-domain-specific tools do not in general need to treat them specially.

**Validity.** Many of the labels we will describe in this section come with side-conditions on the contents of labelled `Record`s.
It is possible to construct an instance of `Value` that violates these side-conditions without ceasing to be a `Value` or becoming unrepresentable. However, we say that such a `Value` is *invalid* because it fails to honour the necessary side-conditions.

Implementations *SHOULD* allow two modes of working: one which treats all `Value`s identically, without regard for side-conditions, and one which enforces validity (i.e. side-conditions) when reading, writing, or constructing `Value`s.

## IOLists.

Inspired by Erlang's notions of [`iolist()` and `iodata()`](http://erlang.org/doc/reference_manual/typespec.html), an `IOList` is any tree constructed from `ByteString`s and `Sequence`s. Formally, an `IOList` is either a `ByteString` or a `Sequence` of `IOList`s.

`IOList`s can be useful for [vectored I/O](https://en.wikipedia.org/wiki/Vectored_I/O). Additionally, the flexibility of `IOList` trees allows annotation of interior portions of a tree.

## Comments.

`String` values used as annotations are conventionally interpreted as comments. Special syntax exists for such string annotations, though the usual `@`-prefixed annotation notation can also be used.

    ;I am a comment for the Dictionary
    {
      ;I am a comment for the key
      key: ;I am a comment for the value
           value
    }

    ;I am a comment for this entire IOList
    [
      #x"00010203"
      ;I am a comment for the middle half of the IOList
      ;A second comment for the same portion of the IOList
      @ ;I am the first and only comment for the following comment
        "A third (itself commented!) comment for the same part of the IOList"
      [
        ;"I am a comment for the following ByteString"
        #x"04050607"
        #x"08090A0B"
      ]
      #x"0C0D0E0F"
    ]

## MIME-type tagged binary data.

Many internet protocols use [media types](https://tools.ietf.org/html/rfc6838) (a.k.a. MIME types) to indicate the format of some associated binary data.
For this purpose, we define `MIMEData` to be a record labelled `mime` with two fields, the first being a `Symbol`, the media type, and the second being a `ByteString`, the binary data.

While each media type may define its own rules for comparing documents, we define ordering among `MIMEData` *representations* of such media types following the general rules for ordering of `Record`s.

**Examples.**

    «<mime application/octet-stream #"abcde">» = B4 B3 04 "mime" B3 18 "application/octet-stream" B2 05 "abcde" 84
    «<mime text/plain #"ABC">» = B4 B3 04 "mime" B3 0A "text/plain" B2 03 "ABC" 84
    «<mime application/xml #"<xhtml/>">» = B4 B3 04 "mime" B3 0F "application/xml" B2 08 "<xhtml/>" 84
    «<mime text/csv #"123,234,345">» = B4 B3 04 "mime" B3 08 "text/csv" B2 0B "123,234,345" 84

## Unicode normalization forms.

Unicode defines multiple [normalization forms](http://unicode.org/reports/tr15/) for text. While no particular normalization form is required for `String`s, users may need to unambiguously signal or require a particular normalization form. A `NormalizedString` is a `Record` labelled with `unicode-normalization` and having two fields, the first of which is a `Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`, `nfkc`, `nfkd`), and the second of which is a `String` whose underlying code point representation *MUST* be normalized according to the named normalization form.

## IRIs (URIs, URLs, URNs, etc.).

An `IRI` is a `Record` labelled with `iri` and having one field, a `String` which is the IRI itself and which *MUST* be a valid absolute or relative IRI.

## Machine words.

The definition of `SignedInteger` captures all integers. However, in certain circumstances it can be valuable to assert that a number inhabits a particular range, such as a fixed-width machine word.

A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64,128} denote *n*-bit-wide signed and unsigned range restrictions, respectively. Records with these labels *MUST* have one field, a `SignedInteger`, which *MUST* fall within the appropriate range. That is, to be valid,

- in `<i8 x>`, -128 <= *x* <= 127.
- in `<u8 x>`, 0 <= *x* <= 255.
- in `<i16 x>`, -32768 <= *x* <= 32767.
- etc.

## Anonymous Tuples and Unit.

A `Tuple` is a `Record` with label `tuple` and zero or more fields, denoting an anonymous tuple of values.

The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called “unit” or “void” (but *not* e.g. JavaScript's “undefined” value).

## Null and Undefined.

Tony Hoare's “[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions)” can be represented with the 0-ary `Record` `<null>`. An “undefined” value can be represented as `<undefined>`.

## Dates and Times.

Dates, times, moments, and timestamps can be represented with a `Record` with label `rfc3339` having a single field, a `String`, which *MUST* conform to one of the `full-date`, `partial-time`, `full-time`, or `date-time` productions of [section 5.6 of RFC 3339](https://tools.ietf.org/html/rfc3339#section-5.6). (In `date-time`, "T" and "Z" *MUST* be upper-case and "T" *MUST* be used; a space separating the `full-date` and `full-time` *MUST NOT* be used.)

## XML Infoset

[XML Infoset](https://www.w3.org/TR/2004/REC-xml-infoset-20040204/) describes the semantics of XML, that is, the underlying information contained in a document, independent of surface syntax.

A useful subset of XML Infoset, namely its Element Information Items (omitting processing instructions, entities, entity references, comments, namespaces, name prefixes, and base URIs), can be captured with the [schema](preserves-schema.html)

    Node = Text / Element .
    Text = string .
    Element =
      / @withAttributes
        <<rec> @localName symbol [@attributes Attributes @children Node ...]>
      / @withoutAttributes
        <<rec> @localName symbol @children [Node ...]> .
    Attributes = { symbol: string ...:... } .

**Examples.**
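
As one illustration of how the schema above might be consumed from a host language, here is a minimal Python sketch. It is not part of Preserves or of any Preserves library: the names `Element`, `is_node` and `to_xml` are hypothetical, `Text` is assumed to map to Python `str`, and the `withAttributes` and `withoutAttributes` alternatives are merged by treating an absent `Attributes` dictionary as empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Element:
    """Hypothetical stand-in for the schema's `Element` (not a library type)."""
    local_name: str                                            # @localName symbol
    children: List["Node"] = field(default_factory=list)       # @children
    attributes: Dict[str, str] = field(default_factory=dict)   # @attributes


Node = Union[str, Element]  # Node = Text / Element


def is_node(value: object) -> bool:
    """Check that `value` has the shape the schema allows."""
    if isinstance(value, str):
        return True
    if not isinstance(value, Element):
        return False
    attrs_ok = all(isinstance(k, str) and isinstance(v, str)
                   for k, v in value.attributes.items())
    return (isinstance(value.local_name, str) and attrs_ok
            and all(is_node(child) for child in value.children))


def to_xml(node: Node) -> str:
    """Serialize a node to XML text, escaping text and attribute values."""
    if isinstance(node, str):
        return escape(node)
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node.attributes.items())
    body = "".join(to_xml(child) for child in node.children)
    return f"<{node.local_name}{attrs}>{body}</{node.local_name}>"


doc = Element("p", ["Fish & ", Element("b", ["chips"], {"class": "side"})])
print(to_xml(doc))  # prints <p>Fish &amp; <b class="side">chips</b></p>
```

Because the sketch covers only the Element Information Items subset, constructs the schema omits (namespaces, comments, processing instructions) have no representation here either.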

"?" >
>
>> >

## Notes