title |
---|
Conventions for Common Data Types |
The `Value` data type is essentially an S-Expression, able to represent semi-structured data over `ByteString`, `String`, `SignedInteger` atoms and so on.[1]
However, users need a wide variety of data types for representing domain-specific values such as various kinds of encoded and normalized text, calendrical values, machine words, and so on.
Appropriately-labelled `Record`s denote these domain-specific data types.[2]
All of these conventions are optional. They form a layer atop the core `Value` structure. Non-domain-specific tools do not in general need to treat them specially.
Validity. Many of the labels we will describe in this section come with side-conditions on the contents of labelled `Record`s. It is possible to construct an instance of `Value` that violates these side-conditions without ceasing to be a `Value` or becoming unrepresentable. However, we say that such a `Value` is invalid because it fails to honour the necessary side-conditions.
Implementations SHOULD allow two modes of working: one which treats all `Value`s identically, without regard for side-conditions, and one which enforces validity (i.e. side-conditions) when reading, writing, or constructing `Value`s.
IOLists.
Inspired by Erlang's notions of `iolist()` and `iodata()`, an `IOList` is any tree constructed from `ByteString`s and `Sequence`s. Formally, an `IOList` is either a `ByteString` or a `Sequence` of `IOList`s.
`IOList`s can be useful for vectored I/O. Additionally, the flexibility of `IOList` trees allows annotation of interior portions of a tree.
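
The recursive definition can be illustrated with a small Python sketch. It is not part of any Preserves implementation: it assumes `bytes` stands in for `ByteString` and `list` for `Sequence`, and the names `is_iolist`, `iolist_chunks`, and `flatten` are hypothetical.

```python
from typing import Iterator, List, Union

# Assumption of this sketch: bytes models ByteString, list models Sequence.
IOList = Union[bytes, List["IOList"]]


def is_iolist(value: object) -> bool:
    """True if value is a bytes object or a list made only of IOLists."""
    if isinstance(value, bytes):
        return True
    if isinstance(value, list):
        return all(is_iolist(item) for item in value)
    return False


def iolist_chunks(value: IOList) -> Iterator[bytes]:
    """Yield the leaf byte strings of an IOList in left-to-right order."""
    if isinstance(value, bytes):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iolist_chunks(item)
    else:
        raise TypeError(f"not an IOList: {value!r}")


def flatten(value: IOList) -> bytes:
    """Concatenate all leaves into a single byte string."""
    return b"".join(iolist_chunks(value))


# A nested tree of four 4-byte leaves.
example = [
    bytes.fromhex("00010203"),
    [bytes.fromhex("04050607"), bytes.fromhex("08090A0B")],
    bytes.fromhex("0C0D0E0F"),
]
```

Because `iolist_chunks` yields the leaves without copying them into one buffer, its output could be handed to a vectored write call (for instance `os.writev` on platforms that provide it) rather than being flattened first.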
Comments.
`String` values used as annotations are conventionally interpreted as comments.

```
@"I am a comment for the Dictionary"
{
  @"I am a comment for the key"
  key: @"I am a comment for the value"
       value
}
```

```
@"I am a comment for this entire IOList"
[
  #hex{00010203}
  @"I am a comment for the middle half of the IOList"
  @"A second comment for the same portion of the IOList"
  [
    @"I am a comment for the following ByteString"
    #hex{04050607}
    #hex{08090A0B}
  ]
  #hex{0C0D0E0F}
]
```

MIME-type tagged binary data.
Many internet protocols use media types (a.k.a. MIME types) to indicate the format of some associated binary data. For this purpose, we define `MIMEData` to be a record labelled `mime` with two fields, the first being a `Symbol`, the media type, and the second being a `ByteString`, the binary data.
While each media type may define its own rules for comparing documents, we define ordering among `MIMEData` representations of such media types following the general rules for ordering of `Record`s.
Examples.
Value | Encoded hexadecimal byte sequence |
---|---|
`<mime application/octet-stream #"abcde">` | 83 74 6D 69 6D 65 7F 18 61 70 70 6C 69 63 61 74 69 6F 6E 2F 6F 63 74 65 74 2D 73 74 72 65 61 6D 65 61 62 63 64 65 |
`<mime text/plain #"ABC">` | 83 74 6D 69 6D 65 7A 74 65 78 74 2F 70 6C 61 69 6E 63 41 42 43 |
`<mime application/xml #"<xhtml/>">` | 83 74 6D 69 6D 65 7F 0F 61 70 70 6C 69 63 61 74 69 6F 6E 2F 78 6D 6C 68 3C 78 68 74 6D 6C 2F 3E |
`<mime text/csv #"123,234,345">` | 83 74 6D 69 6D 65 78 74 65 78 74 2F 63 73 76 6B 31 32 33 2C 32 33 34 2C 33 34 35 |
Applications making heavy use of `mime` records may choose to use a placeholder number for the symbol `mime` as well as the symbols for individual media types. For example, if placeholder number 1 were chosen for `mime`, and placeholder number 7 for `text/plain`, the second example above, `<mime text/plain #"ABC">`, would be encoded as `83 11 17 63 41 42 43`.
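
The byte sequences above can be reproduced with a short Python sketch. The layout it uses is inferred only from the examples in this section and is an assumption, not a statement of the full binary syntax: a lead byte whose high nibble selects the kind (`8` record, `7` symbol, `6` byte string, `1` placeholder) and whose low nibble is a count or length, with lengths of 15 or more written as low nibble `F` followed by one length byte. All function names are hypothetical, and lengths of 128 or more are deliberately rejected.

```python
from typing import Dict, Optional

# Lead-byte kinds, as inferred from the example table (an assumption).
PLACEHOLDER, BYTES, SYMBOL, RECORD = 0x1, 0x6, 0x7, 0x8


def header(kind: int, length: int) -> bytes:
    """Lead byte for `kind`, plus one extra length byte when length >= 15."""
    if length < 15:
        return bytes([(kind << 4) | length])
    if length < 128:
        return bytes([(kind << 4) | 0x0F, length])
    raise ValueError("lengths of 128 or more are outside this sketch")


def encode_symbol(name: str, placeholders: Optional[Dict[str, int]] = None) -> bytes:
    """Encode a symbol, or its placeholder number if one has been chosen."""
    if placeholders and name in placeholders:
        number = placeholders[name]
        if not 0 <= number < 15:
            raise ValueError("placeholder numbers of 15 or more are outside this sketch")
        return bytes([(PLACEHOLDER << 4) | number])
    raw = name.encode("utf-8")
    return header(SYMBOL, len(raw)) + raw


def encode_mime(media_type: str, data: bytes,
                placeholders: Optional[Dict[str, int]] = None) -> bytes:
    """Encode <mime media_type data>: a record of three items (label + 2 fields)."""
    return (header(RECORD, 3)
            + encode_symbol("mime", placeholders)
            + encode_symbol(media_type, placeholders)
            + header(BYTES, len(data)) + data)


def hexdump(encoded: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{byte:02X}" for byte in encoded)
```

For instance, `hexdump(encode_mime("text/plain", b"ABC", {"mime": 1, "text/plain": 7}))` gives the placeholder form `83 11 17 63 41 42 43` quoted above.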
Unicode normalization forms.
Unicode defines multiple normalization forms for text. While no particular normalization form is required for `String`s, users may need to unambiguously signal or require a particular normalization form. A `NormalizedString` is a `Record` labelled with `unicode-normalization` and having two fields, the first of which is a `Symbol` specifying the normalization form used (e.g. `nfc`, `nfd`, `nfkc`, `nfkd`), and the second of which is a `String` whose underlying code point representation MUST be normalized according to the named normalization form.
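
A validity check for this side-condition might look like the following Python sketch, which relies on the standard library's `unicodedata.normalize`. The function name is hypothetical, and treating any form symbol other than the four listed above as invalid is an assumption of the sketch, since the text gives those four only as examples.

```python
import unicodedata

# Map the symbols named in the text to the form names Python expects.
FORMS = {"nfc": "NFC", "nfd": "NFD", "nfkc": "NFKC", "nfkd": "NFKD"}


def is_valid_normalized_string(form: str, text: str) -> bool:
    """True if `text` is already normalized according to `form`."""
    python_form = FORMS.get(form)
    if python_form is None:
        # Assumption: unknown form symbols are rejected rather than ignored.
        return False
    # A string is normalized exactly when normalizing it changes nothing.
    return unicodedata.normalize(python_form, text) == text
```

For example, the precomposed character U+00E9 is valid under `nfc` but not under `nfd`, while the sequence U+0065 U+0301 is the reverse.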
IRIs (URIs, URLs, URNs, etc.).
An `IRI` is a `Record` labelled with `iri` and having one field, a `String` which is the IRI itself and which MUST be a valid absolute or relative IRI.
Machine words.
The definition of `SignedInteger` captures all integers. However, in certain circumstances it can be valuable to assert that a number inhabits a particular range, such as a fixed-width machine word.
A family of labels `i`*n* and `u`*n* for *n* ∈ {8,16,32,64} denote *n*-bit-wide signed and unsigned range restrictions, respectively. Records with these labels MUST have one field, a `SignedInteger`, which MUST fall within the appropriate range. That is, to be valid,
- in `<i8 x>`, -128 <= x <= 127.
- in `<u8 x>`, 0 <= x <= 255.
- in `<i16 x>`, -32768 <= x <= 32767.
- etc.
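
The ranges for the whole family follow from the bit width, as this Python sketch shows. The function names are hypothetical, and the record is modelled simply as a label string plus a Python `int`.

```python
import re
from typing import Optional, Tuple

# Labels of the form i8, u8, i16, u16, i32, u32, i64, u64.
_WORD_LABEL = re.compile(r"([iu])(8|16|32|64)")


def word_range(label: str) -> Optional[Tuple[int, int]]:
    """Return the inclusive (min, max) range for a machine-word label, or None."""
    match = _WORD_LABEL.fullmatch(label)
    if match is None:
        return None
    bits = int(match.group(2))
    if match.group(1) == "i":
        return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)


def is_valid_word(label: str, x: int) -> bool:
    """True if <label x> satisfies the range side-condition."""
    bounds = word_range(label)
    if bounds is None:
        raise ValueError(f"not a machine-word label: {label!r}")
    low, high = bounds
    return low <= x <= high
```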
Anonymous Tuples and Unit.
A `Tuple` is a `Record` with label `tuple` and zero or more fields, denoting an anonymous tuple of values.
The 0-ary tuple, `<tuple>`, denotes the empty tuple, sometimes called “unit” or “void” (but not e.g. JavaScript's “undefined” value).
Null and Undefined.
Tony Hoare's “billion-dollar mistake” can be represented with the 0-ary `Record` `<null>`. An “undefined” value can be represented as `<undefined>`.
Dates and Times.
Dates, times, moments, and timestamps can be represented with a `Record` with label `rfc3339` having a single field, a `String`, which MUST conform to one of the `full-date`, `partial-time`, `full-time`, or `date-time` productions of section 5.6 of RFC 3339.
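
A rough validity check for the field can be written with regular expressions, as in the Python sketch below. It is an approximation under stated assumptions: the patterns follow the shape of the four productions and apply simple field ranges (months 01 to 12, days 01 to 31, seconds up to 60), but they do not check the day against the month or leap years. The function name is hypothetical.

```python
import re
from typing import Optional

_DATE = r"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"
_PARTIAL_TIME = r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:(?:[0-5][0-9]|60)(?:\.[0-9]+)?"
_OFFSET = r"(?:[Zz]|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])"

# Production name -> compiled pattern for the whole string.
_PRODUCTIONS = {
    "full-date": re.compile(_DATE),
    "partial-time": re.compile(_PARTIAL_TIME),
    "full-time": re.compile(_PARTIAL_TIME + _OFFSET),
    "date-time": re.compile(_DATE + "[Tt]" + _PARTIAL_TIME + _OFFSET),
}


def rfc3339_production(text: str) -> Optional[str]:
    """Return the name of the production that `text` matches, or None."""
    for name, pattern in _PRODUCTIONS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

A record `<rfc3339 s>` would then be treated as valid when `rfc3339_production(s)` is not `None`.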
Notes
1. Rivest's S-Expressions are in many ways similar to Preserves. However, while they include binary data and sequences, and an obvious equivalence for them exists, they lack numbers per se as well as any kind of unordered structure such as sets or maps. In addition, while “display hints” allow labelling of binary data with an intended interpretation, they cannot be attached to any other kind of structure, and the “hint” itself can only be a binary blob.
2. Given `Record`'s existence, it may seem odd that `Dictionary`, `Set`, `Float`, etc. are given special treatment. Preserves aims to offer a useful basic equivalence predicate to programmers, and so if a data type demands a special equivalence predicate, as `Dictionary`, `Set` and `Float` all do, then the type should be included in the base language. Otherwise, it can be represented as a `Record` and treated separately. `Boolean`, `String` and `Symbol` are seeming exceptions. The first two merit inclusion because of their cultural importance, while `Symbol`s are included to allow their use as `Record` labels. Primitive `Symbol` support avoids a bootstrapping issue.