---
no_site_title: true
title: "Preserves: an Expressive Data Language"
---
Tony Garnock-Jones tonyg@leastfixedpoint.com
{{ site.version_date }}. Version {{ site.version }}.
{% include what-is-preserves.md %}
This document defines the core semantics and data model of Preserves and presents a handful of examples. Two other core documents define a text syntax and a binary syntax for the Preserves data model.
Values
Preserves values are given meaning independent of their syntax. We will write `Value` when we mean the set of all Preserves values or an element of that set.
`Value`s fall into two broad categories: atomic and compound data. Every `Value` is finite and non-cyclic. Embedded values, called `Embedded`s, are a third, special-case category.
{% include value-grammar.md %}
Total order. As we go, we will incrementally specify a total order over `Value`s. Two values of the same kind are compared using kind-specific rules. The ordering among values of different kinds is essentially arbitrary, but having a total order is convenient for many tasks, so we define it as follows:
- (Values) `Atom` < `Compound` < `Embedded`
- (Compounds) `Record` < `Sequence` < `Set` < `Dictionary`
- (Atoms) `Boolean` < `Float` < `Double` < `SignedInteger` < `String` < `ByteString` < `Symbol`
Equivalence. Two `Value`s are equal if neither is less than the other according to the total order.
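The cross-kind part of this order amounts to ranking kinds. The following is a minimal Python sketch, assuming values are tagged with hypothetical kind-name strings. Same-kind comparison, which is defined kind by kind below, is omitted, and equality is derived from the order exactly as just described.

```python
# Hypothetical kind names. Ranks follow Atom < Compound < Embedded and then
# the per-category orders listed above.
ATOMS = ["Boolean", "Float", "Double", "SignedInteger", "String", "ByteString", "Symbol"]
COMPOUNDS = ["Record", "Sequence", "Set", "Dictionary"]
KIND_RANK = {kind: rank for rank, kind in enumerate(ATOMS + COMPOUNDS + ["Embedded"])}


def compare_kinds(a: str, b: str) -> int:
    """Return -1, 0 or 1 for the order between values of kinds a and b."""
    ra, rb = KIND_RANK[a], KIND_RANK[b]
    return (ra > rb) - (ra < rb)


def equivalent(compare, x, y) -> bool:
    """Two values are equal when neither is less than the other under compare."""
    return compare(x, y) >= 0 and compare(y, x) >= 0
```

For example, `compare_kinds("Symbol", "Record")` is -1, because every Atom precedes every Compound.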
Signed integers.
A `SignedInteger` is an arbitrarily-large signed integer. `SignedInteger`s are compared as mathematical integers.
Unicode strings.
A `String` is a sequence of Unicode code-points.1 `String`s are compared lexicographically, code-point by code-point.2
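Python's `str` comparison is already lexicographic by code point, so a sketch of the `String` order is short. The check below also illustrates note 2. Comparing UTF-8 encodings byte by byte gives the same answers. Comparing UTF-16 code units does not, once characters outside the Basic Multilingual Plane are involved. The function names are illustrative only.

```python
def compare_code_points(a: str, b: str) -> int:
    # Python orders str values code point by code point, which matches the
    # lexicographic rule for Strings (and for Symbols).
    return (a > b) - (a < b)


def compare_utf8(a: str, b: str) -> int:
    # Byte-by-byte comparison of the UTF-8 encodings.
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


samples = ["", "bzz", "c", "caa", "\x00", "\uff61", "\U0001f600"]
for a in samples:
    for b in samples:
        assert compare_code_points(a, b) == compare_utf8(a, b)

# UTF-16 code units do not give the same order: U+1F600 is stored as the
# surrogate pair D83D DE00, which sorts below U+FF61.
assert "\uff61" < "\U0001f600"
assert "\uff61".encode("utf-16-be") > "\U0001f600".encode("utf-16-be")
```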
Binary data.
A `ByteString` is a sequence of octets. `ByteString`s are compared lexicographically.
Symbols.
Programming languages like Lisp and Prolog frequently use string-like values called symbols. Here, a `Symbol` is, like a `String`, a sequence of Unicode code-points representing an identifier of some kind. `Symbol`s are also compared lexicographically by code-point.
Booleans.
There are two `Boolean`s, “false” and “true”. The “false” value is less than the “true” value.
IEEE floating-point values.
`Float`s and `Double`s are single- and double-precision IEEE 754 floating-point values, respectively. `Float`s, `Double`s and `SignedInteger`s are disjoint. By the rules above, every `Float` is less than every `Double`, and every `SignedInteger` is greater than both. Two `Float`s or two `Double`s are to be ordered by the `totalOrder` predicate defined in section 5.10 of IEEE Std 754-2008.
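One common way to compute IEEE 754 `totalOrder` is to reinterpret a value's bits as a signed integer and flip the magnitude bits of negative values. This yields -NaN < -∞ < ... < -0 < +0 < ... < +∞ < +NaN. The Python sketch below applies that well-known bit trick using the standard `struct` module. It is an illustration, not a reference implementation taken from this specification, and the function names are hypothetical.

```python
import math
import struct


def double_total_order_key(x: float) -> int:
    """Integer key whose natural order matches IEEE 754 totalOrder for doubles."""
    (bits,) = struct.unpack(">q", struct.pack(">d", x))
    # Negative values (sign bit set) get their 63 magnitude bits flipped, so
    # larger magnitudes sort lower. Non-negative values keep their bit pattern.
    return bits ^ 0x7FFF_FFFF_FFFF_FFFF if bits < 0 else bits


def float_total_order_key(x: float) -> int:
    """Same idea for single precision (struct raises OverflowError if x is out of range)."""
    (bits,) = struct.unpack(">i", struct.pack(">f", x))
    return bits ^ 0x7FFF_FFFF if bits < 0 else bits


def from_hex(h: str) -> float:
    """Build a double from its big-endian bit pattern, e.g. to get a negative NaN."""
    return struct.unpack(">d", bytes.fromhex(h))[0]


negative_nan = from_hex("fff8000000000000")
positive_nan = from_hex("7ff8000000000000")
values = [positive_nan, 1.0, 0.0, -0.0, math.inf, -1.0, -math.inf, negative_nan]
ordered = sorted(values, key=double_total_order_key)
```

Unlike Python's `<` on floats, this key distinguishes -0 from +0 and places NaNs at both ends instead of leaving them unordered.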
Records.
A `Record` is a labelled tuple of `Value`s, the record's fields. A label can be any `Value`, but is usually a `Symbol`.3 4 `Record`s are compared lexicographically: first by label, then by field sequence.
Sequences.
A `Sequence` is a sequence of `Value`s. `Sequence`s are compared lexicographically.
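Records and Sequences share one lexicographic rule. Here is a minimal Python sketch. It assumes a caller-supplied `compare` function that implements the total order on `Value`s (returning -1, 0 or 1) and represents a `Record` as a hypothetical `(label, fields)` tuple. Plain integers stand in for `Value`s in the examples so that `compare` stays within one kind.

```python
from typing import Any, Callable, Sequence, Tuple

Compare = Callable[[Any, Any], int]  # returns -1, 0 or 1


def lexicographic(compare: Compare, xs: Sequence[Any], ys: Sequence[Any]) -> int:
    """Compare element by element; a proper prefix sorts before the longer sequence."""
    for x, y in zip(xs, ys):
        c = compare(x, y)
        if c != 0:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


def compare_records(compare: Compare, a: Tuple[Any, Sequence[Any]], b: Tuple[Any, Sequence[Any]]) -> int:
    """Compare hypothetical (label, fields) records: by label first, then by fields."""
    label_order = compare(a[0], b[0])
    if label_order != 0:
        return label_order
    return lexicographic(compare, a[1], b[1])


def compare_ints(x: int, y: int) -> int:
    """Stand-in for the SignedInteger rule: compare as mathematical integers."""
    return (x > y) - (x < y)
```

With these definitions, `lexicographic(compare_ints, [1, 2], [1, 2, 0])` is -1, because a proper prefix sorts first. The same holds for `"c"` < `"caa"` in the ordering examples below. `compare_records(compare_ints, (1, [9, 9]), (2, []))` is also -1, because the labels decide before any field is examined.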
Sets.
A `Set` is an unordered finite set of `Value`s. It contains no duplicate values, following the equivalence relation induced by the total order on `Value`s. Two `Set`s are compared by sorting their elements ascending using the total order and comparing the resulting `Sequence`s.
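The following Python sketch shows the `Set` rule, under the same assumptions as before: a caller-supplied `compare` implements the total order, and Python integers stand in for `Value`s. `functools.cmp_to_key` turns that three-way comparison into a sort key.

```python
from functools import cmp_to_key
from typing import Any, Callable, Iterable, Sequence

Compare = Callable[[Any, Any], int]  # returns -1, 0 or 1


def lexicographic(compare: Compare, xs: Sequence[Any], ys: Sequence[Any]) -> int:
    """Compare element by element; a proper prefix sorts first."""
    for x, y in zip(xs, ys):
        c = compare(x, y)
        if c != 0:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


def compare_sets(compare: Compare, a: Iterable[Any], b: Iterable[Any]) -> int:
    """Sort each set ascending under the total order, then compare the sequences."""
    key = cmp_to_key(compare)
    return lexicographic(compare, sorted(a, key=key), sorted(b, key=key))


def compare_ints(x: int, y: int) -> int:
    return (x > y) - (x < y)
```

For instance, `compare_sets(compare_ints, {5}, {1, 9})` is 1. Once sorted, the sequences are `[5]` and `[1, 9]`, and 5 > 1 decides the comparison before length matters.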
Dictionaries.
A `Dictionary` is an unordered finite collection of pairs of `Value`s. Each pair comprises a key and a value. Keys in a `Dictionary` are pairwise distinct. Instances of `Dictionary` are compared by lexicographic comparison of the sequences resulting from ordering each `Dictionary`'s pairs in ascending order by key.
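The `Dictionary` rule can be sketched the same way. The text does not spell out how two pairs compare. This Python sketch assumes a pair compares by key and then by value. It also assumes a caller-supplied `compare` for the total order, with Python integers standing in for keys and values.

```python
from functools import cmp_to_key
from typing import Any, Callable, Dict, Sequence

Compare = Callable[[Any, Any], int]  # returns -1, 0 or 1


def lexicographic(compare: Compare, xs: Sequence[Any], ys: Sequence[Any]) -> int:
    """Compare element by element; a proper prefix sorts first."""
    for x, y in zip(xs, ys):
        c = compare(x, y)
        if c != 0:
            return c
    return (len(xs) > len(ys)) - (len(xs) < len(ys))


def compare_dicts(compare: Compare, a: Dict[Any, Any], b: Dict[Any, Any]) -> int:
    """Order each dictionary's pairs by key, then compare the pair sequences."""
    by_key = cmp_to_key(compare)
    pairs_a = sorted(a.items(), key=lambda kv: by_key(kv[0]))
    pairs_b = sorted(b.items(), key=lambda kv: by_key(kv[0]))

    def compare_pairs(p, q) -> int:
        # Assumption: a (key, value) pair compares by key, then by value.
        return lexicographic(compare, p, q)

    return lexicographic(compare_pairs, pairs_a, pairs_b)


def compare_ints(x: int, y: int) -> int:
    return (x > y) - (x < y)
```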
Embeddeds.
An `Embedded` allows inclusion of domain-specific, potentially stateful or located data into a `Value`.5 `Embedded`s may be used to denote stateful objects, network services, object capabilities, file descriptors, Unix processes, or other possibly-stateful things. Because each `Embedded` is a domain-specific datum, comparison of two `Embedded`s is done according to domain-specific rules.
Motivating Examples. In a Java or Python implementation, an `Embedded` may denote a reference to a Java or Python object. Comparison would then be done via the language's own rules for equivalence and ordering. In a Unix application, an `Embedded` may denote an open file descriptor or a process ID. In an HTTP-based application, each `Embedded` might be a URL, compared according to RFC 6943. When a `Value` is serialized for storage or transfer, `Embedded`s will usually be represented as ordinary `Value`s, in which case the ordinary rules for comparing `Value`s will apply.
Examples
The definitions above are independent of any particular concrete syntax. The examples of `Value`s that follow are written using the Preserves text syntax, and the example encoded byte sequences use the Preserves binary encoding.
Ordering.
The total ordering specified above means that the following statements are true:
"bzz" < "c" < "caa" < #!"a"
#t < 3.0f < 3.0 < 3 < "3" < |3| < [] < #!#t
Simple examples.
Value | Encoded byte sequence |
---|---|
`<capture <discard>>` | B4 B3 07 'c' 'a' 'p' 't' 'u' 'r' 'e' B4 B3 07 'd' 'i' 's' 'c' 'a' 'r' 'd' 84 84 |
`[1 2 3 4]` | B5 91 92 93 94 84 |
`[-2 -1 0 1]` | B5 9E 9F 90 91 84 |
`"hello"` (format B) | B1 05 'h' 'e' 'l' 'l' 'o' |
`["a" b #"c" [] #{} #t #f]` | B5 B1 01 'a' B3 01 'b' B2 01 'c' B5 84 B6 84 81 80 84 |
`-257` | A1 FE FF |
`-1` | 9F |
`0` | 90 |
`1` | 91 |
`255` | A1 00 FF |
`1.0f` | 82 3F 80 00 00 |
`1.0` | 83 3F F0 00 00 00 00 00 00 |
`-1.202e300` | 83 FE 3C B7 B7 59 BF 04 26 |
`#xf"7f800000"`, positive `Float` infinity | 82 7F 80 00 00 |
`#xd"fff0000000000000"`, negative `Double` infinity | 83 FF F0 00 00 00 00 00 00 |
The next example uses a non-`Symbol` label for a record.6 The `Record` `<[titled person 2 thing 1] 101 "Blackwell" <date 1821 2 3> "Dr">` encodes to:
B4 ;; Record
B5 ;; Sequence
B3 06 74 69 74 6C 65 64 ;; Symbol, "titled"
B3 06 70 65 72 73 6F 6E ;; Symbol, "person"
92 ;; SignedInteger, "2"
B3 05 74 68 69 6E 67 ;; Symbol, "thing"
91 ;; SignedInteger, "1"
84 ;; End (sequence)
A0 65 ;; SignedInteger, "101"
B1 09 42 6C 61 63 6B 77 65 6C 6C ;; String, "Blackwell"
B4 ;; Record
B3 04 64 61 74 65 ;; Symbol, "date"
A1 07 1D ;; SignedInteger, "1821"
92 ;; SignedInteger, "2"
93 ;; SignedInteger, "3"
84 ;; End (record)
B1 02 44 72 ;; String, "Dr"
84 ;; End (record)
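The byte sequences above can be reproduced with a small encoder. The Python sketch below covers only the forms that appear in these examples. The tag bytes are read off the examples, and the classes `Symbol`, `SingleFloat` and `Record` are hypothetical stand-ins. It makes two further assumptions. First, integers from -3 to 12 use the one-byte `9X` form, although the examples only show -2 through 4. Second, elements of a non-empty `Set` are written in order of their encoded bytes. Lengths of 128 or more and integers wider than 16 bytes are not handled.

```python
import struct
from dataclasses import dataclass
from typing import Any, Tuple

END = b"\x84"


@dataclass(frozen=True)
class Symbol:  # hypothetical stand-in for a Preserves Symbol
    name: str


@dataclass(frozen=True)
class SingleFloat:  # hypothetical stand-in for a Float (Python floats are Doubles)
    value: float


@dataclass(frozen=True)
class Record:  # hypothetical stand-in for a Preserves Record
    label: Any
    fields: Tuple[Any, ...]


def encode_int(n: int) -> bytes:
    # Assumption: -3..12 use the one-byte 0x9X form.
    if -3 <= n <= 12:
        return bytes([0x90 | (n & 0x0F)])
    # Otherwise: 0xA0 + (length - 1), then minimal big-endian two's complement.
    length = 1
    while not -(1 << (8 * length - 1)) <= n < (1 << (8 * length - 1)):
        length += 1
    if length > 16:
        raise ValueError("integers wider than 16 bytes are outside this sketch")
    return bytes([0xA0 + length - 1]) + n.to_bytes(length, "big", signed=True)


def encode_bytes(tag: int, data: bytes) -> bytes:
    # Tag, a single length byte, then the raw bytes.
    if len(data) >= 0x80:
        raise ValueError("lengths of 128 or more are outside this sketch")
    return bytes([tag, len(data)]) + data


def encode(v: Any) -> bytes:
    if v is False:
        return b"\x80"
    if v is True:
        return b"\x81"
    if isinstance(v, int):
        return encode_int(v)
    if isinstance(v, SingleFloat):
        return b"\x82" + struct.pack(">f", v.value)
    if isinstance(v, float):
        return b"\x83" + struct.pack(">d", v)
    if isinstance(v, str):
        return encode_bytes(0xB1, v.encode("utf-8"))
    if isinstance(v, bytes):
        return encode_bytes(0xB2, v)
    if isinstance(v, Symbol):
        return encode_bytes(0xB3, v.name.encode("utf-8"))
    if isinstance(v, Record):
        return b"\xb4" + encode(v.label) + b"".join(map(encode, v.fields)) + END
    if isinstance(v, list):
        return b"\xb5" + b"".join(map(encode, v)) + END
    if isinstance(v, frozenset):
        # Assumption: elements are emitted in order of their encoded bytes.
        return b"\xb6" + b"".join(sorted(map(encode, v))) + END
    raise TypeError(f"unsupported value: {v!r}")
```

For instance, `encode(Record(Symbol("date"), (1821, 2, 3))).hex()` gives `b4b30464617465a1071d929384`, which is the inner record of the example above.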
JSON examples.
Preserves text syntax is a superset of JSON, so the examples from RFC 8259 read as valid Preserves. The JSON literals `true`, `false` and `null` all read as `Symbol`s, and JSON numbers read (unambiguously) either as `SignedInteger`s or as `Double`s.7
The first RFC 8259 example:
{
"Image": {
"Width": 800,
"Height": 600,
"Title": "View from 15th Floor",
"Thumbnail": {
"Url": "http://www.example.com/image/481989943",
"Height": 125,
"Width": 100
},
"Animated" : false,
"IDs": [116, 943, 234, 38793]
}
}
when read using the Preserves text syntax, encodes via the binary syntax as follows:
B7
B1 05 "Image"
B7
B1 03 "IDs" B5
A0 74
A1 03 AF
A1 00 EA
A2 00 97 89
84
B1 05 "Title" B1 14 "View from 15th Floor"
B1 05 "Width" A1 03 20
B1 06 "Height" A1 02 58
B1 08 "Animated" B3 05 "false"
B1 09 "Thumbnail"
B7
B1 03 "Url" B1 26 "http://www.example.com/image/481989943"
B1 05 "Width" A0 64
B1 06 "Height" A0 7D
84
84
84
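The JSON reading and the dictionary layout above can be checked with a similar sketch. It maps Python's `json.loads` output onto Preserves values. `true`, `false` and `null` become `Symbol`s, integers stay integers, and other numbers become `Double`s. Each `Dictionary` is then written with its entries sorted by the bytes of their encoded keys. That ordering rule is inferred from the examples here, where it reproduces the key order shown; it is not quoted from the binary-syntax document. `Symbol` is again a hypothetical stand-in class, and the same single-byte-length and small-integer assumptions apply.

```python
import json
import struct
from dataclasses import dataclass
from typing import Any

END = b"\x84"


@dataclass(frozen=True)
class Symbol:  # hypothetical stand-in for a Preserves Symbol
    name: str


def from_json(j: Any) -> Any:
    """Reinterpret json.loads output the way Preserves reads JSON text."""
    if j is True:
        return Symbol("true")
    if j is False:
        return Symbol("false")
    if j is None:
        return Symbol("null")
    if isinstance(j, list):
        return [from_json(x) for x in j]
    if isinstance(j, dict):
        return {k: from_json(v) for k, v in j.items()}
    return j  # str, int (SignedInteger) or float (Double)


def encode_int(n: int) -> bytes:
    # Assumption: -3..12 use the one-byte 0x9X form.
    if -3 <= n <= 12:
        return bytes([0x90 | (n & 0x0F)])
    length = 1
    while not -(1 << (8 * length - 1)) <= n < (1 << (8 * length - 1)):
        length += 1
    if length > 16:
        raise ValueError("integers wider than 16 bytes are outside this sketch")
    return bytes([0xA0 + length - 1]) + n.to_bytes(length, "big", signed=True)


def encode_bytes(tag: int, data: bytes) -> bytes:
    if len(data) >= 0x80:
        raise ValueError("lengths of 128 or more are outside this sketch")
    return bytes([tag, len(data)]) + data


def encode(v: Any) -> bytes:
    if isinstance(v, Symbol):
        return encode_bytes(0xB3, v.name.encode("utf-8"))
    if isinstance(v, bool):
        raise TypeError("convert booleans with from_json first")
    if isinstance(v, int):
        return encode_int(v)
    if isinstance(v, float):
        return b"\x83" + struct.pack(">d", v)
    if isinstance(v, str):
        return encode_bytes(0xB1, v.encode("utf-8"))
    if isinstance(v, list):
        return b"\xb5" + b"".join(encode(x) for x in v) + END
    if isinstance(v, dict):
        # Entries sorted by the bytes of each encoded key.
        entries = sorted((encode(k), encode(x)) for k, x in v.items())
        return b"\xb7" + b"".join(k + x for k, x in entries) + END
    raise TypeError(f"unsupported value: {v!r}")


# Usage: JSON text in, Preserves binary out.
doc = from_json(json.loads('{"Animated": false, "IDs": [116, 943]}'))
encoded = encode(doc)  # the "IDs" entry is written before "Animated"
```

In the second example below, the same rule accounts for `Address` preceding `Country` and for `Longitude` preceding `precision`.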
The second RFC 8259 example:
[
{
"precision": "zip",
"Latitude": 37.7668,
"Longitude": -122.3959,
"Address": "",
"City": "SAN FRANCISCO",
"State": "CA",
"Zip": "94107",
"Country": "US"
},
{
"precision": "zip",
"Latitude": 37.371991,
"Longitude": -122.026020,
"Address": "",
"City": "SUNNYVALE",
"State": "CA",
"Zip": "94085",
"Country": "US"
}
]
encodes to binary as follows:
B5
B7
B1 03 "Zip" B1 05 "94107"
B1 04 "City" B1 0D "SAN FRANCISCO"
B1 05 "State" B1 02 "CA"
B1 07 "Address" B1 00
B1 07 "Country" B1 02 "US"
B1 08 "Latitude" 83 40 42 E2 26 80 9D 49 52
B1 09 "Longitude" 83 C0 5E 99 56 6C F4 1F 21
B1 09 "precision" B1 03 "zip"
84
B7
B1 03 "Zip" B1 05 "94085"
B1 04 "City" B1 09 "SUNNYVALE"
B1 05 "State" B1 02 "CA"
B1 07 "Address" B1 00
B1 07 "Country" B1 02 "US"
B1 08 "Latitude" 83 40 42 AF 9D 66 AD B4 03
B1 09 "Longitude" 83 C0 5E 81 AA 4F CA 42 AF
B1 09 "precision" B1 03 "zip"
84
84
Notes
1. All Unicode code-points are permitted, including NUL (code point zero). ↩︎
2. Happily, the design of UTF-8 is such that this gives the same result as a lexicographic byte-by-byte comparison of the UTF-8 encoding of a string! ↩︎
3. The Racket programming language defines “prefab” structure types, which map well to our `Record`s. Racket supports record extensibility by encoding record supertypes into record labels as specially-formatted lists. ↩︎
4. It is occasionally (but seldom) necessary to interpret such `Symbol` labels as UTF-8 encoded IRIs. Where a label can be read as a relative IRI, it is notionally interpreted with respect to the IRI `urn:uuid:6bf094a6-20f1-4887-ada7-46834a9b5b34`; where a label can be read as an absolute IRI, it stands for that IRI; and otherwise, it cannot be read as an IRI at all, and so the label simply stands for itself, that is, for its own `Value`. ↩︎
5. Rationale. Why include `Embedded`s as a special class, distinct from, say, a specially-labeled `Record`? First, a `Record` can only hold other `Value`s: in order to embed values such as live pointers to Java objects, some means of "escaping" from the `Value` data type must be provided. Second, `Embedded`s are meant to be able to denote stateful entities, for which comparison by address is appropriate. However, we do not wish to place restrictions on the nature of these entities. If we had used `Record`s instead of distinct `Embedded`s, users would have to invent an encoding of domain data into `Record`s that reflected domain ordering into `Value` ordering. This is often difficult and may not always be possible. Finally, because `Embedded`s are intended to be able to represent network and memory locations, they must be able to be rewritten at network and process boundaries. Having a distinct class allows generic `Embedded` rewriting without the quotation-related complications of encoding references as, say, `Record`s. ↩︎
6. It happens to line up with Racket's representation of a record label for an inheritance hierarchy where `titled` extends `person` extends `thing`:
   (struct date (year month day) #:prefab)
   (struct thing (id) #:prefab)
   (struct person thing (name date-of-birth) #:prefab)
   (struct titled person (title) #:prefab)
   For more detail on Racket's representations of record labels, see the Racket documentation for `make-prefab-struct`. ↩︎
7. The following schema definitions match exactly the JSON subset of a Preserves input:
   version 1 .
   JSON = @string string
        / @integer int
        / @double double
        / @boolean JSONBoolean
        / @null =null
        / @array [JSON ...]
        / @object { string: JSON ...:... }
        .
   JSONBoolean = =true / =false .
   ↩︎