---
no_site_title: true
title: "Preserves Schema"
---

Tony Garnock-Jones
June 2021. Version 0.1.2.

[abnf]: https://tools.ietf.org/html/rfc7405

This document proposes a Schema language for the [Preserves data model](./preserves.html).

## Introduction

A Preserves schema connects Preserves `Value`s to host-language data structures. Each definition within a schema can be processed by a compiler to produce

 - a host-language *type definition*;
 - a partial *parsing* function from `Value`s to instances of the produced type; and
 - a total *serialization* function from instances of the type to `Value`s.

Every parsed `Value` retains enough information to always be able to be serialized again, and every instance of a host-language data structure contains, by construction, enough information to be successfully serialized.

**Example.** Sending the schema

    version 1 .
    Date = .
    Person = .

to the TypeScript schema compiler produces types,

    type Date = {"year": number, "month": number, "day": number};
    type Person = {"name": string, "birthday": Date};

constructors,

    function Date({year, month, day}: {year: number, month: number, day: number}): Date;
    function Person({name, birthday}: {name: string, birthday: Date}): Person;

partial parsing functions which throw on parse failure,

    function asDate(v: _val): Date;
    function asPerson(v: _val): Person;

total parsing functions which yield `undefined` on parse failure,

    function toDate(v: _val): undefined | Date;
    function toPerson(v: _val): undefined | Person;

and total serialization functions,

    function fromDate(_v: Date): _val;
    function fromPerson(_v: Person): _val;

## Concepts

**Bundle.** A collection of schemas, each named by a module path.

**Definition.** A named pattern within a schema. When compiled, a definition will usually produce a type (plus associated constructors and predicates), a parser function, and a serializer function.

**Metaschema.** The Preserves metaschema is a schema describing the abstract syntax of all schema instances (including itself).

**Module path.** A sequence of symbols, denoting a leaf in a tree with symbol-labelled edges.

**Pattern.** A pattern describes a collection of `Value`s as well as providing names for the portions of matching `Value`s that should be captured in a host-language data type.

**Schema abstract syntax tree (AST).** Schema-manipulating tools will usually work with schema AST; that is, with `Value`s conforming to the metaschema or instances of the corresponding host-language data structures.

**Schema domain-specific language (DSL).** While human beings *can* work directly with Preserves documents matching the metaschema, the schema DSL provides an easier-to-read and -write language for working with schemas that can be translated into instances of the metaschema.

**Schema.** A collection of definitions, plus an optional schema-wide reference to a schema describing embedded values.

## Concrete (DSL) Syntax

In this section, we use an [ABNF][abnf]-like notation to define a textual syntax that is easy for people to read and write. Most of the examples in this document are written using this syntax. In the following section, we will define the abstract syntax that this surface syntax translates into.

### Schema files and bundles.

Each schema should be placed in a single file. Schema files usually end with extension `.prs`, and consist of a sequence of Preserves `Value`s[^like-sexps] separated into *clauses* by the Preserves `Symbol` "`.`".

[^like-sexps]: That is, schema files use Preserves as a kind of S-expression!

A bundle of schema files is a directory tree containing `.prs` files.

### Clauses.

    Clause = (Version / EmbeddedTypeName / Include / Definition) "."
    Version = "version" "1"
    EmbeddedTypeName = "embeddedType" ("#f" / Ref)
    Include = "include" string
    Definition = id "=" (OrPattern / AndPattern / Pattern)

**Version specification.** Mandatory.
Names the version of the schema language used in the file. This version of the specification is referred to in schema files as `version 1`.

**Embedded type name.** Optional. If given as `#f` (the default), it declares that values parsed by the schema do not contain embedded `Value`s of any particular type. If given as a `Ref`, a reference to a definition in this or a neighbouring schema, it declares that embedded `Value`s must themselves conform to the named definition.

**Include.** *Experimental.* Includes the contents of a neighbouring file as if it were textually inserted in place of this clause. The file path may be relative to the current file, or absolute.

**Definition.** Each definition clause implicitly connects a pattern with a type name and a set of associated functions.

### Union definitions.

    OrPattern = AltPattern "/" AltPattern *("/" AltPattern)

The right-hand side of a definition may supply two or more *alternatives*. When parsing, the alternatives are tried in order; the result of the first successful alternative is the result of the entire parse.

The type corresponding to an `OrPattern` is a union type, a variant type, or an algebraic sum type, depending on the host language.

Each alternative within an `OrPattern` must have a definition-unique *name*. The name can either be given explicitly as `@name` (see discussion of `NamedPattern` below) or inferred. It can only be inferred from the label of a record pattern, from the name of a reference to another definition, or from the text of a "sufficiently identifierlike" literal pattern, that is, one that matches a string, symbol, number or boolean:

    AltPattern = "@" id SimplePattern
               / "<" id *(NamedPattern) ">"
               / Ref
               / LiteralPattern -- with a side condition

### Intersection definitions.

    AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)

The right-hand side of a definition may supply two or more patterns, the *intersection* of whose denotations is the denotation of the overall definition.
When parsing, every pattern is tried; if all succeed, the resulting information is combined into a single record type. When serializing, the terms resulting from serializing at each pattern are *merged* together.

#### Experimental.

Intersections are an experimental feature. They can be used to express *optional dictionary entries*:[^not-ideal-optional-encoding]

    MyDict = {a: int, b: string} & @c MaybeC .
    MaybeC = @present {c: symbol} / @absent {} .

It is not yet clear whether they pull their weight. In particular, the semantics of serializing a value defined by intersection are not completely clear.

[^not-ideal-optional-encoding]: This encoding is not ideal. Given a term `{a: 1, b: "", c: sym}`, it will yield a `present` result, and given `{a: 1, b: ""}`, will yield an `absent` result; but `absent` will also result from `{a: 1, b: "", c: "notasymbol"}`.

### Patterns.

    Pattern = SimplePattern / CompoundPattern

Patterns come in two kinds:

 - the parsers for *simple patterns* yield a single host-language value; for example, a string, an array, a pointer, and so on.
 - the parsers for *compound patterns* yield zero or more *fields* which combine into an overall record type associated with a definition.

#### Simple patterns

    SimplePattern = AnyPattern
                  / AtomKindPattern
                  / EmbeddedPattern
                  / LiteralPattern
                  / SequenceOfPattern
                  / SetOfPattern
                  / DictOfPattern
                  / Ref

The `any` pattern matches any input `Value`:

    AnyPattern = "any"

Specifying the name of a kind of `Atom` matches that kind of atom:

    AtomKindPattern = "bool" / "float" / "double" / "int" / "string" / "bytes" / "symbol"

Embedded input `Value`s are matched with embedded patterns. The portion under the `#!` prefix is the *interface* schema for the embedded value.[^interface-schema] The result of a match is an instance of the schema-wide `embeddedType`, if one is supplied.

    EmbeddedPattern = "#!"
                      SimplePattern

A literal pattern may be expressed in any of three ways: non-symbol atoms stand for themselves directly; symbols, prefixed with an equal sign, are matched literally; and any `Value` at all may be quoted by placing it in a `< ... >` record:

    LiteralPattern = "="symbol / "<" value ">" / non-symbol-atom

Brackets containing an item pattern and a literal ellipsis match a sequence of items, each matching the nested item pattern. Sets and uniform dictionaries are similar.

    SequenceOfPattern = "[" SimplePattern "..." "]"
    SetOfPattern = "#{" SimplePattern "}"
    DictOfPattern = "{" SimplePattern ":" SimplePattern "...:..." "}"

Finally, a reference to some other definition, in this schema or a neighbouring schema within this bundle, is made by mentioning the possibly-qualified name of the definition as a bare symbol:

    Ref = symbol

Periods "`.`" in such symbols are special:

 - `Name` refers to the definition named `Name` in the current schema.
 - `Mod.Submod.Name` refers to definition `Name` in `Mod.Submod`, some other schema in the bundle.

Each period-separated portion of a reference name must be an `id`, an identifier.

[^interface-schema]: Embedded patterns are experimental. One interpretation is that an embedded value denotes a reference to some stateful actor in a potentially-distributed system, and that the interface schema associated with an embedded value describes the messages that may be sent to that actor.

    **Examples.** `#!any` may denote a reference to an Actor able to receive any value as a message; `#!#t`, a reference to an Actor expecting *only* the "true" message; `#!Session`, a reference to an Actor expecting any message matching a schema defined as `Session` in this file.

#### Compound patterns

    CompoundPattern = RecordPattern / TuplePattern / VariableTuplePattern / DictionaryPattern

A record pattern matches an input record. It may be specified as a record with a literal in the label position, or as a quoted `< ...
>` record:

    RecordPattern = "<" NamedPattern *(NamedPattern) ">"
                  / "<" value *(NamedPattern) ">"

A tuple pattern matches a fixed-length sequence with specific patterns in each position. A variable tuple pattern is the same, but with an additional pattern for matching additional elements following the fixed-position patterns.

    TuplePattern = "[" *(NamedPattern) "]"
    VariableTuplePattern = "[" 1*(NamedPattern) NamedSimplePattern "..." "]"

A dictionary pattern matches specific literal keys in an input dictionary. If no explicit name is given for a particular `NamedSimplePattern`, but the key for the pattern is a symbol, then that symbol is used as the name for that dictionary entry.

    DictionaryPattern = "{" *(value ":" NamedSimplePattern) "}"

### Identifiers and Bindings: NamedPattern and NamedSimplePattern

Compound pattern specifications contain `NamedPattern`s or `NamedSimplePattern`s rather than ordinary `Pattern`s:

    NamedPattern = "@" id SimplePattern / Pattern
    NamedSimplePattern = "@" id SimplePattern / SimplePattern

Use of an `@name` prefix generally results in creation of a field with the given name in the overall record type for a definition. The type of value contained in the field will correspond to the `Pattern` or `SimplePattern` given.

An `id` is a symbol that matches the regular expression `^[a-zA-Z][a-zA-Z_0-9]*$`. This is a lowest-common-denominator constraint that allows for a reasonable mapping to the identifiers of many programming languages.

## Appendix: Metaschema

The metaschema defines the structure of the abstract syntax (AST) of schemas, using the concrete DSL syntax described above.

The text below is taken from [`schema/schema.prs`](https://gitlab.com/preserves/preserves/-/blob/main/schema/schema.prs) in the source code repository.

A `Bundle` collects a number of `Schema`s, each named by a `ModulePath`:[^todo-semantics-of-bundles]

    Bundle = .
    Modules = { ModulePath: Schema ...:... }.
    ModulePath = [symbol ...].
    Schema = .

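As a rough illustration of how a tool might split a possibly-qualified reference name into a `ModulePath` and a definition name, the following TypeScript sketch applies the `id` regular expression and the period rule described in this document. The names `ID_PATTERN`, `isId`, `ParsedRef`, and `parseRef` are hypothetical and are not taken from any Preserves implementation; symbols are modelled as plain strings purely for simplicity.

```typescript
// Hypothetical helpers, not part of the Preserves tooling.
// Symbols are modelled as plain strings (an assumption made for brevity).

// The `id` constraint quoted in this document.
const ID_PATTERN = /^[a-zA-Z][a-zA-Z_0-9]*$/;

// A module path is a sequence of symbols.
type ModulePath = string[];

interface ParsedRef {
  modulePath: ModulePath; // empty when the reference names the current schema
  name: string;           // the definition name (last period-separated portion)
}

function isId(s: string): boolean {
  return ID_PATTERN.test(s);
}

// Split `Mod.Submod.Name` into its module path and definition name.
// Returns undefined if any period-separated portion is not a valid `id`.
function parseRef(symbolText: string): ParsedRef | undefined {
  const parts = symbolText.split(".");
  if (!parts.every(isId)) return undefined;
  const name = parts[parts.length - 1]!;
  return { modulePath: parts.slice(0, -1), name };
}
```

Under these assumptions, `parseRef("Name")` yields an empty module path (the current schema), while `parseRef("Mod.Submod.Name")` yields the path `["Mod", "Submod"]`. Returning `undefined` on failure mirrors the `to...` style of total parsing function shown in the introduction.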
A `Version` names the version of the schema language in use. At present, it must be `1`.

    ; version 1 .
    Version = 1 .

An `EmbeddedTypeName` specifies the type of embedded values within values parsed by a given schema:

    EmbeddedTypeName = Ref / #f.
    Ref = .

The `Definitions` are a named collection of definitions within a schema. Note the special mention of `pattern0` and `pattern1`: these ensure that each `or` or `and` record has at least two members.

    Definitions = { symbol: Definition ...:... }.
    Definition =
        ; Pattern / Pattern / ...
        /
        ; Pattern & Pattern & ...
        /
        ; Pattern
        / Pattern
        .
    NamedAlternative = [@variantLabel string @pattern Pattern].

Each `Pattern` is either a simple or compound pattern:

    Pattern = SimplePattern / CompoundPattern .

Simple patterns are as described above:

    SimplePattern =
        ; any
        / =any
        ; special builtins: bool, float, double, int, string, bytes, symbol
        /
        ; matches an embedded value in the input: #!p
        /
        ; =symbol, < any>, or plain non-symbol atom
        /
        ; [p ...] ----> >; see also tuplePrefix below.
        /
        ; #{p} ----> >
        /
        ; {k: v, ...:...} ----> >
        /
        ; symbol, symbol.symbol, symbol.symbol.symbol, ...
        / Ref
        .
    AtomKind = =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol .

Compound patterns involve optionally-named subpatterns:

    CompoundPattern =
        ;