---
no_site_title: true
title: "Preserves Schema"
---

Tony Garnock-Jones <tonyg@leastfixedpoint.com>
February 2023. Version 0.3.1.

[abnf]: https://tools.ietf.org/html/rfc7405

This document proposes a Schema language for the
[Preserves data model](./preserves.html).

## Introduction
|
||
|
||
{% include what-is-preserves-schema.md %}
|
||
|
||
**Portability.** Preserves Schema is broadly portable. Any host-language
|
||
type system that can represent [algebraic
|
||
types](https://en.wikipedia.org/wiki/Algebraic_data_type) in some way
|
||
should be suitable as a compilation target.
|
||
|
||
This includes ML-family languages like [Rust][rust-impl] and Haskell,
|
||
object-oriented languages like Java, [Python][python-impl] and
|
||
[Smalltalk][smalltalk-impl], and multiparadigm languages like
|
||
[JavaScript][ts-impl], [TypeScript][ts-impl], [Racket][racket-impl],
|
||
[Nim][nim-impl] and Erlang.
|
||
|
||
[nim-impl]: https://git.syndicate-lang.org/ehmry/preserves-nim
|
||
[python-impl]: https://gitlab.com/preserves/preserves/-/blob/main/implementations/python/preserves/schema.py
|
||
[racket-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/racket/preserves/preserves-schema
|
||
[rust-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/rust/preserves-schema
|
||
[smalltalk-impl]: https://squeaksource.com/Preserves.html
|
||
[ts-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/javascript/packages/schema
|
||
|
||
**Example.** Sending the schema

    version 1 .
    Date = <date @year int @month int @day int>.
    Person = <person @name string @birthday Date>.

to the TypeScript schema compiler produces types,

    type Date = {"year": number, "month": number, "day": number};
    type Person = {"name": string, "birthday": Date};

constructors,

    function Date({year, month, day}: {year: number, month: number, day: number}): Date;
    function Person({name, birthday}: {name: string, birthday: Date}): Person;

partial parsing functions which throw on parse failure,

    function asDate(v: _val): Date;
    function asPerson(v: _val): Person;

total parsing functions which yield `undefined` on parse failure,

    function toDate(v: _val): undefined | Date;
    function toPerson(v: _val): undefined | Person;

and total serialization functions,

    function fromDate(_v: Date): _val;
    function fromPerson(_v: Person): _val;

|
||
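For intuition, the three kinds of function shown in the example can be sketched by hand for `Date`. This is a minimal sketch and not the output of the schema compiler: `PVal` and `PRec` are hypothetical stand-ins for the `_val` type, record labels are modelled as plain strings, and the `Sketch` suffix marks every name as illustrative.

```typescript
// Hypothetical stand-in for Preserves values: just enough for this example.
// A record has a label (modelled as a plain string) and positional fields.
type PRec = { label: string; fields: PVal[] };
type PVal = number | string | PRec;

type DateSketch = { year: number; month: number; day: number };

function isRecSketch(v: PVal): v is PRec {
  return typeof v === "object";
}

// Schema `int` is modelled here as a finite number with no fractional part.
function isIntSketch(v: PVal | undefined): v is number {
  return typeof v === "number" && isFinite(v) && Math.floor(v) === v;
}

// Total parser: yields `undefined` on parse failure.
function toDateSketch(v: PVal): DateSketch | undefined {
  if (!isRecSketch(v) || v.label !== "date" || v.fields.length !== 3) return undefined;
  const [year, month, day] = v.fields;
  if (!isIntSketch(year) || !isIntSketch(month) || !isIntSketch(day)) return undefined;
  return { year, month, day };
}

// Partial parser: throws on parse failure.
function asDateSketch(v: PVal): DateSketch {
  const d = toDateSketch(v);
  if (d === undefined) throw new TypeError("Invalid Date");
  return d;
}

// Total serializer: always succeeds for a well-typed input.
function fromDateSketch(d: DateSketch): PVal {
  return { label: "date", fields: [d.year, d.month, d.day] };
}
```

Parsing the result of `fromDateSketch` with `toDateSketch` yields an equal `DateSketch`, which is the round-trip property one would expect of a parser and serializer pair.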
## Concepts
|
||
|
||
**Bundle.** A collection of schemas, each named by a module path.
|
||
|
||
**Definition.** A named pattern within a schema. When compiled, a
|
||
definition will usually produce a type (plus associated constructors
|
||
and predicates), a parser function, and a serializer function.
|
||
|
||
**Metaschema.** The Preserves metaschema is a schema describing the
|
||
abstract syntax of all schema instances (including itself).
|
||
|
||
**Module path.** A sequence of symbols, denoting a leaf in a tree with
|
||
symbol-labelled edges.
|
||
|
||
**Pattern.** A pattern describes a collection of `Value`s as well as
|
||
providing names for the portions of matching `Value`s that should be
|
||
captured in a host-language data type.
|
||
|
||
**Schema abstract syntax tree (AST).** Schema-manipulating tools will
|
||
usually work with schema AST; that is, with `Value`s conforming to the
|
||
metaschema or instances of the corresponding host-language
|
||
datastructures.
|
||
|
||
**Schema domain-specific language (DSL).** While human beings *can*
work directly with Preserves documents matching the metaschema, the
schema DSL provides an easier-to-read and -write language for working
with schemas that can be translated into instances of the metaschema.
|
||
**Schema.** A collection of definitions, plus an optional schema-wide
|
||
reference to a schema describing embedded values.
|
||
|
||
## Identifiers and Capitalization Conventions
|
||
|
||
Throughout, `id` is used in the grammar to denote an *identifier*,
|
||
which is a symbol that matches the regular expression
|
||
`^[a-zA-Z][a-zA-Z_0-9]*$`. This is a lowest-common-denominator
|
||
constraint that allows for a reasonable mapping to the identifiers of
|
||
many programming languages.
|
||
|
||
Identifiers are case-sensitive. Schemas should be written with an
|
||
awareness of the fact that some programming languages cannot preserve
|
||
case differences. Avoid using two identifiers in the same context that
|
||
differ only in case.
|
||
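A tool could check both rules mechanically. The following is a minimal sketch under the assumption that identifiers are available as plain strings; `isSchemaId` and `caseOnlyClashes` are hypothetical helper names and not part of any Preserves implementation.

```typescript
// The identifier syntax given above, as a JavaScript regular expression.
const ID_PATTERN = /^[a-zA-Z][a-zA-Z_0-9]*$/;

function isSchemaId(s: string): boolean {
  return ID_PATTERN.test(s);
}

// Groups identifiers that differ only in case. Each returned group has at
// least two distinct members. Strings that are not identifiers are ignored.
function caseOnlyClashes(names: string[]): string[][] {
  const groups: { [lower: string]: string[] } = Object.create(null);
  const order: string[][] = [];
  for (const n of names) {
    if (!isSchemaId(n)) continue;
    const key = n.toLowerCase();
    let group: string[] | undefined = groups[key];
    if (group === undefined) {
      group = [];
      groups[key] = group;
      order.push(group);
    }
    if (group.indexOf(n) < 0) group.push(n);
  }
  return order.filter((g) => g.length > 1);
}
```

For instance, `caseOnlyClashes(["Date", "date", "Person"])` reports the single group `["Date", "date"]`, which a schema author would then resolve by renaming one of the two.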
|
||
Schemas should be written using the following capitalization
|
||
conventions:
|
||
|
||
- `UpperCamelCase` for *definition* names.
|
||
|
||
- Either `lowerCamelCase` or `UpperCamelCase` for definition-unique
|
||
names for alternatives within an alternation definition.
|
||
|
||
- `lowerCamelCase` for *module* names (schema names, package names)
|
||
and *field* or *variable* names.
|
||
|
||
|
||
## The Preserves Schema Language
|
||
|
||
In this section, we use an [ABNF][abnf]-like notation to define a
|
||
textual syntax that is easy for people to read and write. Most of the
|
||
examples in this document are written using this syntax. An appendix
|
||
defines the abstract syntax that this surface syntax translates into.
|
||
|
||
### Schema files and bundles.
|
||
|
||
Each schema should be placed in a single file. Schema files usually
|
||
end with extension `.prs`, and consist of a sequence of Preserves
|
||
`Value`s[^like-sexps] separated into *clauses* by the Preserves
|
||
`Symbol` "`.`".
|
||
|
||
[^like-sexps]: That is, schema files use Preserves as a kind of
|
||
S-expression!
|
||
|
||
A bundle of schema files is a directory tree containing `.prs` files.
|
||
|
||
### Clauses.

    Clause = (Version / EmbeddedTypeName / Include / Definition) "."

    Version = "version" "1"
    EmbeddedTypeName = "embeddedType" ("#f" / Ref)
    Include = "include" string

**Version specification.** Mandatory. Names the version of the schema
|
||
language used in the file. This version of the specification is
|
||
referred to in schema files as `version 1`.
|
||
|
||
**Embedded type name.** Optional. If given as `#f` (the default), it
|
||
declares that values parsed by the schema do not contain embedded
|
||
`Value`s of any particular type. If given as a `Ref`, a reference to a
|
||
definition in this or a neighbouring schema, it declares that embedded
|
||
`Value`s must themselves conform to the named definition.
|
||
|
||
**Include.** *Experimental.* Includes the contents of a neighbouring
|
||
file as if it were textually inserted in place of this clause. The
|
||
file path may be relative to the current file, or absolute.
|
||
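The clause structure can be illustrated with a small reader. This is a sketch under a loud simplification: the file is modelled as an array of already-tokenised strings rather than as a sequence of Preserves `Value`s, and `ClauseSketch`, `splitClauses`, `classifyClause` and `readSchemaSketch` are hypothetical names. It checks only what the grammar above states: clauses end with `.`, and a `version 1` clause is mandatory.

```typescript
type ClauseSketch =
  | { kind: "version"; version: string }
  | { kind: "embeddedType"; ref: string }
  | { kind: "include"; path: string }
  | { kind: "definition"; id: string; body: string[] };

// Cuts the token sequence into clauses at each "." token.
function splitClauses(tokens: string[]): string[][] {
  const clauses: string[][] = [];
  let current: string[] = [];
  for (const t of tokens) {
    if (t === ".") {
      clauses.push(current);
      current = [];
    } else {
      current.push(t);
    }
  }
  if (current.length > 0) throw new SyntaxError("Unterminated final clause");
  return clauses;
}

function classifyClause(c: string[]): ClauseSketch {
  const first = c[0];
  const second = c[1];
  if (first === undefined) throw new SyntaxError("Empty clause");
  if (c.length === 2 && second !== undefined) {
    if (first === "version") {
      if (second !== "1") throw new SyntaxError("Unsupported version: " + second);
      return { kind: "version", version: second };
    }
    if (first === "embeddedType") return { kind: "embeddedType", ref: second };
    if (first === "include") return { kind: "include", path: second };
  }
  if (c.length >= 3 && second === "=") {
    return { kind: "definition", id: first, body: c.slice(2) };
  }
  throw new SyntaxError("Unrecognised clause: " + c.join(" "));
}

function readSchemaSketch(tokens: string[]): ClauseSketch[] {
  const clauses = splitClauses(tokens).map((c) => classifyClause(c));
  // The version clause is mandatory.
  if (clauses.filter((c) => c.kind === "version").length === 0) {
    throw new SyntaxError("Missing version clause");
  }
  return clauses;
}
```

The body of a definition is left as raw tokens here; parsing it into a pattern is the subject of the following sections.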
|
||
### Definitions.
|
||
|
||
Definition = id "=" (OrPattern / AndPattern / Pattern)
|
||
|
||
Each definition clause connects a pattern over `Value`s with a
|
||
host-language type name (derived from the supplied `id`) and set of
|
||
associated functions.
|
||
|
||
A definition may be
|
||
|
||
- an *alternation* of patterns, allowing for biased choice among alternatives;
|
||
- an *intersection* of patterns, allowing for composition and reuse of patterns; or
|
||
- the base case, an ordinary pattern.
|
||
|
||
**Host-language types.** Each definition includes *bindings* that
|
||
capture information from a parsed `Value` and expose it to programs in
|
||
the host language. When more than one binding is present in a
|
||
definition, a host-language record (product, structure, tuple) will be
|
||
the result of a parse; otherwise, a simple value will result. When a
|
||
definition involves *alternation*, a host-language representation of a
|
||
sum over the types of each branch of the alternation will result. For
|
||
example, a compiler targeting an object-oriented host language would
|
||
produce a base class for each definition, with a field for each binding
|
||
and a subclass for each variant alternative. A functional host language
|
||
with algebraic data types would produce a labelled-sum-of-products type.
|
||
|
||
### Alternation definitions.
|
||
|
||
OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
|
||
|
||
The right-hand-side of a definition may supply two or more
|
||
*alternatives*. When parsing, the alternatives are tried in order; the
|
||
result of the first successful alternative is the result of the entire
|
||
parse.
|
||
|
||
**Host-language types.** The type corresponding to an `OrPattern` is an
|
||
algebraic sum type, a union type, a variant type, or a concrete subclass
|
||
of an abstract superclass, depending on the host language.
|
||
|
||
**Variant names.** Each alternative within an `OrPattern` must have a
|
||
definition-unique *name*. The name is used to uniquely label the
|
||
alternative's host-language representation (for example, a subclass, or
|
||
a member of a tagged union type).
|
||
|
||
A variant name can either be given explicitly as `@name` (see discussion
|
||
of `NamedPattern` below) or inferred. It can only be inferred from the
|
||
label of a record pattern, from the name of a reference to another
|
||
definition, or from the text of a "sufficiently identifierlike" literal
|
||
pattern, that is, one that matches a string, symbol, number or boolean:

    AltPattern = "@" id SimplePattern
               / "<" id PatternSequence ">"
               / Ref
               / LiteralPattern    -- with a side condition

A host language will likely use the same ordering of its types as
|
||
specified by the schema. It is therefore recommended to specify first
|
||
the alternative best suited as a default initialization value (if
|
||
there is any).
|
||
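The biased choice described in this section can be sketched as a parser combinator. This is an illustration only: inputs are typed as `unknown` rather than as Preserves `Value`s, the `_variant` tag follows the convention visible in the generated TypeScript in the appendix, and the schema `Id = @numeric int / @textual string / @other any .` is a hypothetical example.

```typescript
// A total parser yields `undefined` on failure.
type AltParser<T> = (v: unknown) => T | undefined;
type Tagged<T> = { _variant: string; value: T };

// Tries each alternative in order; the first success decides the result.
function orPatternSketch<T>(
  alts: Array<{ name: string; parse: AltParser<T> }>
): AltParser<Tagged<T>> {
  return (v) => {
    for (const alt of alts) {
      const r = alt.parse(v);
      if (r !== undefined) return { _variant: alt.name, value: r };
    }
    return undefined;
  };
}

// Hypothetical schema: Id = @numeric int / @textual string / @other any .
const parseIdSketch = orPatternSketch<unknown>([
  {
    name: "numeric",
    parse: (v) => (typeof v === "number" && Math.floor(v) === v ? v : undefined),
  },
  { name: "textual", parse: (v) => (typeof v === "string" ? v : undefined) },
  // `any` accepts every value, so it only ever sees what the others rejected.
  { name: "other", parse: (v) => v },
]);
```

The integer `42` matches both `int` and `any`, but because alternatives are tried in order it is labelled `numeric`. Reordering the alternatives so that `@other any` comes first would make the remaining two unreachable.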
|
||
### Intersection definitions.
|
||
|
||
AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
|
||
|
||
The right-hand-side of a definition may supply two or more patterns, the
|
||
*intersection* of whose denotations is the denotation of the overall
|
||
definition. When parsing, every pattern is tried: if all succeed, the
|
||
resulting information is combined into a single type; otherwise, the
|
||
overall parse fails.
|
||
|
||
When serializing, the terms resulting from serializing at each pattern
|
||
are *merged* together.
|
||
|
||
**Host-language types.** Compiling an intersection definition produces a
|
||
host-language type that is effectively the algebraic product of the
|
||
types of the parts of the intersection. Practically, this usually means
|
||
a record (product, structure, tuple) type.
|
||
|
||
{:.rationale}
> #### Experimental.
>
> Intersections are an experimental feature. They can be used to express
> *optional dictionary entries*:
>
>     MyDict = {a: int, b: string} & @c MaybeC .
>     MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
>
> They can also be used to express something reminiscent of *inheritance*:
>
>     Type = @base BaseFields & @detail SubType .
>     BaseFields = {a: int, b: string} .
>     SubType = @base {}
>             / @variantA { x: int }
>             / @mid Mid .
>     Mid = { y: symbol } & @detail SubSubType .
>     SubSubType = @variantB { z: "type-b" }
>                / @variantC { z: "type-c" } .
>
> It is not yet clear whether they pull their weight.
>
> From the point of view of the user of the schema language, using
> intersections to express optional values is cumbersome. Not only is it
> verbose, requiring auxiliary definitions, but it leaves responsibility
> for checking for invalid inputs up to the user, rather than handling
> it completely at the Schema layer. A future Schema version will likely
> include first-class support for optionality.
|
||
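To make the parse-all-then-merge behaviour concrete, here is a hand-written sketch of the `MyDict` and `MaybeC` definitions from the rationale. It is not generated code. The value model is hypothetical: dictionaries are plain objects whose string keys stand for symbol keys, and a Preserves symbol is modelled as `{ sym: "..." }`.

```typescript
type SymV = { sym: string };
type DictV = { [key: string]: number | string | SymV };

type MaybeCSketch =
  | { _variant: "present"; c: SymV }
  | { _variant: "invalid"; c: number | string }
  | { _variant: "absent" };
type MyDictSketch = { a: number; b: string; c: MaybeCSketch };

// MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
// The alternatives are tried in order, and `{}` matches any dictionary.
function toMaybeCSketch(d: DictV): MaybeCSketch {
  const c = d["c"];
  if (c === undefined) return { _variant: "absent" };
  if (typeof c === "object") return { _variant: "present", c };
  return { _variant: "invalid", c };
}

// MyDict = {a: int, b: string} & @c MaybeC .
// Every part of the intersection is applied to the same input.
function toMyDictSketch(d: DictV): MyDictSketch | undefined {
  const a = d["a"];
  const b = d["b"];
  if (typeof a !== "number" || Math.floor(a) !== a) return undefined;
  if (typeof b !== "string") return undefined;
  return { a, b, c: toMaybeCSketch(d) };
}

function fromMaybeCSketch(c: MaybeCSketch): DictV {
  return c._variant === "absent" ? {} : { c: c.c };
}

// Serialization merges the dictionaries produced by each part.
function fromMyDictSketch(m: MyDictSketch): DictV {
  const merged: DictV = { a: m.a, b: m.b };
  const part = fromMaybeCSketch(m.c);
  for (const k of Object.keys(part)) {
    const v = part[k];
    if (v !== undefined) merged[k] = v;
  }
  return merged;
}
```

The sketch also shows the cost noted in the rationale: the `invalid` variant parses successfully, so rejecting a non-symbol `c` is left to the caller.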
|
||
### Patterns.
|
||
|
||
Pattern = SimplePattern / CompoundPattern
|
||
|
||
Patterns come in two kinds:
|
||
|
||
- The parsers for *simple patterns* yield a single host-language
|
||
value—for example, a string, an array, a number, or a pointer—or
|
||
even, in the case of `LiteralPattern`s, no host-language values at
|
||
all.[^no-values-at-all]
|
||
|
||
- The parsers for *compound patterns* yield zero or more *fields*
|
||
which combine into an overall record type associated with a
|
||
definition.
|
||
|
||
[^no-values-at-all]: The case of a `LiteralPattern` yielding no
|
||
host-language values is interesting. All the information required to
|
||
reversibly store the result of a parse is already in the schema, so
|
||
nothing need be stored at runtime in host-language data type
|
||
instances. Concretely, a definition consisting only of a
|
||
`LiteralPattern` might correspond to a host-language unit type (the
|
||
empty tuple, the "void" value). Definitions consisting of
|
||
`CompoundPattern`s involving `LiteralPattern`s do not even need to
|
||
store this much: fields of unit type in a host-language record type
|
||
can simply be omitted without loss.
|
||
|
||
#### Simple patterns

    SimplePattern = AnyPattern
                  / AtomKindPattern
                  / EmbeddedPattern
                  / LiteralPattern
                  / SequenceOfPattern
                  / SetOfPattern
                  / DictOfPattern
                  / Ref

The `any` pattern matches any input `Value`:
|
||
|
||
AnyPattern = "any"
|
||
|
||
Specifying the name of a kind of `Atom` matches that kind of atom:
|
||
|
||
AtomKindPattern = "bool" / "float" / "double" / "int" / "string" / "bytes" / "symbol"
|
||
|
||
Embedded input `Value`s are matched with embedded patterns. The
|
||
portion under the `#!` prefix is the *interface* schema for the
|
||
embedded value.[^interface-schema] The result of a match is an
|
||
instance of the schema-wide `embeddedType`, if one is supplied.
|
||
|
||
EmbeddedPattern = "#!" SimplePattern
|
||
|
||
A literal pattern may be expressed in any of three ways: non-symbol
|
||
atoms stand for themselves directly; symbols, prefixed with an equal
|
||
sign, are matched literally; and any `Value` at all may be quoted by
|
||
placing it in a `<<lit> ... >` record:
|
||
|
||
LiteralPattern = "="symbol / "<<lit>" value ">" / non-symbol-atom
|
||
|
||
Brackets containing an item pattern and a literal ellipsis match a
|
||
sequence of items, each matching the nested item pattern. Sets and
|
||
uniform dictionaries are similar.
|
||
|
||
SequenceOfPattern = "[" SimplePattern "..." "]"
|
||
SetOfPattern = "#{" SimplePattern "}"
|
||
DictOfPattern = "{" SimplePattern ":" SimplePattern "...:..." "}"
|
||
|
||
Finally, a reference to some other definition, in this schema or a
|
||
neighbouring schema within this bundle, is made by mentioning the
|
||
possibly-qualified name of the definition as a bare symbol:
|
||
|
||
Ref = symbol
|
||
|
||
Periods "`.`" in such symbols are special:
|
||
|
||
- `Name` refers to the definition named `Name` in the current schema.
|
||
- `Mod.Submod.Name` refers to definition `Name` in `Mod.Submod`, some other schema in the bundle.
|
||
|
||
Each period-separated portion of a reference name must be an `id`, an
|
||
identifier.
|
||
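The splitting rule for references can be sketched as follows. The result shape mirrors the `Ref = <ref @module ModulePath @name symbol>` definition in the metaschema appendix, with symbols modelled as plain strings; `RefSketch` and `parseRefSketch` are hypothetical names. The sketch does not distinguish references from keywords such as `any` or `int`, which a real reader would need to recognise first.

```typescript
type RefSketch = { module: string[]; name: string };

const REF_ID = /^[a-zA-Z][a-zA-Z_0-9]*$/;

// Splits a possibly-qualified reference on periods. Every period-separated
// portion must be an identifier. The last portion is the definition name and
// the preceding portions form the module path, which is empty for a
// reference to the current schema.
function parseRefSketch(sym: string): RefSketch | undefined {
  const parts = sym.split(".");
  for (const p of parts) {
    if (!REF_ID.test(p)) return undefined;
  }
  const name = parts[parts.length - 1];
  if (name === undefined) return undefined;
  return { module: parts.slice(0, -1), name };
}
```

With this sketch, `Name` yields an empty module path, matching the `<ref [] ...>` forms in the metaschema instance, while `Mod.Submod.Name` yields the module path `Mod`, `Submod`.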
|
||
[^interface-schema]: Embedded patterns are experimental. One
|
||
interpretation is that an embedded value denotes a reference to
|
||
some stateful actor in a potentially-distributed system, and that
|
||
the interface schema associated with an embedded value describes
|
||
the messages that may be sent to that actor.
|
||
|
||
**Examples.** `#!any` may denote a reference to an Actor able to
|
||
receive any value as a message; `#!#t`, a reference to an Actor
|
||
expecting *only* the "true" message; `#!Session`, a reference to
|
||
an Actor expecting any message matching a schema defined as
|
||
`Session` in this file.
|
||
|
||
#### Compound patterns

    CompoundPattern = RecordPattern
                    / TuplePattern
                    / VariableTuplePattern
                    / DictionaryPattern

A record pattern matches an input record. It may be specified as a
|
||
record with a literal in the label position, or as a quoted `<<rec>
|
||
... >` record with a pattern for each of the label and field-sequence
|
||
positions:[^record-shorthand]

    RecordPattern = "<<rec>" NamedPattern NamedPattern ">"
                  / "<" value PatternSequence ">"

    PatternSequence = *(NamedPattern) [NamedSimplePattern "..."]

[^record-shorthand]: Note that `<label `*ps*`>` can be thought of as
|
||
roughly equivalent to `<<rec> <<lit> label> [`*ps*`]>`. The
|
||
following two definitions are equivalent:
|
||
|
||
D1 = <foo @a string @b string @extra any ... >.
|
||
D2 = <<rec> <<lit> foo> [@a string @b string @extra any ...]>.
|
||
|
||
A tuple pattern matches a fixed-length sequence with specific patterns
|
||
in each position. A variable tuple pattern is the same, but with an
|
||
additional pattern for matching additional elements following the
|
||
fixed-position patterns.
|
||
|
||
TuplePattern = "[" *(NamedPattern) "]"
|
||
VariableTuplePattern = "[" *(NamedPattern) NamedSimplePattern "..." "]"
|
||
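The difference between the two tuple forms can be sketched as a pair of matchers. This is an illustration that only decides whether an input matches; it does not build a host-language value. Patterns are modelled as boolean predicates, and all names in the block are hypothetical.

```typescript
type ItemCheck = (v: unknown) => boolean;

// [p1 ... pn]: exactly n items, each matching its positional pattern.
function matchTupleSketch(fixed: ItemCheck[], input: unknown[]): boolean {
  if (input.length !== fixed.length) return false;
  return fixed.every((check, i) => check(input[i]));
}

// [p1 ... pn r ...]: at least n items. The first n match the fixed-position
// patterns, and every additional item matches the repeated pattern `r`.
function matchTuplePrefixSketch(
  fixed: ItemCheck[],
  repeated: ItemCheck,
  input: unknown[]
): boolean {
  if (input.length < fixed.length) return false;
  return (
    matchTupleSketch(fixed, input.slice(0, fixed.length)) &&
    input.slice(fixed.length).every((item) => repeated(item))
  );
}

// Example item patterns standing in for `string` and `int`.
const isStringItem: ItemCheck = (v) => typeof v === "string";
const isIntItem: ItemCheck = (v) => typeof v === "number" && Math.floor(v) === v;
```

Under these definitions the pattern `[string int ...]` accepts `["x"]` and `["x", 1, 2]`, whereas `[string int]` accepts only two-element inputs such as `["x", 1]`.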
|
||
A dictionary pattern matches specific literal keys in an input
|
||
dictionary. If no explicit name is given for a particular
|
||
`NamedSimplePattern`, but the key for the pattern is a symbol, then
|
||
that symbol is used as the name for that dictionary entry.
|
||
|
||
DictionaryPattern = "{" *(value ":" NamedSimplePattern) "}"
|
||
|
||
### Identifiers and Bindings: NamedPattern and NamedSimplePattern
|
||
|
||
Compound pattern specifications contain `NamedPattern`s or
|
||
`NamedSimplePattern`s rather than ordinary `Pattern`s:
|
||
|
||
NamedPattern = "@" id SimplePattern / Pattern
|
||
NamedSimplePattern = "@" id SimplePattern / SimplePattern
|
||
|
||
Use of an `@name` prefix generally results in creation of a field with
|
||
the given name in the overall record type for a definition. The type
|
||
of value contained in the field will correspond to the `Pattern` or
|
||
`SimplePattern` given.
|
||
|
||
## Semantics
|
||
|
||
Having covered concrete syntax, we now give semantics for the schema
|
||
language in terms of the [abstract syntax][schema.prs] and of the
|
||
language of Preserves `Value`s.
|
||
|
||
[schema.prs]: https://gitlab.com/preserves/preserves/-/blob/main/schema/schema.prs
|
||
|
||
### Metaschema interpreter
|
||
|
||
(TODO: this subsection is to define an interpreter for metaschema values
|
||
applied to Preserves `Value`s.)
|
||
|
||
### Host-language types
|
||
|
||
The host-language types corresponding to a metaschema instance can
|
||
themselves be described according to a grammar.
|
||
|
||
The definitions in this section should be understood as being part of a
|
||
module named `host`, in a bundle alongside a module named `schema`
|
||
corresponding to the metaschema in the appendix below.
|
||
|
||
#### Abstract host language types
|
||
|
||
Definition = <union @variants [Variant ...]> / Simple .
|
||
Variant = [@label symbol @type Simple] .
|
||
|
||
The host-language type corresponding to a definition will either be a
|
||
tagged union (side condition: at least two `Variant`s are present in a
|
||
`union`) or a *simple* type.
|
||
|
||
Simple = Field / Record .
|
||
Record = <rec @fields [NamedField ...]> .
|
||
NamedField = [@name symbol @type Field] .
|
||
|
||
A *simple* type may be either a single, simple value of *field* type, or
|
||
a record of multiple named fields, each having a specific *field* type.
|
||
|
||
Field = =unit
|
||
/ =any
|
||
/ =embedded
|
||
/ <array @element Field>
|
||
/ <set @element Field>
|
||
/ <map @key Field @value Field>
|
||
/ <ref @name schema.Ref>
|
||
/ schema.AtomKind .
|
||
|
||
A *field* type is either
|
||
|
||
- the language's unit type (the empty tuple, the "void" value),
|
||
- the universal type of all Preserves `Value`s,
|
||
- the type of some host-language [embedded value](./preserves.html#embeddeds) in some context,
|
||
- the type of a uniform array having elements of a specific *field* type,
|
||
- the type of a set having elements of a specific *field* type,
|
||
- the type of a dictionary connecting keys of specific type to values of specific type,
|
||
- the type associated with some other named definition in scope in the current Schema bundle, or
|
||
- the type of a specific kind of Preserves [`Atom`](./preserves#values).
|
||
|
||
#### Computing abstract types from a metaschema instance
|
||
|
||
Given a metaschema definition *d* : `schema.Definition`, the function
|
||
**typeof**{:.pseudocode} yields a `host.Definition`.
|
||
|
||
{:.pseudocode #def:typeof}
|
||
> **typeof** : `schema.Definition` ⟶ `host.Definition`
|
||
> **typeof** `<or [[`*n`1`* *p`1`*`]` ... `[`*n`n`* *p`n`*`]]>` = `<union [[`*n`1`* (**pat** *p`1`*)`]` ... `[`*n`n`* (**pat** *p`n`*)`]]>`
|
||
> **typeof** `<and [`*f`1` ... f`n`*`]>` = **product** `[`*f`1` ... f`n`*`]`
|
||
> **typeof** *p* = **pat** *p*, when *p* ∈ `schema.Pattern`
|
||
|
||
{:.pseudocode #def:pat}
|
||
> **pat** : `schema.Pattern` ⟶ `host.Simple`
|
||
> **pat** *s* = **field** *s*, when *s* ∈ `schema.SimplePattern`
|
||
> **pat** *c* = **product** `[`*c*`]`, when *c* ∈ `schema.CompoundPattern`
|
||
|
||
{:.pseudocode #def:field}
|
||
> **field** : `schema.SimplePattern` ⟶ `host.Field`
|
||
> **field** `any` = `any`
|
||
> **field** `<atom` *k*`>` = *k*
|
||
> **field** `<embedded` *s*`>` = `embedded`
|
||
> **field** `<lit` *v*`>` = `unit`
|
||
> **field** `<seqof` *s*`>` = `<array` (**field** s)`>`
|
||
> **field** `<setof` *s*`>` = `<set` (**field** s)`>`
|
||
> **field** `<dictof` *s`k`* *s`v`*`>` = `<map` (**field** *s`k`*) (**field** *s`v`*)`>`
|
||
> **field** *r* = *r*, when *r* ∈ `schema.Ref`
|
||
|
||
The helper function **product**{:.pseudocode} is where `unit`-valued
|
||
fields are omitted from the computed host-language type. If all fields
|
||
are so omitted, or if there were (recursively) no bindings in the input
|
||
patterns, **product**{:.pseudocode} yields `unit` type itself.
|
||
|
||
{:.pseudocode #def:product}
|
||
> **product** : `[schema.NamedPattern` ...`]` ⟶ `host.Simple`
|
||
> **product** `[`*f`1` ... f`n`*`]` = `unit`, if *t* = `[]`;
|
||
> `<rec` *t*`>`, otherwise
|
||
> where *t* = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`*
|
||
|
||
{:.pseudocode #def:gather}
|
||
> **gather** : `schema.NamedPattern` ⟶ `[host.NamedField ...]`
|
||
> **gather** `<named` *n* *p*`>` = `[]`, if (**field** *p*) = `unit`;
|
||
> `[[`*n* (**field** *p*)`]]`, otherwise
|
||
> **gather** `<rec` *f`label`* *f`fields`*`>` = **gather** *f`label`* ⧺ **gather** *f`fields`*
|
||
> **gather** `<tuple [`*f`1` ... f`n`*`]>` = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`*
|
||
> **gather** `<tuplePrefix [`*f`1` ... f`n`*`]` *f`repeated`*`>` = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`* ⧺ **gather** *f`repeated`*
|
||
> **gather** `<dict {`*v`1`*`:`*f`1` ... v`n`*`:`*f`n`*`}>` = **gather** *f`1`′* ⧺ ⋯ ⧺ **gather** *f`n`′*,
|
||
> where (*f`1`′ ⋯ f`n`′*) are (*f`1` ⋯ f`n`*) sorted according to [Preserves term order](./preserves.html#total-order).
|
||
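The pseudocode above can be transcribed into TypeScript roughly as follows. This is a sketch over a simplified, hypothetical encoding and not the metaschema's generated types: the `named`/`anonymous` wrappers are flattened, symbols and atom kinds are plain strings, `dict` entries are assumed to arrive already sorted by key, and host types use an ad hoc JSON-like shape. One case is an assumption rather than a transcription: an anonymous simple pattern inside a compound pattern is taken to contribute no fields, in line with the remark that patterns without bindings yield `unit`.

```typescript
type SimplePat =
  | { _variant: "any" } | { _variant: "lit"; value: unknown }
  | { _variant: "atom"; atomKind: string }
  | { _variant: "embedded"; interface: SimplePat }
  | { _variant: "seqof"; pattern: SimplePat } | { _variant: "setof"; pattern: SimplePat }
  | { _variant: "dictof"; key: SimplePat; value: SimplePat }
  | { _variant: "Ref"; module: string[]; name: string };
type CompoundPat =
  | { _variant: "rec"; label: NamedPat; fields: NamedPat }
  | { _variant: "tuple"; patterns: NamedPat[] }
  | { _variant: "tuplePrefix"; fixed: NamedPat[]; variable: NamedPat }
  | { _variant: "dict"; entries: Array<[string, NamedPat]> };
type Pat = SimplePat | CompoundPat;
type NamedPat = { _variant: "named"; name: string; pattern: SimplePat } | Pat;
type Defn =
  | { _variant: "or"; alts: Array<[string, Pat]> }
  | { _variant: "and"; parts: NamedPat[] }
  | Pat;

type HField =
  | "unit" | "any" | "embedded"
  | { array: HField } | { set: HField } | { map: [HField, HField] }
  | { ref: string } | { atom: string };
type HSimple = HField | { rec: Array<[string, HField]> };
type HDefinition = HSimple | { union: Array<[string, HSimple]> };

function concatAll<T>(xss: T[][]): T[] {
  let out: T[] = [];
  for (const xs of xss) out = out.concat(xs);
  return out;
}

function fieldType(s: SimplePat): HField {
  switch (s._variant) {
    case "any": return "any";
    case "atom": return { atom: s.atomKind };
    case "embedded": return "embedded";
    case "lit": return "unit";
    case "seqof": return { array: fieldType(s.pattern) };
    case "setof": return { set: fieldType(s.pattern) };
    case "dictof": return { map: [fieldType(s.key), fieldType(s.value)] };
    case "Ref": return { ref: s.module.concat([s.name]).join(".") };
  }
}

function gatherFields(p: NamedPat): Array<[string, HField]> {
  switch (p._variant) {
    case "named": {
      const t = fieldType(p.pattern);
      return t === "unit" ? [] : [[p.name, t]]; // unit-valued fields are omitted
    }
    case "rec": return gatherFields(p.label).concat(gatherFields(p.fields));
    case "tuple": return concatAll(p.patterns.map((f) => gatherFields(f)));
    case "tuplePrefix":
      return concatAll(p.fixed.map((f) => gatherFields(f))).concat(gatherFields(p.variable));
    case "dict": return concatAll(p.entries.map((e) => gatherFields(e[1])));
    default: return []; // assumption: anonymous simple patterns bind nothing
  }
}

function productType(fs: NamedPat[]): HSimple {
  const t = concatAll(fs.map((f) => gatherFields(f)));
  return t.length === 0 ? "unit" : { rec: t };
}

function isCompoundPat(p: Pat): p is CompoundPat {
  return p._variant === "rec" || p._variant === "tuple" ||
    p._variant === "tuplePrefix" || p._variant === "dict";
}

function patType(p: Pat): HSimple {
  return isCompoundPat(p) ? productType([p]) : fieldType(p);
}

function typeOfDef(d: Defn): HDefinition {
  if (d._variant === "or") {
    return { union: d.alts.map((a): [string, HSimple] => [a[0], patType(a[1])]) };
  }
  if (d._variant === "and") return productType(d.parts);
  return patType(d);
}
```

Applied to an encoding of `Person = <person @name string @birthday Date>`, `typeOfDef` yields a record with the two fields `name` and `birthday`; the literal label `person` contributes nothing because its field type is `unit`.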
|
||
## Appendix: Metaschema
|
||
|
||
The metaschema defines the structure of the abstract syntax (AST) of
|
||
schemas, using the concrete DSL syntax described above.
|
||
|
||
The text below is taken from
|
||
[`schema/schema.prs`][schema.prs]
|
||
in the source code repository.
|
||
|
||
A `Bundle` collects a number of `Schema`s, each named by a
|
||
`ModulePath`:[^todo-semantics-of-bundles]
|
||
|
||
Bundle = <bundle @modules Modules>.
|
||
Modules = { ModulePath: Schema ...:... }.
|
||
ModulePath = [symbol ...].
|
||
|
||
Schema = <schema {
|
||
version: Version
|
||
embeddedType: EmbeddedTypeName
|
||
definitions: Definitions
|
||
}>.
|
||
|
||
A `Version` names the version of the schema language in use. At
|
||
present, it must be `1`.
|
||
|
||
; version 1 .
|
||
Version = 1 .
|
||
|
||
An `EmbeddedTypeName` specifies the type of embedded values within
|
||
values parsed by a given schema:
|
||
|
||
EmbeddedTypeName = #f / Ref .
|
||
Ref = <ref @module ModulePath @name symbol>.
|
||
|
||
The `Definitions` are a named collection of definitions within a
|
||
schema. Note the special mention of `pattern0` and `pattern1`: these
|
||
ensure that each `or` or `and` record has at least two members.
|
||
|
||
Definitions = { symbol: Definition ...:... }.
|
||
|
||
Definition =
|
||
; Pattern / Pattern / ...
|
||
/ <or [@pattern0 NamedAlternative
|
||
@pattern1 NamedAlternative
|
||
@patternN NamedAlternative ...]>
|
||
|
||
; Pattern & Pattern & ...
|
||
/ <and [@pattern0 NamedPattern
|
||
@pattern1 NamedPattern
|
||
@patternN NamedPattern ...]>
|
||
|
||
; Pattern
|
||
/ Pattern
|
||
.
|
||
|
||
NamedAlternative = [@variantLabel string @pattern Pattern].
|
||
|
||
Each `Pattern` is either a simple or compound pattern:
|
||
|
||
Pattern = SimplePattern / CompoundPattern .
|
||
|
||
Simple patterns are as described above:
|
||
|
||
SimplePattern =
|
||
; any
|
||
/ =any
|
||
|
||
; special builtins: bool, float, double, int, string, bytes, symbol
|
||
/ <atom @atomKind AtomKind>
|
||
|
||
; matches an embedded value in the input: #!p
|
||
/ <embedded @interface SimplePattern>
|
||
|
||
; =symbol, <<lit> any>, or plain non-symbol atom
|
||
/ <lit @value any>
|
||
|
||
; [p ...] ----> <seqof <ref p>>; see also tuplePrefix below.
|
||
/ <seqof @pattern SimplePattern>
|
||
|
||
; #{p} ----> <setof <ref p>>
|
||
/ <setof @pattern SimplePattern>
|
||
|
||
; {k: v, ...:...} ----> <dictof <ref k> <ref v>>
|
||
/ <dictof @key SimplePattern @value SimplePattern>
|
||
|
||
; symbol, symbol.symbol, symbol.symbol.symbol, ...
|
||
/ Ref
|
||
.
|
||
|
||
AtomKind = =Boolean
|
||
/ =Float
|
||
/ =Double
|
||
/ =SignedInteger
|
||
/ =String
|
||
/ =ByteString
|
||
/ =Symbol .
|
||
|
||
Compound patterns involve optionally-named subpatterns:
|
||
|
||
CompoundPattern =
|
||
; <label a b c> ----> <rec <lit label> <tuple [<ref a> <ref b> <ref c>]>>
|
||
; except for record labels
|
||
; <<rec> x y> ---> <rec <ref x> <ref y>>
|
||
/ <rec @label NamedPattern @fields NamedPattern>
|
||
|
||
; [a b c] ----> <tuple [<ref a> <ref b> <ref c>]>
|
||
/ <tuple @patterns [NamedPattern ...]>
|
||
|
||
; [a b c ...] ----> <tuplePrefix [<ref a> <ref b>] <seqof <ref c>>>
|
||
/ <tuplePrefix @fixed [NamedPattern ...] @variable NamedSimplePattern>
|
||
|
||
; {a: b, c: d} ----> <dict {a: <ref b>, c: <ref d>}>
|
||
/ <dict @entries DictionaryEntries>
|
||
.
|
||
|
||
DictionaryEntries = { any: NamedSimplePattern ...:... }.
|
||
|
||
Explicitly-named subpatterns are always `SimplePattern`s; but,
|
||
depending on context, if a name is omitted, the pattern may be a
|
||
`Pattern` or may be restricted to `SimplePattern` as well:
|
||
|
||
NamedSimplePattern = @named Binding / @anonymous SimplePattern .
|
||
NamedPattern = @named Binding / @anonymous Pattern .
|
||
Binding = <named @name symbol @pattern SimplePattern>.
|
||
|
||
[^todo-semantics-of-bundles]: The semantics of module path references
|
||
remain to be specified!
|
||
|
||
## Appendix: Metaschema instance
|
||
|
||
The following is a (lightly-reformatted) Preserves document which is
|
||
the output of DSL-to-AST compilation of the DSL source text of the
|
||
metaschema.
|
||
|
||
<schema {
|
||
version: 1,
|
||
embeddedType: #f,
|
||
definitions: {
|
||
|
||
Pattern: <or [
|
||
["SimplePattern", <ref [] SimplePattern>],
|
||
["CompoundPattern", <ref [] CompoundPattern>]
|
||
]>,
|
||
|
||
CompoundPattern: <or [
|
||
["rec", <rec <lit rec> <tuple [
|
||
<named label <ref [] NamedPattern>>,
|
||
<named fields <ref [] NamedPattern>>
|
||
]>>],
|
||
["tuple", <rec <lit tuple> <tuple [<named patterns <seqof <ref [] NamedPattern>>>]>>],
|
||
["tuplePrefix", <rec <lit tuplePrefix> <tuple [
|
||
<named fixed <seqof <ref [] NamedPattern>>>,
|
||
<named variable <ref [] NamedSimplePattern>>
|
||
]>>],
|
||
["dict", <rec <lit dict> <tuple [<named entries <ref [] DictionaryEntries>>]>>]
|
||
]>,
|
||
|
||
Modules: <dictof <ref [] ModulePath> <ref [] Schema>>,
|
||
|
||
Ref: <rec <lit ref> <tuple [
|
||
<named module <ref [] ModulePath>>,
|
||
<named name <atom Symbol>>
|
||
]>>,
|
||
|
||
Bundle: <rec <lit bundle> <tuple [<named modules <ref [] Modules>>]>>,
|
||
|
||
Binding: <rec <lit named> <tuple [
|
||
<named name <atom Symbol>>,
|
||
<named pattern <ref [] SimplePattern>>
|
||
]>>,
|
||
|
||
Definition: <or [
|
||
["or", <rec <lit or> <tuple [<tuplePrefix [
|
||
<named pattern0 <ref [] NamedAlternative>>,
|
||
<named pattern1 <ref [] NamedAlternative>>
|
||
] <named patternN <seqof <ref [] NamedAlternative>>>>]>>],
|
||
["and", <rec <lit and> <tuple [<tuplePrefix [
|
||
<named pattern0 <ref [] NamedPattern>>,
|
||
<named pattern1 <ref [] NamedPattern>>
|
||
] <named patternN <seqof <ref [] NamedPattern>>>>]>>],
|
||
["Pattern", <ref [] Pattern>]
|
||
]>,
|
||
|
||
NamedSimplePattern: <or [
|
||
["named", <ref [] Binding>],
|
||
["anonymous", <ref [] SimplePattern>]
|
||
]>,
|
||
|
||
EmbeddedTypeName: <or [
|
||
["false", <lit #f>],
|
||
["Ref", <ref [] Ref>]
|
||
]>,
|
||
|
||
ModulePath: <seqof <atom Symbol>>,
|
||
|
||
AtomKind: <or [
|
||
["Boolean", <lit Boolean>],
|
||
["Float", <lit Float>],
|
||
["Double", <lit Double>],
|
||
["SignedInteger", <lit SignedInteger>],
|
||
["String", <lit String>],
|
||
["ByteString", <lit ByteString>],
|
||
["Symbol", <lit Symbol>]
|
||
]>,
|
||
|
||
DictionaryEntries: <dictof any <ref [] NamedSimplePattern>>,
|
||
|
||
Version: <lit 1>,
|
||
|
||
NamedPattern: <or [
|
||
["named", <ref [] Binding>],
|
||
["anonymous", <ref [] Pattern>]
|
||
]>,
|
||
|
||
SimplePattern: <or [
|
||
["any", <lit any>],
|
||
["atom", <rec <lit atom> <tuple [<named atomKind <ref [] AtomKind>>]>>],
|
||
["embedded", <rec <lit embedded> <tuple [<named interface <ref [] SimplePattern>>]>>],
|
||
["lit", <rec <lit lit> <tuple [<named value any>]>>],
|
||
["seqof", <rec <lit seqof> <tuple [<named pattern <ref [] SimplePattern>>]>>],
|
||
["setof", <rec <lit setof> <tuple [<named pattern <ref [] SimplePattern>>]>>],
|
||
["dictof", <rec <lit dictof> <tuple [
|
||
<named key <ref [] SimplePattern>>,
|
||
<named value <ref [] SimplePattern>>
|
||
]>>],
|
||
["Ref", <ref [] Ref>]
|
||
]>,
|
||
|
||
NamedAlternative: <tuple [
|
||
<named variantLabel <atom String>>,
|
||
<named pattern <ref [] Pattern>>
|
||
]>,
|
||
|
||
Definitions: <dictof <atom Symbol> <ref [] Definition>>,
|
||
|
||
Schema: <rec <lit schema> <tuple [<dict {
|
||
version: <named version <ref [] Version>>,
|
||
embeddedType: <named embeddedType <ref [] EmbeddedTypeName>>,
|
||
definitions: <named definitions <ref [] Definitions>>
|
||
}>]>>
|
||
}
|
||
}>
|
||
|
||
## Appendix: Example generated types
|
||
|
||
The following are the (abridged) TypeScript and Racket generated type
|
||
definitions for the metaschema.
|
||
|
||
### TypeScript.
|
||
|
||
import * as _ from "@preserves/core";
|
||
|
||
// ...
|
||
export type _embedded = any;
|
||
export type _val = _.Value<_embedded>;
|
||
// ...
|
||
|
||
export type Bundle = {"modules": Modules};
|
||
|
||
export type Modules = _.KeyedDictionary<ModulePath, Schema, _embedded>;
|
||
|
||
export type Schema = {
|
||
"version": Version,
|
||
"embeddedType": EmbeddedTypeName,
|
||
"definitions": Definitions
|
||
};
|
||
|
||
export type Version = null;
|
||
|
||
export type EmbeddedTypeName = ({"_variant": "false"} | {"_variant": "Ref", "value": Ref});
|
||
|
||
export type Definitions = _.KeyedDictionary<symbol, Definition, _embedded>;
|
||
|
||
export type Definition = (
|
||
{
|
||
"_variant": "or",
|
||
"pattern0": NamedAlternative,
|
||
"pattern1": NamedAlternative,
|
||
"patternN": Array<NamedAlternative>
|
||
} |
|
||
{
|
||
"_variant": "and",
|
||
"pattern0": NamedPattern,
|
||
"pattern1": NamedPattern,
|
||
"patternN": Array<NamedPattern>
|
||
} |
|
||
{"_variant": "Pattern", "value": Pattern}
|
||
);
|
||
|
||
export type Pattern = (
|
||
{"_variant": "SimplePattern", "value": SimplePattern} |
|
||
{"_variant": "CompoundPattern", "value": CompoundPattern}
|
||
);
|
||
|
||
export type SimplePattern = (
|
||
{"_variant": "any"} |
|
||
{"_variant": "atom", "atomKind": AtomKind} |
|
||
{"_variant": "embedded", "interface": SimplePattern} |
|
||
{"_variant": "lit", "value": _val} |
|
||
{"_variant": "seqof", "pattern": SimplePattern} |
|
||
{"_variant": "setof", "pattern": SimplePattern} |
|
||
{"_variant": "dictof", "key": SimplePattern, "value": SimplePattern} |
|
||
{"_variant": "Ref", "value": Ref}
|
||
);
|
||
|
||
export type CompoundPattern = (
|
||
{"_variant": "rec", "label": NamedPattern, "fields": NamedPattern} |
|
||
{"_variant": "tuple", "patterns": Array<NamedPattern>} |
|
||
{
|
||
"_variant": "tuplePrefix",
|
||
"fixed": Array<NamedPattern>,
|
||
"variable": NamedSimplePattern
|
||
} |
|
||
{"_variant": "dict", "entries": DictionaryEntries}
|
||
);
|
||
|
||
export type DictionaryEntries = _.KeyedDictionary<_val, NamedSimplePattern, _embedded>;
|
||
|
||
export type AtomKind = (
|
||
{"_variant": "Boolean"} |
|
||
{"_variant": "Float"} |
|
||
{"_variant": "Double"} |
|
||
{"_variant": "SignedInteger"} |
|
||
{"_variant": "String"} |
|
||
{"_variant": "ByteString"} |
|
||
{"_variant": "Symbol"}
|
||
);
|
||
|
||
export type NamedAlternative = {"variantLabel": string, "pattern": Pattern};
|
||
|
||
export type NamedSimplePattern = (
|
||
{"_variant": "named", "value": Binding} |
|
||
{"_variant": "anonymous", "value": SimplePattern}
|
||
);
|
||
|
||
export type NamedPattern = (
|
||
{"_variant": "named", "value": Binding} |
|
||
{"_variant": "anonymous", "value": Pattern}
|
||
);
|
||
|
||
export type Binding = {"name": symbol, "pattern": SimplePattern};
|
||
|
||
export type Ref = {"module": ModulePath, "name": symbol};
|
||
|
||
export type ModulePath = Array<symbol>;
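
Because every generated union carries a `_variant` tag, consumers can walk these types with an ordinary exhaustive `switch`. The following is a minimal sketch under stated assumptions, not part of the generated output: the types are local, simplified copies of those listed above (the `lit` variant is omitted because it needs `_val`, and `AtomKind` is cut down to two cases), and `refsIn` is a hypothetical helper introduced only for illustration.

```typescript
// Local, simplified copies of a few generated types from the listing above.
// Assumption: a real program would import the full types from the generated
// module instead of redeclaring them.
type ModulePath = Array<symbol>;
type Ref = {"module": ModulePath, "name": symbol};

// Abridged: only two of the seven atom kinds are reproduced here.
type AtomKind = {"_variant": "String"} | {"_variant": "SignedInteger"};

// Abridged: the "lit" variant is omitted because it depends on `_val`.
type SimplePattern = (
    {"_variant": "any"} |
    {"_variant": "atom", "atomKind": AtomKind} |
    {"_variant": "embedded", "interface": SimplePattern} |
    {"_variant": "seqof", "pattern": SimplePattern} |
    {"_variant": "setof", "pattern": SimplePattern} |
    {"_variant": "dictof", "key": SimplePattern, "value": SimplePattern} |
    {"_variant": "Ref", "value": Ref}
);

// Hypothetical helper: collect every Ref reachable from a SimplePattern.
// The switch on `_variant` narrows `p` to the matching alternative.
function refsIn(p: SimplePattern): Array<Ref> {
    switch (p._variant) {
        case "any":
        case "atom":
            return [];
        case "embedded":
            return refsIn(p.interface);
        case "seqof":
        case "setof":
            return refsIn(p.pattern);
        case "dictof":
            return refsIn(p.key).concat(refsIn(p.value));
        case "Ref":
            return [p.value];
    }
}

// A dictionary from strings to sequences of `Person` references.
const example: SimplePattern = {
    "_variant": "dictof",
    "key": {"_variant": "atom", "atomKind": {"_variant": "String"}},
    "value": {
        "_variant": "seqof",
        "pattern": {
            "_variant": "Ref",
            "value": {"module": [], "name": Symbol.for("Person")}
        }
    }
};

const found: Array<Ref> = refsIn(example);
```

Here `found` holds the single `Ref` naming `Person`. Extending the sketch to `Pattern` and `CompoundPattern` would follow the same shape, with one `case` per `_variant`.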
|
||
|
||
### Racket.
|
||
|
||
(struct AtomKind-Symbol () #:prefab)
|
||
(struct AtomKind-ByteString () #:prefab)
|
||
(struct AtomKind-String () #:prefab)
|
||
(struct AtomKind-SignedInteger () #:prefab)
|
||
(struct AtomKind-Double () #:prefab)
|
||
(struct AtomKind-Float () #:prefab)
|
||
(struct AtomKind-Boolean () #:prefab)
|
||
|
||
(struct Bundle (modules) #:prefab)
|
||
|
||
(struct CompoundPattern-dict (entries) #:prefab)
|
||
(struct CompoundPattern-tuplePrefix (fixed variable) #:prefab)
|
||
(struct CompoundPattern-tuple (patterns) #:prefab)
|
||
(struct CompoundPattern-rec (label fields) #:prefab)
|
||
|
||
(struct Definition-Pattern (value) #:prefab)
|
||
(struct Definition-and (pattern0 pattern1 patternN) #:prefab)
|
||
(struct Definition-or (pattern0 pattern1 patternN) #:prefab)
|
||
|
||
(struct EmbeddedTypeName-false () #:prefab)
|
||
(struct EmbeddedTypeName-Ref (value) #:prefab)
|
||
|
||
(struct NamedAlternative (variantLabel pattern) #:prefab)
|
||
|
||
(struct NamedPattern-anonymous (value) #:prefab)
|
||
(struct NamedPattern-named (value) #:prefab)
|
||
|
||
(struct NamedSimplePattern-anonymous (value) #:prefab)
|
||
(struct NamedSimplePattern-named (value) #:prefab)
|
||
|
||
(struct Binding (name pattern) #:prefab)
|
||
|
||
(struct Pattern-CompoundPattern (value) #:prefab)
|
||
(struct Pattern-SimplePattern (value) #:prefab)
|
||
|
||
(struct Ref (module name) #:prefab)
|
||
|
||
(struct Schema (definitions embeddedType version) #:prefab)
|
||
|
||
(struct SimplePattern-Ref (value) #:prefab)
|
||
(struct SimplePattern-dictof (key value) #:prefab)
|
||
(struct SimplePattern-setof (pattern) #:prefab)
|
||
(struct SimplePattern-seqof (pattern) #:prefab)
|
||
(struct SimplePattern-lit (value) #:prefab)
|
||
(struct SimplePattern-embedded (interface) #:prefab)
|
||
(struct SimplePattern-atom (atomKind) #:prefab)
|
||
(struct SimplePattern-any () #:prefab)
|
||
|
||
## Appendix: Future work
|
||
|
||
- There are side conditions on AST instances. It would be nice to
|
||
eventually be able to express these within the metaschema.
|
||
|
||
- It'd be interesting to,
|
||
[OMeta](https://en.wikipedia.org/wiki/OMeta)-like, be able to
|
||
specify the DSL-to-AST translation process as a schema. One
|
||
challenge in doing so is the way schemas are required to be
|
||
*reversible* at present.
|
||
|
||
- Should `include` accept URLs, so that schemas can be retrieved from
|
||
the web?
|
||
|
||
- It'd be nice to firm up the interpretation of embedded interface
|
||
schemas. I have in mind something like the
|
||
[higher-order contracts of Dimoulas](https://www2.ccs.neu.edu/racket/pubs/dissertation-dimoulas.pdf).
|
||
Essentially, a schema *is* a contract, and embedded
|
||
pointers-to-behaviour are like closures/channels/objects/etc, which
|
||
demand higher-order contracts. Future work could pin this down
|
||
further; also, consideration of *dependent* schemas (analogous to
|
||
dependent contracts) could be of interest.
|
||
|
||
**Example.** In the following fragment, `#!Session` is the handle a
|
||
connected user uses to interact with a chatroom. In the
|
||
implementation, `Says` messages are dropped if their `who` doesn't
|
||
match the `uid` supplied in the `Join` assertion. It'd be nice to
|
||
capture that using a dependent schema, passing in the specific
|
||
`uid` value to the `Session` constructor, something like
|
||
`#!(Session uid)`.
|
||
|
||
Join = <joinedUser @uid UserId @handle #!Session>.
|
||
Session = @observeSpeech <Observe =says @observer #!Says> / Says .
|
||
Says = <says @who UserId @what string>.
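
To make the dependency concrete, the check that the implementation performs today can be sketched by hand. This is only an illustration of the behaviour a dependent schema such as `#!(Session uid)` might capture, not a description of any existing implementation: `makeSession` and `deliver` are hypothetical names, and `UserId` is assumed to be a string because the fragment above does not define it.

```typescript
// Assumption: UserId is a plain string; the schema fragment leaves it undefined.
type UserId = string;
type Says = {"who": UserId, "what": string};

// Hypothetical stand-in for `#!(Session uid)`: the returned handle closes over
// the `uid` supplied at join time and drops any `Says` whose `who` differs.
// It returns true when the message was delivered and false when it was dropped.
function makeSession(
    uid: UserId,
    deliver: (message: Says) => void
): (message: Says) => boolean {
    return (message: Says): boolean => {
        if (message.who !== uid) {
            return false; // dropped: speaker does not match the joined user
        }
        deliver(message);
        return true;
    };
}

// Usage: a session for "alice" accepts her own messages only.
const delivered: Array<Says> = [];
const session = makeSession("alice", (message) => { delivered.push(message); });

const accepted = session({"who": "alice", "what": "hello"});
const rejected = session({"who": "mallory", "what": "spoofed"});
```

With a dependent schema, the `message.who !== uid` test would follow from the schema itself rather than being written by hand in each implementation.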
|
||
|
||
|
||
<!-- Heading to visually offset the footnotes from the main document: -->
|
||
## Notes
|