From de7ac63b9679d4ae866fea3534f09035793af281 Mon Sep 17 00:00:00 2001
From: Tony Garnock-Jones
Date: Tue, 25 May 2021 14:11:33 +0200
Subject: [PATCH] First stab at specification of Schema

---
 .gitignore          |   1 +
 Makefile            |  11 +-
 preserves-schema.md | 693 ++++++++++++++++++++++++++++++++++++++++++++
 schema/schema.prs   |   1 +
 4 files changed, 704 insertions(+), 2 deletions(-)
 create mode 100644 preserves-schema.md

diff --git a/.gitignore b/.gitignore
index 7e69498..e76fd1d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
 _site/
 preserves.pdf
+preserves-schema.pdf
 scratch/
diff --git a/Makefile b/Makefile
index 5ae8065..03ef341 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,13 @@
-preserves.pdf: preserves.md preserves.css
+PDFS=preserves.pdf preserves-schema.pdf
+
+all: $(PDFS)
+
+clean:
+	rm -f $(PDFS)
+
+%.pdf: %.md preserves.css
 	google-chrome --headless --disable-gpu --print-to-pdf=$@ \
-		http://localhost:4000/preserves/preserves.html
+		http://localhost:4000/preserves/$*.html
 
 test-all:
 	make -C tests
diff --git a/preserves-schema.md b/preserves-schema.md
new file mode 100644
index 0000000..8d96d87
--- /dev/null
+++ b/preserves-schema.md
@@ -0,0 +1,693 @@
+---
+no_site_title: true
+title: "Preserves Schema"
+---
+
+Tony Garnock-Jones
+May 2021. Version 0.1.0.
+
+  [abnf]: https://tools.ietf.org/html/rfc7405
+
+This document proposes a Schema language for the
+[Preserves data model](./preserves.html).
+
+## Introduction
+
+A Preserves schema connects Preserves `Value`s to host-language data
+structures. Each definition within a schema can be processed by a
+compiler to produce
+
+ - a host-language *type definition*;
+
+ - a partial *parsing* function from `Value`s to instances of the
+   produced type; and
+
+ - a total *serialization* function from instances of the type to
+   `Value`s.
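The TypeScript below is a hand-written sketch of the three outputs listed above. It covers a hypothetical definition, `Point = [@x int @y int] .`, which is a tuple pattern of the kind described under Compound patterns. The `Value` type is a simplified stand-in invented for this illustration and is not the representation used by any Preserves library. The names `toPoint` and `fromPoint` are also illustrative.

```typescript
// Hypothetical, simplified model of Preserves values (illustration only).
type Value =
  | { kind: "int"; value: number }
  | { kind: "string"; value: string }
  | { kind: "sequence"; items: Value[] };

// 1. Host-language type definition for the hypothetical `Point`.
type Point = { x: number; y: number };

// 2. Partial parser: only some Values denote a Point, so it may fail.
function toPoint(v: Value): Point | undefined {
  if (v.kind !== "sequence" || v.items.length !== 2) return undefined;
  const [a, b] = v.items;
  if (a.kind !== "int" || b.kind !== "int") return undefined;
  return { x: a.value, y: b.value };
}

// 3. Total serializer: every Point can be turned back into a Value.
function fromPoint(p: Point): Value {
  return {
    kind: "sequence",
    items: [
      { kind: "int", value: p.x },
      { kind: "int", value: p.y },
    ],
  };
}
```

Round-tripping any `Point` through `fromPoint` and then `toPoint` gives back the same `Point`. The reverse does not hold for arbitrary `Value`s: `toPoint` rejects anything that is not a two-element sequence of integers.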
+
+Every parsed `Value` retains enough information to be serialized
+again, and every instance of a host-language data structure contains,
+by construction, enough information to be successfully serialized.
+
+**Example.** Sending the schema
+
+    version 1 .
+    Date = .
+    Person = .
+
+to the TypeScript schema compiler produces types,
+
+    type Date = {"year": number, "month": number, "day": number};
+    type Person = {"name": string, "birthday": Date};
+
+constructors,
+
+    function Date({year, month, day}: {year: number, month: number, day: number}): Date;
+    function Person({name, birthday}: {name: string, birthday: Date}): Person;
+
+partial parsing functions, which throw on parse failure,
+
+    function asDate(v: _val): Date;
+    function asPerson(v: _val): Person;
+
+total parsing functions, which yield `undefined` on parse failure,
+
+    function toDate(v: _val): undefined | Date;
+    function toPerson(v: _val): undefined | Person;
+
+and total serialization functions,
+
+    function fromDate(_v: Date): _val;
+    function fromPerson(_v: Person): _val;
+
+## Concepts
+
+**Bundle.** A collection of schemas, each named by a module path.
+
+**Definition.** A named pattern within a schema. When compiled, a
+definition will usually produce a type (plus associated constructors
+and predicates), a parser function, and a serializer function.
+
+**Metaschema.** The Preserves metaschema is a schema describing the
+abstract syntax of all schema instances (including itself).
+
+**Module path.** A sequence of symbols, denoting a leaf in a tree with
+symbol-labelled edges.
+
+**Pattern.** A pattern describes a set of `Value`s and provides names
+for the portions of matching `Value`s that should be captured in a
+host-language data type.
+
+**Schema abstract syntax tree (AST).** Schema-manipulating tools will
+usually work with schema ASTs; that is, with `Value`s conforming to the
+metaschema or with instances of the corresponding host-language data
+structures.
+
+**Schema domain-specific language (DSL).** While human beings *can*
+work directly with Preserves documents matching the metaschema, the
+schema DSL is easier to read and write. DSL text is translated into
+instances of the metaschema.
+
+**Schema.** A collection of definitions, plus an optional schema-wide
+reference to a definition describing embedded values.
+
+## Concrete (DSL) Syntax
+
+In this section, we use an [ABNF][abnf]-like notation to define a
+textual syntax that is easy for people to read and write. Most of the
+examples in this document are written using this syntax. In the
+following section, we define the abstract syntax that this surface
+syntax translates into.
+
+### Schema files and bundles.
+
+Each schema should be placed in a single file. Schema files usually
+have the extension `.prs` and consist of a sequence of Preserves
+`Value`s[^like-sexps] separated into *clauses* by the Preserves
+`Symbol` "`.`".
+
+[^like-sexps]: That is, schema files use Preserves as a kind of
+    S-expression!
+
+A bundle of schema files is a directory tree containing `.prs` files.
+
+### Clauses.
+
+    Clause = (Version / EmbeddedTypeName / Definition) "."
+
+    Version = "version" "1"
+    EmbeddedTypeName = "embeddedType" ("#f" / Ref)
+    Definition = symbol "=" (OrPattern / AndPattern / Pattern)
+
+**Version specification.** Mandatory. Names the version of the schema
+language used in the file. This version of the specification is
+referred to in schema files as `version 1`.
+
+**Embedded type name.** Optional. If given as `#f` (the default), it
+declares that values parsed by the schema do not contain embedded
+`Value`s of any particular type. If given as a `Ref` (a reference to a
+definition in this or a neighbouring schema), it declares that embedded
+`Value`s must themselves conform to the named definition.
+
+**Definition.** Each definition clause implicitly connects a pattern
+with a type name and a set of associated functions.
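The clause grammar suggests a simple two-step reader. First, split the file's top-level `Value`s at each `.` symbol. Then classify each group by its leading symbols.

The TypeScript below is a minimal sketch of that idea, with these assumptions:

- The `Value` and `Clause` types are hypothetical simplifications. A real reader would work on values produced by a Preserves text parser.
- The pattern on the right-hand side of a definition is left unparsed.

```typescript
// Hypothetical, simplified Value model: just enough to recognise clauses.
type Value =
  | { kind: "symbol"; name: string }
  | { kind: "int"; value: number }
  | { kind: "bool"; value: boolean }
  | { kind: "other" }; // stand-in for records, sequences, dictionaries, ...

type Clause =
  | { kind: "version"; version: 1 }
  | { kind: "embeddedType"; ref: string | false }
  | { kind: "definition"; name: string; body: Value[] };

function isSymbol(v: Value, name: string): boolean {
  return v.kind === "symbol" && v.name === name;
}

// Split a file's top-level Values into clauses at each `.` symbol.
function splitClauses(values: Value[]): Value[][] {
  const clauses: Value[][] = [];
  let current: Value[] = [];
  for (const v of values) {
    if (isSymbol(v, ".")) {
      clauses.push(current);
      current = [];
    } else {
      current.push(v);
    }
  }
  if (current.length > 0) throw new Error("final clause lacks a terminating '.'");
  return clauses;
}

// Classify one clause as a Version, EmbeddedTypeName, or Definition.
function classifyClause(c: Value[]): Clause {
  if (c.length === 2 && isSymbol(c[0], "version")) {
    const v = c[1];
    if (v.kind === "int" && v.value === 1) return { kind: "version", version: 1 };
    throw new Error("only `version 1` is defined");
  }
  if (c.length === 2 && isSymbol(c[0], "embeddedType")) {
    const r = c[1];
    if (r.kind === "bool" && r.value === false) return { kind: "embeddedType", ref: false };
    if (r.kind === "symbol") return { kind: "embeddedType", ref: r.name };
    throw new Error("embeddedType must be #f or a Ref");
  }
  const head = c[0];
  if (c.length >= 3 && head.kind === "symbol" && isSymbol(c[1], "=")) {
    // The pattern itself (c.slice(2)) is left unparsed in this sketch.
    return { kind: "definition", name: head.name, body: c.slice(2) };
  }
  throw new Error("unrecognised clause");
}
```

The split and the classification are kept separate on purpose. The terminating `.` alone delimits a clause, while the three clause forms differ only in their leading symbols.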
+
+### Union definitions.
+
+    OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
+
+The right-hand side of a definition may supply two or more
+*alternatives*. When parsing, the alternatives are tried in order; the
+result of the first successful alternative is the result of the entire
+parse.
+
+The type corresponding to an `OrPattern` is a union type, a variant
+type, or an algebraic sum type, depending on the host language.
+
+Each alternative within an `OrPattern` must have a definition-unique
+*name*. The name can either be given explicitly as `@name` (see the
+discussion of `NamedPattern` below) or inferred. It can only be
+inferred from the label of a record pattern, from the name of a
+reference to another definition, or from the text of a "sufficiently
+stringlike" literal pattern, that is, one that matches a string,
+symbol, number, or boolean:
+
+    AltPattern = "@" symbol SimplePattern
+               / "<" symbol *(NamedPattern) ">"
+               / Ref
+               / LiteralPattern -- with a side condition
+
+### Intersection definitions.
+
+    AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
+
+The right-hand side of a definition may supply two or more patterns,
+the *intersection* of whose denotations is the denotation of the
+overall definition. When parsing, every pattern is tried; if all
+succeed, the resulting information is combined into a single record
+type.
+
+When serializing, the terms resulting from serializing at each pattern
+are *merged* together.
+
+#### Experimental.
+
+Intersections are an experimental feature. They can be used to express
+*optional dictionary entries*:
+
+    MyDict = {a: int, b: string} & @c MaybeC .
+    MaybeC = @present {c: symbol} / @absent {} .
+
+It is not yet clear whether they pull their weight. In particular, the
+semantics of serializing a value defined by intersection are not
+completely clear.
+
+### Patterns.
+
+    Pattern = SimplePattern / CompoundPattern
+
+Patterns come in two kinds:
+
+ - the parsers for *simple patterns* yield a single host-language
+   value such as a string, number, or pointer.
+
+ - the parsers for *compound patterns* yield zero or more *fields*
+   which combine into an overall record type associated with a
+   definition.
+
+#### Simple patterns
+
+    SimplePattern = AnyPattern
+                  / AtomKindPattern
+                  / EmbeddedPattern
+                  / LiteralPattern
+                  / SequenceOfPattern
+                  / SetOfPattern
+                  / DictOfPattern
+                  / Ref
+
+The `any` pattern matches any input `Value`:
+
+    AnyPattern = "any"
+
+Specifying the name of a kind of `Atom` matches that kind of atom:
+
+    AtomKindPattern = "bool" / "float" / "double" / "int" / "string" / "bytes" / "symbol"
+
+Specifying `embedded` matches an `Embedded` value. If the schema
+declares an `embeddedType`, the embedded value must conform to it:
+
+    EmbeddedPattern = "embedded"
+
+A literal pattern may be expressed in any of three ways: non-symbol
+atoms stand for themselves directly; symbols, prefixed with an equal
+sign, are matched literally; and any `Value` at all may be quoted by
+placing it in a `< ... >` record:
+
+    LiteralPattern = "="symbol / "<" value ">" / non-symbol-atom
+
+Brackets containing an item pattern and a literal ellipsis match a
+sequence of items, each matching the nested item pattern. Set and
+uniform-dictionary patterns are similar. A set pattern matches a set
+whose elements all match the nested pattern. A dictionary pattern
+matches a dictionary whose keys and values all match the given key and
+value patterns, respectively.
+
+    SequenceOfPattern = "[" SimplePattern "..." "]"
+    SetOfPattern = "#{" SimplePattern "}"
+    DictOfPattern = "{" SimplePattern ":" SimplePattern "...:..." "}"
+
+Finally, a reference to some other definition, in this schema or a
+neighbouring schema within this bundle, is made by mentioning the
+possibly-qualified name of the definition as a bare symbol:
+
+    Ref = symbol
+
+Periods "`.`" in such symbols are special:
+
+ - `Name` refers to the definition named `Name` in the current schema.
+ - `Mod.Submod.Name` refers to the definition `Name` in `Mod.Submod`,
+   some other schema in the bundle.
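Resolving a `Ref` therefore amounts to splitting the symbol at its periods. The last segment names the definition, and any preceding segments form the module path of the target schema. An unqualified name resolves in the current schema.

The TypeScript sketch below shows one way to do this. These parts are assumptions made for illustration, not taken from any Preserves implementation:

- the `ResolvedRef` shape;
- the `Map`-based bundle, keyed by dot-joined module paths;
- the function names.

```typescript
type ModulePath = string[];

// Hypothetical resolved form of a Ref: target module plus definition name.
interface ResolvedRef {
  module: ModulePath;
  name: string;
}

// Split a possibly-qualified Ref symbol such as "Mod.Submod.Name".
function resolveRef(symbol: string, currentModule: ModulePath): ResolvedRef {
  const parts = symbol.split(".");
  if (parts.some((p) => p.length === 0)) {
    throw new Error(`malformed reference: ${JSON.stringify(symbol)}`);
  }
  const name = parts[parts.length - 1];
  // An unqualified `Name` refers to the current schema.
  const modulePath = parts.length === 1 ? currentModule : parts.slice(0, -1);
  return { module: modulePath, name };
}

// Hypothetical bundle: each schema is reduced to the set of its definition
// names, keyed by its dot-joined module path.
type Bundle = Map<string, Set<string>>;

function refExists(bundle: Bundle, ref: ResolvedRef): boolean {
  const schema = bundle.get(ref.module.join("."));
  return schema !== undefined && schema.has(ref.name);
}
```

Joining module paths with `.` is safe here only because segments obtained by splitting at periods cannot themselves contain a period. A fuller implementation might key schemas by path in a way that does not depend on that.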
+
+#### Compound patterns
+
+    CompoundPattern = RecordPattern
+                    / TuplePattern
+                    / VariableTuplePattern
+                    / DictionaryPattern
+
+A record pattern matches an input record. It may be specified as a
+record with a literal in the label position, or as a quoted
+`< ... >` record:
+
+    RecordPattern = "<" value *(NamedPattern) ">"
+                  / "<" NamedPattern *(NamedPattern) ">"
+
+A tuple pattern matches a fixed-length sequence with specific patterns
+in each position. A variable tuple pattern is the same, but ends with
+one more pattern, followed by an ellipsis, which must match each of
+the remaining elements after the fixed-position ones.
+
+    TuplePattern = "[" *(NamedPattern) "]"
+    VariableTuplePattern = "[" 1*(NamedPattern) NamedSimplePattern "..." "]"
+
+A dictionary pattern matches specific literal keys in an input
+dictionary. If no explicit name is given for a particular
+`NamedSimplePattern`, but the key for the pattern is a symbol, then
+that symbol is used as the name for that dictionary entry.
+
+    DictionaryPattern = "{" *(value ":" NamedSimplePattern) "}"
+
+### Bindings: NamedPattern and NamedSimplePattern
+
+Compound pattern specifications contain `NamedPattern`s or
+`NamedSimplePattern`s rather than ordinary `Pattern`s:
+
+    NamedPattern = "@" symbol SimplePattern / Pattern
+    NamedSimplePattern = "@" symbol SimplePattern / SimplePattern
+
+Use of an `@name` prefix generally results in the creation of a field
+with the given name in the overall record type for a definition. The
+type of value contained in the field corresponds to the `Pattern` or
+`SimplePattern` given.
+
+## Appendix: Metaschema
+
+The metaschema defines the structure of the abstract syntax (AST) of
+schemas, using the concrete DSL syntax described above.
+
+The text below is taken from
+[`schema/schema.prs`](https://gitlab.com/preserves/preserves/-/blob/main/schema/schema.prs)
+in the source code repository.
+
+A `Bundle` collects a number of `Schema`s, each named by a
+`ModulePath`:[^todo-semantics-of-bundles]
+
+    Bundle = .
+    Modules = { ModulePath: Schema ...:... }.
+    ModulePath = [symbol ...].
+
+    Schema = .
+
+A `Version` names the version of the schema language in use. At
+present, it must be `1`.
+
+    ; version 1 .
+    Version = 1 .
+
+An `EmbeddedTypeName` specifies the type of embedded values within
+values parsed by a given schema:
+
+    EmbeddedTypeName = Ref / #f.
+    Ref = .
+
+The `Definitions` are a named collection of definitions within a
+schema. Note the special mention of `pattern0` and `pattern1`: these
+ensure that each `or` or `and` record has at least two members.
+
+    Definitions = { symbol: Definition ...:... }.
+
+    Definition =
+      ; Pattern / Pattern / ...
+      /
+
+      ; Pattern & Pattern & ...
+      /
+
+      ; Pattern
+      / Pattern
+    .
+
+    NamedAlternative = [@variantLabel string @pattern Pattern].
+
+Each `Pattern` is either a simple or compound pattern:
+
+    Pattern = SimplePattern / CompoundPattern .
+
+Simple patterns are as described above:
+
+    SimplePattern =
+      ; any
+      / =any
+
+      ; special builtins: bool, float, double, int, string, bytes, symbol
+      /
+
+      ; matches an embedded value in the input: embedded
+      /
+
+      ; =symbol, < any>, or plain non-symbol atom
+      /
+
+      ; [p ...] ----> >; see also tuple* below.
+      /
+
+      ; #{p} ----> >
+      /
+
+      ; {k: v, ...:...} ----> >
+      /
+
+      ; symbol, symbol.symbol, symbol.symbol.symbol, ...
+      / Ref
+    .
+
+    AtomKind = =Boolean
+             / =Float
+             / =Double
+             / =SignedInteger
+             / =String
+             / =ByteString
+             / =Symbol .
+
+Compound patterns involve optionally-named subpatterns:
+
+    CompoundPattern =
+      ;