Begin sketch of Preserves Schema semantics

This commit is contained in:
Tony Garnock-Jones 2022-06-09 21:30:56 +02:00
parent 5b01beb2f6
commit b332f2668e
3 changed files with 142 additions and 3 deletions

View File

@ -4,7 +4,7 @@ title: "Preserves Schema"
--- ---
Tony Garnock-Jones <tonyg@leastfixedpoint.com> Tony Garnock-Jones <tonyg@leastfixedpoint.com>
June 2022. Version 0.2.0. June 2022. Version 0.3.0.
[abnf]: https://tools.ietf.org/html/rfc7405 [abnf]: https://tools.ietf.org/html/rfc7405
@ -423,13 +423,119 @@ the given name in the overall record type for a definition. The type
of value contained in the field will correspond to the `Pattern` or of value contained in the field will correspond to the `Pattern` or
`SimplePattern` given. `SimplePattern` given.
## Semantics
Having covered concrete syntax, we now give semantics for the schema
language in terms of the [abstract syntax][schema.prs] and of the
language of Preserves `Value`s.
[schema.prs]: https://gitlab.com/preserves/preserves/-/blob/main/schema/schema.prs
### Metaschema interpreter
(TODO: this subsection is to define an interpreter for metaschema values
applied to Preserves `Value`s.)
### Host-language types
The host-language types corresponding to a metaschema instance can
themselves be described according to a grammar.
The definitions in this section should be understood as being part of a
module named `host`, in a bundle alongside a module named `schema`
corresponding to the metaschema in the appendix below.
#### Abstract host language types
Definition = <union @variants [Variant ...]> / Simple .
Variant = [@label symbol @type Simple] .
The host-language type corresponding to a definition will either be a
tagged union (side condition: at least two `Variant`s are present in a
`union`) or a *simple* type.
Simple = Field / Record .
Record = <rec @fields [NamedField ...]> .
NamedField = [@name symbol @type Field] .
A *simple* type may be either a single, simple value of *field* type, or
a record of multiple named fields, each having a specific *field* type.
Field = =unit
/ =any
/ =embedded
/ <array @element Field>
/ <set @element Field>
/ <map @key Field @value Field>
/ <ref @name schema.Ref>
/ schema.AtomKind .
A *field* type is either
- the language's unit type (the empty tuple, the "void" value),
- the universal type of all Preserves `Value`s,
- the type of some host-language [embedded value](./preserves.html#embeddeds) in some context,
- the type of a uniform array having elements of a specific *field* type,
- the type of a set having elements of a specific *field* type,
- the type of a dictionary connecting keys of specific type to values of specific type,
- the type associated with some other named definition in scope in the current Schema bundle, or
- the type of a specific kind of Preserves [`Atom`](./preserves#values).
#### Computing abstract types from a metaschema instance
Given a metaschema definition *d* : `schema.Definition`, the function
**typeof**{:.pseudocode} yields a `host.Definition`.
{:.pseudocode #def:typeof}
> **typeof** : `schema.Definition``host.Definition`
> **typeof** `<or [[`*n`1`* *p`1`*`]` ... `[`*n`n`* *p`n`*`]]>` = `<union [[`*n`1`* (**pat** *p`1`*)`]` ... `[`*n`n`* (**pat** *p`n`*)`]]>`
> **typeof** `<and [`*f`1` ... f`n`*`]>` = **product** `[`*f`1` ... f`n`*`]`
> **typeof** *p* = **pat** *p*, when *p*`schema.Pattern`
{:.pseudocode #def:pat}
> **pat** : `schema.Pattern``host.Simple`
> **pat** *s* = **field** *s*, when *s*`schema.SimplePattern`
> **pat** *c* = **product** `[`*c*`]`, when *c*`schema.CompoundPattern`
{:.pseudocode #def:field}
> **field** : `schema.SimplePattern``host.Field`
> **field** `any` = `any`
> **field** `<atom` *k*`>` = *k*
> **field** `<embedded` *s*`>` = `embedded`
> **field** `<lit` *v*`>` = `unit`
> **field** `<seqof` *s*`>` = `<array` (**field** s)`>`
> **field** `<setof` *s*`>` = `<set` (**field** s)`>`
> **field** `<dictof` *s`k`* *s`v`*`>` = `<map` (**field** *s`k`*) (**field** *s`v`*)`>`
> **field** *r* = *r*, when *r*`schema.Ref`
The helper function **product**{:.pseudocode} is where `unit`-valued
fields are omitted from the computed host-language type. If all fields
are so omitted, or if there were (recursively) no bindings in the input
patterns, **product**{:.pseudocode} yields `unit` type itself.
{:.pseudocode #def:product}
> **product** : `[schema.NamedPattern` ...`]` ⟶ `host.Simple`
> **product** `[`*f`1` ... f`n`*`]` = `unit`, if *t* = `[]`;
> `<rec` *t*`>`, otherwise
> where *t* = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`*
{:.pseudocode #def:gather}
> **gather** : `schema.NamedPattern``[host.NamedField ...]`
> **gather** `<named` *n* *p*`>` = `[]`, if (**field** *p*) = `unit`;
> `[[`*n* (**field** *p*)`]]`, otherwise
> **gather** `<rec` *f`label`* *f`fields`*`>` = **gather** *f`label`***gather** *f`fields`*
> **gather** `<tuple [`*f`1` .. f`n`*`]>` = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`*
> **gather** `<tuplePrefix [`*f`1` ... f`n`*`]` *f`repeated`*`>` = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`***gather** *f`repeated`*
> **gather** `<dict {`*v`1`*`:`*f`1` ... v`n`*`:`*f`n`*`}>` = **gather** *f`1`* ⧺ ⋯ ⧺ **gather** *f`n`*,
> where (*f`1` ⋯ f`n`*) are (*f`1` ⋯ f`n`*) sorted according to [Preserves term order](./preserves.html#total-order).
## Appendix: Metaschema ## Appendix: Metaschema
The metaschema defines the structure of the abstract syntax (AST) of The metaschema defines the structure of the abstract syntax (AST) of
schemas, using the concrete DSL syntax described above. schemas, using the concrete DSL syntax described above.
The text below is taken from The text below is taken from
[`schema/schema.prs`](https://gitlab.com/preserves/preserves/-/blob/main/schema/schema.prs) [`schema/schema.prs`][schema.prs]
in the source code repository. in the source code repository.
A `Bundle` collects a number of `Schema`s, each named by a A `Bundle` collects a number of `Schema`s, each named by a

View File

@ -1,5 +1,9 @@
:root {
--sans-font: -apple-system, BlinkMacSystemFont, avenir next, avenir, segoe ui, helvetica neue, helvetica, Cantarell, Ubuntu, roboto, noto, arial, sans-serif;
--serif-font: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif;
}
body { body {
font-family: palatino, "Palatino Linotype", "Palatino LT STD", "URW Palladio L", "TeX Gyre Pagella", serif; font-family: var(--serif-font);
box-sizing: border-box; box-sizing: border-box;
line-height: 1.414; line-height: 1.414;
} }
@ -103,6 +107,34 @@ blockquote {
background-color: #e9f0f9; background-color: #e9f0f9;
} }
blockquote.pseudocode {
border-left: none;
padding: 0;
}
.pseudocode strong, strong.pseudocode {
font-family: var(--sans-font);
font-weight: inherit;
font-size: 90%;
}
.pseudocode :first-child {
margin-top: 0;
}
.pseudocode :last-child {
margin-bottom: 0;
}
.pseudocode pre, .pseudocode code { background-color: inherit; }
.pseudocode p {
white-space: pre;
}
.pseudocode em > code, .pseudocode strong > code {
vertical-align: sub;
font-size: 75%;
font-family: inherit;
}
.pseudocode code {
font-size: 80%;
}
/*---------------------------------------------------------------------------*/ /*---------------------------------------------------------------------------*/
/* Rouge syntax classes */ /* Rouge syntax classes */

View File

@ -35,6 +35,7 @@ Taking inspiration from functional programming, we start with a
definition of the *values* that we want to work with and give them definition of the *values* that we want to work with and give them
meaning independent of their syntax. meaning independent of their syntax.
<a id="values"></a>
Our `Value`s fall into two broad categories: *atomic* and *compound* Our `Value`s fall into two broad categories: *atomic* and *compound*
data. Every `Value` is finite and non-cyclic. Embedded values, called data. Every `Value` is finite and non-cyclic. Embedded values, called
`Embedded`s, are a third, special-case category. `Embedded`s, are a third, special-case category.