Attempt to clarify Preserves Schema relationship to host-language types.

Tony Garnock-Jones 2022-06-09 14:59:53 +02:00
parent c49b8673f7
commit 9ba8617952
2 changed files with 124 additions and 39 deletions

@ -4,7 +4,7 @@ title: "Preserves Schema"
---
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
June 2021. Version 0.1.3.
June 2022. Version 0.2.0.
[abnf]: https://tools.ietf.org/html/rfc7405
@ -17,7 +17,7 @@ A Preserves schema connects Preserves `Value`s to host-language data
structures. Each definition within a schema can be processed by a
compiler to produce
- a host-language *type definition*;
- a simple host-language *type definition*;
- a partial *parsing* function from `Value`s to instances of the
produced type; and
@ -30,6 +30,24 @@ be serialized again, and every instance of a host-language data
structure contains, by construction, enough information to be
successfully serialized.
**Portability.** Preserves Schema is broadly portable. Any host-language
type system that can represent [algebraic
types](https://en.wikipedia.org/wiki/Algebraic_data_type) in some way
should be suitable as a compilation target.
This includes ML-family languages like [Rust][rust-impl] and Haskell,
object-oriented languages like Java, [Python][python-impl] and
[Smalltalk][smalltalk-impl], and multiparadigm languages like
[JavaScript][ts-impl], [TypeScript][ts-impl], [Racket][racket-impl],
[Nim][nim-impl] and Erlang.
[nim-impl]: https://git.syndicate-lang.org/ehmry/preserves-nim
[python-impl]: https://gitlab.com/preserves/preserves/-/blob/main/implementations/python/preserves/schema.py
[racket-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/racket/preserves/preserves-schema
[rust-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/rust/preserves-schema
[smalltalk-impl]: https://squeaksource.com/Preserves.html
[ts-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/javascript/packages/schema
**Example.** Sending the schema
version 1 .
@ -111,13 +129,13 @@ conventions:
- `UpperCamelCase` for *definition* names.
- Either `lowerCamelCase` or `UpperCamelCase` for definition-unique
names for alternatives within a union definition.
names for alternatives within an alternation definition.
- `lowerCamelCase` for *module* names (schema names, package names)
and *field* or *variable* names.
## Concrete (DSL) Syntax
## The Preserves Schema Language
In this section, we use an [ABNF][abnf]-like notation to define a
textual syntax that is easy for people to read and write. Most of the
@ -144,7 +162,6 @@ A bundle of schema files is a directory tree containing `.prs` files.
Version = "version" "1"
EmbeddedTypeName = "embeddedType" ("#f" / Ref)
Include = "include" string
Definition = id "=" (OrPattern / AndPattern / Pattern)
**Version specification.** Mandatory. Names the version of the schema
language used in the file. This version of the specification is
@ -160,10 +177,33 @@ definition in this or a neighbouring schema, it declares that embedded
file as if it were textually inserted in place of this clause. The
file path may be relative to the current file, or absolute.
**Definition.** Each definition clause implicitly connects a pattern
with a type name and a set of associated functions.
### Definitions.
### Union definitions.
Definition = id "=" (OrPattern / AndPattern / Pattern)
Each definition clause connects a pattern over `Value`s with a
host-language type name (derived from the supplied `id`) and a set of
associated functions.
A definition may be
- an *alternation* of patterns, allowing for biased choice among alternatives;
- an *intersection* of patterns, allowing for composition and reuse of patterns; or
- the base case, an ordinary pattern.
**Host-language types.** Each definition includes *bindings* that
capture information from a parsed `Value` and expose it to programs in
the host language. When more than one binding is present in a
definition, a host-language record (product, structure, tuple) will be
the result of a parse; otherwise, a simple value will result. When a
definition involves *alternation*, a host-language representation of a
sum over the types of each branch of the alternation will result. For
example, a compiler targeting an object-oriented host language would
produce a base class for each definition, with a field for each binding
and a subclass for each variant alternative. A functional host language
with algebraic data types would produce a labelled-sum-of-products type.
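As a rough illustration of the object-oriented mapping described above, the following Python sketch shows what a compiler *might* emit for a hypothetical definition `Shape = <circle @radius int> / <rect @width int @height int> .` Both the schema definition and every class and function name here are assumptions made for illustration; they are not the output of any actual Preserves Schema compiler.

```python
from dataclasses import dataclass


class Shape:
    """Base class standing for the hypothetical `Shape` definition."""


@dataclass(frozen=True)
class Circle(Shape):
    # One field per binding in the `circle` alternative.
    radius: int


@dataclass(frozen=True)
class Rect(Shape):
    # One field per binding in the `rect` alternative.
    width: int
    height: int


def describe(shape: Shape) -> str:
    """Dispatch on the variant, as a consumer of the generated types might."""
    if isinstance(shape, Circle):
        return f"circle of radius {shape.radius}"
    if isinstance(shape, Rect):
        return f"{shape.width} by {shape.height} rectangle"
    raise TypeError(f"not a Shape variant: {shape!r}")
```

Each alternative becomes a subclass whose fields are its bindings, so the set of subclasses plays the role that a labelled sum-of-products type would play in a functional host language.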
### Alternation definitions.
OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
@ -172,16 +212,20 @@ The right-hand-side of a definition may supply two or more
result of the first successful alternative is the result of the entire
parse.
The type corresponding to an `OrPattern` is a union type, a variant
type, or an algebraic sum type, depending on the host language.
**Host-language types.** The type corresponding to an `OrPattern` is an
algebraic sum type, a union type, a variant type, or a concrete subclass
of an abstract superclass, depending on the host language.
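The biased, first-match behaviour of alternation can be sketched in Python as follows. This is a minimal model rather than a real implementation: the `FAIL` sentinel, the `parse_alternation` helper and the three small parsers are hypothetical names, `Value`s are modelled as plain Python objects, and symbols are modelled as `str`. The alternatives loosely follow the shape of `@present {c: symbol} / @invalid {c: any} / @absent {}`.

```python
from typing import Any, Callable, Sequence, Tuple

FAIL = object()  # sentinel marking a failed parse (an assumption of this sketch)

Parser = Callable[[Any], Any]


def parse_alternation(alternatives: Sequence[Tuple[str, Parser]], value: Any) -> Any:
    """Try each named alternative in order; the first success wins."""
    for name, parser in alternatives:
        result = parser(value)
        if result is not FAIL:
            return (name, result)
    return FAIL


def present(value: Any) -> Any:
    # Models `{c: symbol}`, with symbols represented as Python strings.
    if isinstance(value, dict) and isinstance(value.get("c"), str):
        return value["c"]
    return FAIL


def invalid(value: Any) -> Any:
    # Models `{c: any}`: any dictionary that has a `c` entry.
    if isinstance(value, dict) and "c" in value:
        return value["c"]
    return FAIL


def absent(value: Any) -> Any:
    # Models `{}`: any dictionary at all; there is nothing to capture.
    return () if isinstance(value, dict) else FAIL


MAYBE_C = [("present", present), ("invalid", invalid), ("absent", absent)]
```

The input `{"c": "x"}` would satisfy all three alternatives, but because the choice is biased the result is tagged `present`. The variant name travels with the result, which is what lets a host language pick the matching subclass or union member.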
Each alternative with an `OrPattern` must have a definition-unique
*name*. The name can either be given explicitly as `@name` (see
discussion of `NamedPattern` below) or inferred. It can only be
inferred from the label of a record pattern, from the name of a
reference to another definition, or from the text of a "sufficiently
identifierlike" literal pattern - one that matches a string, symbol,
number or boolean:
**Variant names.** Each alternative within an `OrPattern` must have a
definition-unique *name*. The name is used to uniquely label the
alternative's host-language representation (for example, a subclass, or
a member of a tagged union type).
A variant name can either be given explicitly as `@name` (see discussion
of `NamedPattern` below) or inferred. It can only be inferred from the
label of a record pattern, from the name of a reference to another
definition, or from the text of a "sufficiently identifierlike" literal
pattern, that is, one that matches a string, symbol, number or boolean:
AltPattern = "@" id SimplePattern
/ "<" id PatternSequence ">"
@ -192,30 +236,48 @@ number or boolean:
AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
The right-hand-side of a definition may supply two or more patterns,
the *intersection* of whose denotations is the denotation of the
overall definition. When parsing, every pattern is tried; if all
succeed, the resulting information is combined into a single record
type.
The right-hand-side of a definition may supply two or more patterns, the
*intersection* of whose denotations is the denotation of the overall
definition. When parsing, every pattern is tried: if all succeed, the
resulting information is combined into a single type; otherwise, the
overall parse fails.
When serializing, the terms resulting from serializing at each pattern
are *merged* together.
#### Experimental.
**Host-language types.** Compiling an intersection definition produces a
host-language type that is effectively the algebraic product of the
types of the parts of the intersection. Practically, this usually means
a record (product, structure, tuple) type.
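The parsing rule for intersections (try every part, and combine the results only if all of them succeed) can be sketched in Python as below. The definition modelled is a hypothetical `Both = {a: int, b: string} & {n: int} .`; the helper names and the `_Fail` sentinel are likewise assumptions of this sketch, and `Value`s are modelled as plain Python objects.

```python
from typing import Any, Callable, Dict, Sequence, Union


class _Fail:
    """Sentinel type marking a failed parse (an assumption of this sketch)."""


FAIL = _Fail()
Fields = Dict[str, Any]
PartParser = Callable[[Any], Union[Fields, _Fail]]


def _is_int(x: Any) -> bool:
    # Python's bool is a subclass of int, so exclude it explicitly.
    return isinstance(x, int) and not isinstance(x, bool)


def parse_intersection(parts: Sequence[PartParser], value: Any) -> Union[Fields, _Fail]:
    """Apply every part to the same value; fail if any single part fails."""
    combined: Fields = {}
    for part in parts:
        fields = part(value)
        if isinstance(fields, _Fail):
            return FAIL
        combined.update(fields)
    return combined


def parse_a_b(value: Any) -> Union[Fields, _Fail]:
    # Models `{a: int, b: string}`; extra dictionary entries are ignored.
    if isinstance(value, dict) and _is_int(value.get("a")) and isinstance(value.get("b"), str):
        return {"a": value["a"], "b": value["b"]}
    return FAIL


def parse_n(value: Any) -> Union[Fields, _Fail]:
    # Models `{n: int}`; extra dictionary entries are ignored.
    if isinstance(value, dict) and _is_int(value.get("n")):
        return {"n": value["n"]}
    return FAIL
```

Every part sees the whole input, so each dictionary pattern has to tolerate entries that belong to its siblings. The combined dictionary stands in for the record type that a compiler would generate.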
Intersections are an experimental feature. They can be used to express
*optional dictionary entries*:[^not-ideal-optional-encoding]
MyDict = {a: int, b: string} & @c MaybeC .
MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
It is not yet clear whether they pull their weight. In particular, the
semantics of serializing a value defined by intersection are not
completely clear.
[^not-ideal-optional-encoding]: This encoding is not ideal. It passes
responsibility for checking for invalid inputs up to the user,
rather than handling it completely at the Schema layer.
{:.rationale}
> #### Experimental.
>
> Intersections are an experimental feature. They can be used to express
> *optional dictionary entries*:
>
> MyDict = {a: int, b: string} & @c MaybeC .
> MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
>
> They can also be used to express something reminiscent of *inheritance*:
>
> Type = @base BaseFields & @detail SubType .
> BaseFields = {a: int, b: string} .
> SubType = @base {}
> / @variantA { x: int }
> / @mid Mid .
> Mid = { y: symbol } & @detail SubSubType .
> SubSubType = @variantB { z: "type-b" }
> / @variantC { z: "type-c" }
>
> It is not yet clear whether they pull their weight.
>
> From the point of view of the user of the schema language, using
> intersections to express optional values is cumbersome. Not only is it
> verbose, requiring auxiliary definitions, but it leaves responsibility
> for checking for invalid inputs up to the user, rather than handling
> it completely at the Schema layer. A future Schema version will likely
> include first-class support for optionality.
### Patterns.
@ -223,13 +285,26 @@ completely clear.
Patterns come in two kinds:
- the parsers for *simple patterns* yield a single host-language
value; for example, a string, an array, a pointer, and so on.
- The parsers for *simple patterns* yield a single host-language
value—for example, a string, an array, a number, or a pointer—or
even, in the case of `LiteralPattern`s, no host-language values at
all.[^no-values-at-all]
- the parsers for *compound patterns* yield zero or more *fields*
- The parsers for *compound patterns* yield zero or more *fields*
which combine into an overall record type associated with a
definition.
[^no-values-at-all]: The case of a `LiteralPattern` yielding no
host-language values is interesting. All the information required to
reversibly store the result of a parse is already in the schema, so
nothing need be stored at runtime in host-language data type
instances. Concretely, a definition consisting only of a
`LiteralPattern` might correspond to a host-language unit type (the
empty tuple, the "void" value). Definitions consisting of
`CompoundPattern`s involving `LiteralPattern`s do not even need to
store this much: fields of unit type in a host-language record type
can simply be omitted without loss.
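To make the point about literals concrete, here is a small Python sketch for a hypothetical definition `Tagged = { kind: "type-b", y: string } .` The literal is checked during parsing and reinserted during serialization, but never stored in the host-language instance. All names are illustrative assumptions, and `Value`s are modelled as plain Python objects.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

KIND_LITERAL = "type-b"  # known from the schema, so never stored per instance


@dataclass(frozen=True)
class Tagged:
    # Only the binding `y` is kept; the literal `kind` entry has unit type
    # and is omitted from the record.
    y: str


def parse_tagged(value: Any) -> Optional[Tagged]:
    """Return a Tagged instance, or None if the value does not match."""
    if not isinstance(value, dict):
        return None
    if value.get("kind") != KIND_LITERAL:
        return None
    y = value.get("y")
    if not isinstance(y, str):
        return None
    return Tagged(y)


def serialize_tagged(tagged: Tagged) -> Dict[str, Any]:
    """Rebuild the full value, restoring the literal from the schema."""
    return {"kind": KIND_LITERAL, "y": tagged.y}
```

Parsing and then serializing a matching value reproduces the original dictionary, which is why dropping the unit-typed field loses no information.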
#### Simple patterns
SimplePattern = AnyPattern

@ -93,6 +93,16 @@ td {
padding-right: 0.5rem;
}
blockquote {
padding: 0.5rem 1rem;
border-left: solid #4f81bd 2px;
margin-right: 0;
}
.rationale {
background-color: #e9f0f9;
}
/*---------------------------------------------------------------------------*/
/* Rouge syntax classes */