Attempt to clarify Preserves Schema relationship to host-language types.

Tony Garnock-Jones 2022-06-09 14:59:53 +02:00
parent c49b8673f7
commit 9ba8617952
2 changed files with 124 additions and 39 deletions

@ -4,7 +4,7 @@ title: "Preserves Schema"
--- ---
Tony Garnock-Jones <tonyg@leastfixedpoint.com> Tony Garnock-Jones <tonyg@leastfixedpoint.com>
June 2021. Version 0.1.3. June 2022. Version 0.2.0.
[abnf]: https://tools.ietf.org/html/rfc7405 [abnf]: https://tools.ietf.org/html/rfc7405
@ -17,7 +17,7 @@ A Preserves schema connects Preserves `Value`s to host-language data
structures. Each definition within a schema can be processed by a structures. Each definition within a schema can be processed by a
compiler to produce compiler to produce
- a host-language *type definition*; - a simple host-language *type definition*;
- a partial *parsing* function from `Value`s to instances of the - a partial *parsing* function from `Value`s to instances of the
produced type; and produced type; and
@ -30,6 +30,24 @@ be serialized again, and every instance of a host-language data
structure contains, by construction, enough information to be structure contains, by construction, enough information to be
successfully serialized. successfully serialized.
**Portability.** Preserves Schema is broadly portable. Any host-language
type system that can represent [algebraic
types](https://en.wikipedia.org/wiki/Algebraic_data_type) in some way
should be suitable as a compilation target.
This includes ML-family languages like [Rust][rust-impl] and Haskell,
object-oriented languages like Java, [Python][python-impl] and
[Smalltalk][smalltalk-impl], and multiparadigm languages like
[JavaScript][ts-impl], [TypeScript][ts-impl], [Racket][racket-impl],
[Nim][nim-impl] and Erlang.
[nim-impl]: https://git.syndicate-lang.org/ehmry/preserves-nim
[python-impl]: https://gitlab.com/preserves/preserves/-/blob/main/implementations/python/preserves/schema.py
[racket-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/racket/preserves/preserves-schema
[rust-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/rust/preserves-schema
[smalltalk-impl]: https://squeaksource.com/Preserves.html
[ts-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/javascript/packages/schema
**Example.** Sending the schema **Example.** Sending the schema
version 1 . version 1 .
@ -111,13 +129,13 @@ conventions:
- `UpperCamelCase` for *definition* names. - `UpperCamelCase` for *definition* names.
- Either `lowerCamelCase` or `UpperCamelCase` for definition-unique - Either `lowerCamelCase` or `UpperCamelCase` for definition-unique
names for alternatives within a union definition. names for alternatives within an alternation definition.
- `lowerCamelCase` for *module* names (schema names, package names) - `lowerCamelCase` for *module* names (schema names, package names)
and *field* or *variable* names. and *field* or *variable* names.
## Concrete (DSL) Syntax ## The Preserves Schema Language
In this section, we use an [ABNF][abnf]-like notation to define a In this section, we use an [ABNF][abnf]-like notation to define a
textual syntax that is easy for people to read and write. Most of the textual syntax that is easy for people to read and write. Most of the
@ -144,7 +162,6 @@ A bundle of schema files is a directory tree containing `.prs` files.
Version = "version" "1" Version = "version" "1"
EmbeddedTypeName = "embeddedType" ("#f" / Ref) EmbeddedTypeName = "embeddedType" ("#f" / Ref)
Include = "include" string Include = "include" string
Definition = id "=" (OrPattern / AndPattern / Pattern)
**Version specification.** Mandatory. Names the version of the schema **Version specification.** Mandatory. Names the version of the schema
language used in the file. This version of the specification is language used in the file. This version of the specification is
@ -160,10 +177,33 @@ definition in this or a neighbouring schema, it declares that embedded
file as if it were textually inserted in place of this clause. The file as if it were textually inserted in place of this clause. The
file path may be relative to the current file, or absolute. file path may be relative to the current file, or absolute.
**Definition.** Each definition clause implicitly connects a pattern ### Definitions.
with a type name and a set of associated functions.
### Union definitions. Definition = id "=" (OrPattern / AndPattern / Pattern)
Each definition clause connects a pattern over `Value`s with a
host-language type name (derived from the supplied `id`) and set of
associated functions.
A definition may be
- an *alternation* of patterns, allowing for biased choice among alternatives;
- an *intersection* of patterns, allowing for composition and reuse of patterns; or
- the base case, an ordinary pattern.
**Host-language types.** Each definition includes *bindings* that
capture information from a parsed `Value` and expose it to programs in
the host language. When more than one binding is present in a
definition, a host-language record (product, structure, tuple) will be
the result of a parse; otherwise, a simple value will result. When a
definition involves *alternation*, a host-language representation of a
sum over the types of each branch of the alternation will result. For
example, a compiler targeting an object-oriented host language would
produce a base class for each definition, with a field for each binding
and a subclass for each variant alternative. A functional host language
with algebraic data types would produce a labelled-sum-of-products type.
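As a rough illustration of the paragraph above, here is a minimal Python sketch of the types a compiler might emit for a definition with two alternatives. The definition `Contact`, its variants and its binding names are hypothetical and appear nowhere in the specification; the sketch only shows the general "sum of products" shape, not the output of any actual Preserves Schema compiler.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical definition (an assumption, for illustration only):
#   Contact = <email @address string> / <phone @country int @number int> .


@dataclass(frozen=True)
class Email:
    """Variant "email": one binding, wrapped so that the variant stays labelled."""
    address: str


@dataclass(frozen=True)
class Phone:
    """Variant "phone": two bindings, hence a record with two fields."""
    country: int
    number: int


# The definition as a whole is a sum over the types of its branches.
Contact = Union[Email, Phone]


def describe(contact: Contact) -> str:
    """Consume the sum type by dispatching on the variant."""
    if isinstance(contact, Email):
        return f"email to {contact.address}"
    if isinstance(contact, Phone):
        return f"call +{contact.country} {contact.number}"
    raise TypeError(f"not a Contact: {contact!r}")
```

In a class-based style the same shape could instead use an abstract `Contact` base class with `Email` and `Phone` as subclasses; the `Union` form is used here only because it is shorter.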
### Alternation definitions.
OrPattern = AltPattern "/" AltPattern *("/" AltPattern) OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
@ -172,16 +212,20 @@ The right-hand-side of a definition may supply two or more
result of the first successful alternative is the result of the entire result of the first successful alternative is the result of the entire
parse. parse.
The type corresponding to an `OrPattern` is a union type, a variant **Host-language types.** The type corresponding to an `OrPattern` is an
type, or an algebraic sum type, depending on the host language. algebraic sum type, a union type, a variant type, or a concrete subclass
of an abstract superclass, depending on the host language.
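The biased choice described above can be sketched in a few lines of Python. Everything here (`ParseError`, `parse_alternation`, the two toy parsers and the variant names) is a hypothetical illustration rather than part of any Preserves implementation; the point is only that alternatives are tried in order and the first success wins.

```python
from typing import Any, Callable, Sequence, Tuple


class ParseError(Exception):
    """Raised when a Value does not match a pattern."""


Parser = Callable[[Any], Any]


def parse_int(value: Any) -> int:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise ParseError("expected an integer")


def parse_any(value: Any) -> Any:
    # Matches every input.
    return value


def parse_alternation(
    alternatives: Sequence[Tuple[str, Parser]], value: Any
) -> Tuple[str, Any]:
    """Try each named alternative in order; return the first success,
    tagged with its variant name."""
    for name, parser in alternatives:
        try:
            return (name, parser(value))
        except ParseError:
            continue
    raise ParseError("no alternative matched")
```

Because the choice is biased, the order of alternatives matters: placing `parse_any` first would shadow `parse_int` entirely.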
Each alternative with an `OrPattern` must have a definition-unique **Variant names.** Each alternative with an `OrPattern` must have a
*name*. The name can either be given explicitly as `@name` (see definition-unique *name*. The name is used to uniquely label the
discussion of `NamedPattern` below) or inferred. It can only be alternative's host-language representation (for example, a subclass, or
inferred from the label of a record pattern, from the name of a a member of a tagged union type).
reference to another definition, or from the text of a "sufficiently
identifierlike" literal pattern - one that matches a string, symbol, A variant name can either be given explicitly as `@name` (see discussion
number or boolean: of `NamedPattern` below) or inferred. It can only be inferred from the
label of a record pattern, from the name of a reference to another
definition, or from the text of a "sufficiently identifierlike" literal
pattern - one that matches a string, symbol, number or boolean:
AltPattern = "@" id SimplePattern AltPattern = "@" id SimplePattern
/ "<" id PatternSequence ">" / "<" id PatternSequence ">"
@ -192,30 +236,48 @@ number or boolean:
AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern) AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
The right-hand-side of a definition may supply two or more patterns, The right-hand-side of a definition may supply two or more patterns, the
the *intersection* of whose denotations is the denotation of the *intersection* of whose denotations is the denotation of the overall
overall definition. When parsing, every pattern is tried; if all definition. When parsing, every pattern is tried: if all succeed, the
succeed, the resulting information is combined into a single record resulting information is combined into a single type; otherwise, the
type. overall parse fails.
When serializing, the terms resulting from serializing at each pattern When serializing, the terms resulting from serializing at each pattern
are *merged* together. are *merged* together.
#### Experimental. **Host-language types.** Compiling an intersection definition produces a
host-language type that is effectively the algebraic product of the
types of the parts of the intersection. Practically, this usually means
a record (product, structure, tuple) type.
Intersections are an experimental feature. They can be used to express {:.rationale}
*optional dictionary entries*:[^not-ideal-optional-encoding] > #### Experimental.
>
MyDict = {a: int, b: string} & @c MaybeC . > Intersections are an experimental feature. They can be used to express
MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} . > *optional dictionary entries*:
>
It is not yet clear whether they pull their weight. In particular, the > MyDict = {a: int, b: string} & @c MaybeC .
semantics of serializing a value defined by intersection are not > MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
completely clear. >
> They can also be used to express something reminiscent of *inheritance*:
[^not-ideal-optional-encoding]: This encoding is not ideal. It passes >
responsibility for checking for invalid inputs up to the user, > Type = @base BaseFields & @detail SubType .
rather than handling it completely at the Schema layer. > BaseFields = {a: int, b: string} .
> SubType = @base {}
> / @variantA { x: int }
> / @mid Mid .
> Mid = { y: symbol } & @detail SubSubType .
> SubSubType = @variantB { z: "type-b" }
> / @variantC { z: "type-c" }
>
> It is not yet clear whether they pull their weight.
>
> From the point of view of the user of the schema language, using
> intersections to express optional values is cumbersome. Not only is it
> verbose, requiring auxiliary definitions, but it leaves responsibility
> for checking for invalid inputs up to the user, rather than handling
> it completely at the Schema layer. A future Schema version will likely
> include first-class support for optionality.
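To make the optional-entry encoding concrete, the following Python sketch hand-writes what a parser and serializer for `MyDict` and `MaybeC` might look like. It rests on several assumptions: dictionary keys are modelled as plain Python strings, `Symbol` is a stand-in class defined here, dictionary patterns are taken to ignore keys they do not mention, and "merging" is modelled as a dictionary update. None of this is taken from an actual Preserves Schema compiler.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Union


class ParseError(Exception):
    """Raised when a Value does not match a pattern."""


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a Preserves symbol."""
    name: str


@dataclass(frozen=True)
class Present:
    c: Symbol


@dataclass(frozen=True)
class Invalid:
    c: Any


@dataclass(frozen=True)
class Absent:
    pass


MaybeC = Union[Present, Invalid, Absent]


@dataclass(frozen=True)
class MyDict:
    a: int
    b: str
    c: MaybeC  # from the "@c MaybeC" part of the intersection


def parse_maybe_c(value: Any) -> MaybeC:
    # Biased choice: @present, then @invalid, then @absent.
    if not isinstance(value, Mapping):
        raise ParseError("expected a dictionary")
    if "c" in value:
        if isinstance(value["c"], Symbol):
            return Present(value["c"])
        return Invalid(value["c"])
    return Absent()


def parse_my_dict(value: Any) -> MyDict:
    # Every part of the intersection must succeed on the same input.
    if not isinstance(value, Mapping):
        raise ParseError("expected a dictionary")
    a, b = value.get("a"), value.get("b")
    if not isinstance(a, int) or isinstance(a, bool):
        raise ParseError("a: expected int")
    if not isinstance(b, str):
        raise ParseError("b: expected string")
    return MyDict(a, b, parse_maybe_c(value))


def serialize_maybe_c(m: MaybeC) -> dict:
    return {} if isinstance(m, Absent) else {"c": m.c}


def serialize_my_dict(d: MyDict) -> dict:
    # Serialize at each pattern, then merge the resulting dictionaries.
    merged = {"a": d.a, "b": d.b}
    merged.update(serialize_maybe_c(d.c))
    return merged
```

The `Invalid` variant shows the drawback mentioned above: a `c` entry of the wrong type still parses, and it is left to the caller to reject it.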
### Patterns. ### Patterns.
@ -223,13 +285,26 @@ completely clear.
Patterns come in two kinds: Patterns come in two kinds:
- the parsers for *simple patterns* yield a single host-language - The parsers for *simple patterns* yield a single host-language
value; for example, a string, an array, a pointer, and so on. value—for example, a string, an array, a number, or a pointer—or
even, in the case of `LiteralPattern`s, no host-language values at
all.[^no-values-at-all]
- the parsers for *compound patterns* yield zero or more *fields* - The parsers for *compound patterns* yield zero or more *fields*
which combine into an overall record type associated with a which combine into an overall record type associated with a
definition. definition.
[^no-values-at-all]: The case of a `LiteralPattern` yielding no
host-language values is interesting. All the information required to
reversibly store the result of a parse is already in the schema, so
nothing need be stored at runtime in host-language data type
instances. Concretely, a definition consisting only of a
`LiteralPattern` might correspond to a host-language unit type (the
empty tuple, the "void" value). Definitions consisting of
`CompoundPattern`s involving `LiteralPattern`s do not even need to
store this much: fields of unit type in a host-language record type
can simply be omitted without loss.
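The footnote's observation can be illustrated with a small Python sketch. The literal `"type-b"` is borrowed from the inheritance example earlier in this diff; the function names and the use of the empty tuple as the unit value are assumptions made for illustration.

```python
from typing import Any, Tuple

# The literal lives in the schema (here, a module-level constant),
# not in instances of the host-language type.
LITERAL = "type-b"

Unit = Tuple[()]  # the empty tuple plays the role of a unit type


def parse_literal(value: Any) -> Unit:
    """Succeeds only on the exact literal, and yields no information."""
    if value == LITERAL:
        return ()
    raise ValueError(f"expected {LITERAL!r}, got {value!r}")


def serialize_literal(_unit: Unit) -> Any:
    """Reconstructs the Value from the schema alone."""
    return LITERAL
```

Since `serialize_literal` ignores its argument, a record field of this type carries nothing and could be dropped from the record without losing the ability to serialize.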
#### Simple patterns #### Simple patterns
SimplePattern = AnyPattern SimplePattern = AnyPattern


@ -93,6 +93,16 @@ td {
padding-right: 0.5rem; padding-right: 0.5rem;
} }
blockquote {
padding: 0.5rem 1rem;
border-left: solid #4f81bd 2px;
margin-right: 0;
}
.rationale {
background-color: #e9f0f9;
}
/*---------------------------------------------------------------------------*/ /*---------------------------------------------------------------------------*/
/* Rouge syntax classes */ /* Rouge syntax classes */