Attempt to clarify Preserves Schema relationship to host-language types.
parent c49b8673f7
commit 9ba8617952
@@ -4,7 +4,7 @@ title: "Preserves Schema"
 ---
 
 Tony Garnock-Jones <tonyg@leastfixedpoint.com>
-June 2021. Version 0.1.3.
+June 2022. Version 0.2.0.
 
 [abnf]: https://tools.ietf.org/html/rfc7405
 
@@ -17,7 +17,7 @@ A Preserves schema connects Preserves `Value`s to host-language data
 structures. Each definition within a schema can be processed by a
 compiler to produce
 
-- a host-language *type definition*;
+- a simple host-language *type definition*;
 
 - a partial *parsing* function from `Value`s to instances of the
 produced type; and
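This hunk lists what a compiler produces for one definition: a type definition and a partial parsing function. The context of the following hunk adds that every host-language instance must carry enough information to be serialized again. The Python sketch below is hand-written for illustration and is not the output of any real Preserves compiler. The schema definition `Point = <point @x int @y int>` is hypothetical, and `Symbol` and `Record` are minimal stand-ins for the value classes a real implementation would supply.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


# Minimal stand-ins for Preserves values (hypothetical; a real
# implementation provides its own value classes).
@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Record:
    label: Any
    fields: Tuple[Any, ...]


# (1) Host-language type for the hypothetical definition
#     Point = <point @x int @y int> .
@dataclass(frozen=True)
class Point:
    x: int
    y: int


def parse_point(v: Any) -> Optional[Point]:
    """(2) Partial parser: None for any Value that does not match."""
    if not (isinstance(v, Record) and v.label == Symbol("point")
            and len(v.fields) == 2):
        return None
    x, y = v.fields
    # `type(f) is int` also rejects bool, which Python treats as an int subclass.
    if not all(type(f) is int for f in (x, y)):
        return None
    return Point(x, y)


def unparse_point(p: Point) -> Record:
    """(3) Total serializer: every Point maps back to a Value."""
    return Record(Symbol("point"), (p.x, p.y))
```

Parsing is partial and returns `None` for anything that does not match. Serializing is total. So `parse_point(unparse_point(p)) == p` holds for every `Point`.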
@@ -30,6 +30,24 @@ be serialized again, and every instance of a host-language data
 structure contains, by construction, enough information to be
 successfully serialized.
 
+**Portability.** Preserves Schema is broadly portable. Any host-language
+type system that can represent [algebraic
+types](https://en.wikipedia.org/wiki/Algebraic_data_type) in some way
+should be suitable as a compilation target.
+
+This includes ML-family languages like [Rust][rust-impl] and Haskell,
+object-oriented languages like Java, [Python][python-impl] and
+[Smalltalk][smalltalk-impl], and multiparadigm languages like
+[JavaScript][ts-impl], [TypeScript][ts-impl], [Racket][racket-impl],
+[Nim][nim-impl] and Erlang.
+
+[nim-impl]: https://git.syndicate-lang.org/ehmry/preserves-nim
+[python-impl]: https://gitlab.com/preserves/preserves/-/blob/main/implementations/python/preserves/schema.py
+[racket-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/racket/preserves/preserves-schema
+[rust-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/rust/preserves-schema
+[smalltalk-impl]: https://squeaksource.com/Preserves.html
+[ts-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/javascript/packages/schema
+
 **Example.** Sending the schema
 
 version 1 .
@@ -111,13 +129,13 @@ conventions:
 - `UpperCamelCase` for *definition* names.
 
 - Either `lowerCamelCase` or `UpperCamelCase` for definition-unique
-names for alternatives within a union definition.
+names for alternatives within an alternation definition.
 
 - `lowerCamelCase` for *module* names (schema names, package names)
 and *field* or *variable* names.
 
 
-## Concrete (DSL) Syntax
+## The Preserves Schema Language
 
 In this section, we use an [ABNF][abnf]-like notation to define a
 textual syntax that is easy for people to read and write. Most of the
@@ -144,7 +162,6 @@ A bundle of schema files is a directory tree containing `.prs` files.
 Version = "version" "1"
 EmbeddedTypeName = "embeddedType" ("#f" / Ref)
 Include = "include" string
-Definition = id "=" (OrPattern / AndPattern / Pattern)
 
 **Version specification.** Mandatory. Names the version of the schema
 language used in the file. This version of the specification is
@@ -160,10 +177,33 @@ definition in this or a neighbouring schema, it declares that embedded
 file as if it were textually inserted in place of this clause. The
 file path may be relative to the current file, or absolute.
 
-**Definition.** Each definition clause implicitly connects a pattern
-with a type name and a set of associated functions.
+### Definitions.
 
-### Union definitions.
+Definition = id "=" (OrPattern / AndPattern / Pattern)
+
+Each definition clause connects a pattern over `Value`s with a
+host-language type name (derived from the supplied `id`) and set of
+associated functions.
+
+A definition may be
+
+- an *alternation* of patterns, allowing for biased choice among alternatives;
+- an *intersection* of patterns, allowing for composition and reuse of patterns; or
+- the base case, an ordinary pattern.
+
+**Host-language types.** Each definition includes *bindings* that
+capture information from a parsed `Value` and expose it to programs in
+the host language. When more than one binding is present in a
+definition, a host-language record (product, structure, tuple) will be
+the result of a parse; otherwise, a simple value will result. When a
+definition involves *alternation*, a host-language representation of a
+sum over the types of each branch of the alternation will result. For
+example, a compiler targeting an object-oriented host language would
+produce a base class for each definition, with a field for each binding
+and a subclass for each variant alternative. A functional host language
+with algebraic data types would produce a labelled-sum-of-products type.
+
+### Alternation definitions.
 
 OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
 
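The mapping described under **Host-language types** can be sketched in Python, one of the object-oriented targets named in this commit's portability paragraph. The two definitions in the code comment are hypothetical. The class names and the choice of a plain alias for the single-value case are assumptions of this sketch. A real compiler chooses its own names and representation.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical definitions (not from the spec):
#   Name  = string .
#   Shape = <circle @r int> / <rect @w int @h int> .
# Variant names "circle" and "rect" are inferred from the record labels.

# A definition whose parse yields a single simple value needs no wrapper
# type in this sketch: it is just an alias for str.
Name = str


# Alternation, object-oriented style: an abstract base class for the
# definition, plus one subclass per named alternative.
class Shape:
    pass


@dataclass(frozen=True)
class Circle(Shape):
    r: int


@dataclass(frozen=True)
class Rect(Shape):
    # Two bindings in this branch, so it carries a two-field record.
    w: int
    h: int


# The same sum written as a plain union, closer in spirit to an ML-style
# labelled sum of products.
ShapeSum = Union[Circle, Rect]


def describe(s: ShapeSum) -> str:
    # Consumers dispatch on the variant, as on the constructors of a sum type.
    if isinstance(s, Circle):
        return f"circle r={s.r}"
    if isinstance(s, Rect):
        return f"rect {s.w}x{s.h}"
    raise TypeError(f"not a Shape variant: {s!r}")
```

An ML-family target would instead emit one sum type with `Circle` and `Rect` constructors. The dispatch in `describe` corresponds to pattern matching on those constructors.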
@@ -172,16 +212,20 @@ The right-hand-side of a definition may supply two or more
 result of the first successful alternative is the result of the entire
 parse.
 
-The type corresponding to an `OrPattern` is a union type, a variant
-type, or an algebraic sum type, depending on the host language.
+**Host-language types.** The type corresponding to an `OrPattern` is an
+algebraic sum type, a union type, a variant type, or a concrete subclass
+of an abstract superclass, depending on the host language.
 
-Each alternative with an `OrPattern` must have a definition-unique
-*name*. The name can either be given explicitly as `@name` (see
-discussion of `NamedPattern` below) or inferred. It can only be
-inferred from the label of a record pattern, from the name of a
-reference to another definition, or from the text of a "sufficiently
-identifierlike" literal pattern - one that matches a string, symbol,
-number or boolean:
+**Variant names.** Each alternative with an `OrPattern` must have a
+definition-unique *name*. The name is used to uniquely label the
+alternative's host-language representation (for example, a subclass, or
+a member of a tagged union type).
+
+A variant name can either be given explicitly as `@name` (see discussion
+of `NamedPattern` below) or inferred. It can only be inferred from the
+label of a record pattern, from the name of a reference to another
+definition, or from the text of a "sufficiently identifierlike" literal
+pattern - one that matches a string, symbol, number or boolean:
 
 AltPattern = "@" id SimplePattern
 / "<" id PatternSequence ">"
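The hunk's context states the rule: the first successful alternative is the result of the whole parse. Each result is then labelled with the alternative's variant name. The Python sketch below illustrates both. `parse_or`, `is_zero`, `is_int` and `TOKEN` are hypothetical helpers. The convention that a failed parse returns `None` is an assumption of this sketch, not something the spec prescribes.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical convention: a parser returns None when the Value does not match.
Parser = Callable[[Any], Optional[Any]]


def parse_or(value: Any,
             alternatives: List[Tuple[str, Parser]]) -> Optional[Tuple[str, Any]]:
    """Try each named alternative in order; the first success wins."""
    for name, parser in alternatives:
        result = parser(value)
        if result is not None:
            # Tag the result with the alternative's definition-unique name.
            return (name, result)
    return None  # no alternative matched, so the whole parse fails


def is_zero(v: Any) -> Optional[Tuple[()]]:
    # Literal-like branch: matches only the integer 0 and carries no data.
    return () if type(v) is int and v == 0 else None


def is_int(v: Any) -> Optional[int]:
    # Branch matching any integer (bool excluded).
    return v if type(v) is int else None


# Both branches accept 0; listing "zero" first makes it win.
TOKEN = [("zero", is_zero), ("number", is_int)]
```

With `TOKEN` as written, `parse_or(0, TOKEN)` yields `("zero", ())`. Reversing the list yields `("number", 0)` for the same input, which is the order sensitivity that "biased choice" refers to.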
@@ -192,30 +236,48 @@ number or boolean:
 
 AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
 
-The right-hand-side of a definition may supply two or more patterns,
-the *intersection* of whose denotations is the denotation of the
-overall definition. When parsing, every pattern is tried; if all
-succeed, the resulting information is combined into a single record
-type.
+The right-hand-side of a definition may supply two or more patterns, the
+*intersection* of whose denotations is the denotation of the overall
+definition. When parsing, every pattern is tried: if all succeed, the
+resulting information is combined into a single type; otherwise, the
+overall parse fails.
 
 When serializing, the terms resulting from serializing at each pattern
 are *merged* together.
 
-#### Experimental.
+**Host-language types.** Compiling an intersection definition produces a
+host-language type that is effectively the algebraic product of the
+types of the parts of the intersection. Practically, this usually means
+a record (product, structure, tuple) type.
 
-Intersections are an experimental feature. They can be used to express
-*optional dictionary entries*:[^not-ideal-optional-encoding]
-
-MyDict = {a: int, b: string} & @c MaybeC .
-MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
-
-It is not yet clear whether they pull their weight. In particular, the
-semantics of serializing a value defined by intersection are not
-completely clear.
-
-[^not-ideal-optional-encoding]: This encoding is not ideal. It passes
-responsibility for checking for invalid inputs up to the user,
-rather than handling it completely at the Schema layer.
+{:.rationale}
+> #### Experimental.
+>
+> Intersections are an experimental feature. They can be used to express
+> *optional dictionary entries*:
+>
+> MyDict = {a: int, b: string} & @c MaybeC .
+> MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
+>
+> They can also be used to express something reminiscent of *inheritance*:
+>
+> Type = @base BaseFields & @detail SubType .
+> BaseFields = {a: int, b: string} .
+> SubType = @base {}
+> / @variantA { x: int }
+> / @mid Mid .
+> Mid = { y: symbol } & @detail SubSubType .
+> SubSubType = @variantB { z: "type-b" }
+> / @variantC { z: "type-c" }
+>
+> It is not yet clear whether they pull their weight.
+>
+> From the point of view of the user of the schema language, using
+> intersections to express optional values is cumbersome. Not only is it
+> verbose, requiring auxiliary definitions, but it leaves responsibility
+> for checking for invalid inputs up to the user, rather than handling
+> it completely at the Schema layer. A future Schema version will likely
+> include first-class support for optionality.
 
 ### Patterns.
 
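The intersection rules in this hunk can be illustrated with the `MyDict`/`MaybeC` example from its rationale block. Every part must parse, the results form a product, and serialized parts are merged. This Python sketch is one possible reading, not a normative one. Dictionary keys use a hypothetical `Symbol` class. `MaybeC` is represented as a `(variant name, payload)` pair. The sketch assumes that dictionary patterns tolerate extra keys, as the `@absent {}` fallback suggests. `merge` assumes that parts may only be merged when any keys they share carry equal values, since the text here says only that the terms are "merged".

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Symbol:  # hypothetical stand-in for a Preserves symbol
    name: str


A, B, C = Symbol("a"), Symbol("b"), Symbol("c")


@dataclass(frozen=True)
class MyDict:
    # Product of both intersected parts: a and b from the first,
    # c (a MaybeC variant) from the second.
    a: int
    b: str
    c: Tuple[str, Any]


def parse_ab(v: Any) -> Optional[Tuple[int, str]]:
    # {a: int, b: string}; extra keys are tolerated (assumption).
    if isinstance(v, dict) and type(v.get(A)) is int and isinstance(v.get(B), str):
        return (v[A], v[B])
    return None


def parse_maybe_c(v: Any) -> Optional[Tuple[str, Any]]:
    # MaybeC: @present / @invalid / @absent, tried in that order.
    if not isinstance(v, dict):
        return None
    if isinstance(v.get(C), Symbol):
        return ("present", v[C])
    if C in v:
        return ("invalid", v[C])
    return ("absent", None)  # {} matches any dictionary


def parse_mydict(v: Any) -> Optional[MyDict]:
    # Intersection: every part must succeed, or the whole parse fails.
    ab, c = parse_ab(v), parse_maybe_c(v)
    if ab is None or c is None:
        return None
    return MyDict(ab[0], ab[1], c)


def merge(x: Dict[Any, Any], y: Dict[Any, Any]) -> Dict[Any, Any]:
    # Assumed merge rule: union of keys; shared keys must agree.
    out = dict(x)
    for k, val in y.items():
        if k in out and out[k] != val:
            raise ValueError(f"cannot merge: conflicting values for {k!r}")
        out[k] = val
    return out


def serialize_mydict(m: MyDict) -> Dict[Any, Any]:
    # Serialize each part separately, then merge the resulting terms.
    ab_part = {A: m.a, B: m.b}
    c_part = {} if m.c[0] == "absent" else {C: m.c[1]}
    return merge(ab_part, c_part)
```

The `("invalid", ...)` case shows the weakness the rationale block mentions. The parse succeeds, and deciding what to do with a malformed `c` entry is left to the caller.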
@@ -223,13 +285,26 @@ completely clear.
 
 Patterns come in two kinds:
 
-- the parsers for *simple patterns* yield a single host-language
-value; for example, a string, an array, a pointer, and so on.
+- The parsers for *simple patterns* yield a single host-language
+value—for example, a string, an array, a number, or a pointer—or
+even, in the case of `LiteralPattern`s, no host-language values at
+all.[^no-values-at-all]
 
-- the parsers for *compound patterns* yield zero or more *fields*
+- The parsers for *compound patterns* yield zero or more *fields*
 which combine into an overall record type associated with a
 definition.
 
+[^no-values-at-all]: The case of a `LiteralPattern` yielding no
+host-language values is interesting. All the information required to
+reversibly store the result of a parse is already in the schema, so
+nothing need be stored at runtime in host-language data type
+instances. Concretely, a definition consisting only of a
+`LiteralPattern` might correspond to a host-language unit type (the
+empty tuple, the "void" value). Definitions consisting of
+`CompoundPattern`s involving `LiteralPattern`s do not even need to
+store this much: fields of unit type in a host-language record type
+can simply be omitted without loss.
+
 #### Simple patterns
 
 SimplePattern = AnyPattern
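The new footnote says that literal patterns need no runtime storage. A small Python sketch makes this concrete using two hypothetical definitions, described only in comments. The first consists solely of a literal (the symbol `ping`) and maps to a unit type. The second is a record whose label and first field are literals, so its host type keeps only the one bound field. `Symbol`, `Record`, `Ping`, `Greeting` and the parse/serialize functions are stand-ins, not a real implementation's API.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Symbol:  # hypothetical stand-in for a Preserves symbol
    name: str


@dataclass(frozen=True)
class Record:  # hypothetical stand-in for a Preserves record
    label: Any
    fields: Tuple[Any, ...]


# Hypothetical definition consisting only of a literal matching the symbol
# `ping`: the host type is a unit type whose instances are all equal.
@dataclass(frozen=True)
class Ping:
    pass


def parse_ping(v: Any) -> Optional[Ping]:
    return Ping() if v == Symbol("ping") else None


def serialize_ping(_: Ping) -> Any:
    # Everything needed to rebuild the Value comes from the schema.
    return Symbol("ping")


# Hypothetical record: label `greeting`, first field the literal string "v1",
# second field bound as `who`. The literal field is omitted from the type.
@dataclass(frozen=True)
class Greeting:
    who: str


def parse_greeting(v: Any) -> Optional[Greeting]:
    if (isinstance(v, Record) and v.label == Symbol("greeting")
            and len(v.fields) == 2 and v.fields[0] == "v1"
            and isinstance(v.fields[1], str)):
        return Greeting(v.fields[1])
    return None


def serialize_greeting(g: Greeting) -> Record:
    # The omitted literal field is restored from the schema, not from g.
    return Record(Symbol("greeting"), ("v1", g.who))
```

Round-tripping still works even though neither `Ping` nor `Greeting` stores the literals, because both serializers reinsert them.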
|
|
|
@@ -93,6 +93,16 @@ td {
 padding-right: 0.5rem;
 }
 
+blockquote {
+padding: 0.5rem 1rem;
+border-left: solid #4f81bd 2px;
+margin-right: 0;
+}
+
+.rationale {
+background-color: #e9f0f9;
+}
+
 /*---------------------------------------------------------------------------*/
 /* Rouge syntax classes */
 
|
|