diff --git a/preserves-schema.md b/preserves-schema.md
index 5f7d8cc..569b62c 100644
--- a/preserves-schema.md
+++ b/preserves-schema.md
@@ -4,7 +4,7 @@ title: "Preserves Schema"
 ---
 
 Tony Garnock-Jones
-June 2021. Version 0.1.3.
+June 2022. Version 0.2.0.
 
 [abnf]: https://tools.ietf.org/html/rfc7405
 
@@ -17,7 +17,7 @@ A Preserves schema connects Preserves `Value`s to host-language data
 structures. Each definition within a schema can be processed by a
 compiler to produce
 
-  - a host-language *type definition*;
+  - a simple host-language *type definition*;
   - a partial *parsing* function from `Value`s to instances of the
     produced type; and
 
@@ -30,6 +30,24 @@ be serialized again, and every instance of a host-language data structure
 contains, by construction, enough information to be successfully
 serialized.
 
+**Portability.** Preserves Schema is broadly portable. Any host-language
+type system that can represent [algebraic
+types](https://en.wikipedia.org/wiki/Algebraic_data_type) in some way
+should be suitable as a compilation target.
+
+This includes ML-family languages like [Rust][rust-impl] and Haskell,
+object-oriented languages like Java, [Python][python-impl] and
+[Smalltalk][smalltalk-impl], and multiparadigm languages like
+[JavaScript][ts-impl], [TypeScript][ts-impl], [Racket][racket-impl],
+[Nim][nim-impl] and Erlang.
+
+[nim-impl]: https://git.syndicate-lang.org/ehmry/preserves-nim
+[python-impl]: https://gitlab.com/preserves/preserves/-/blob/main/implementations/python/preserves/schema.py
+[racket-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/racket/preserves/preserves-schema
+[rust-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/rust/preserves-schema
+[smalltalk-impl]: https://squeaksource.com/Preserves.html
+[ts-impl]: https://gitlab.com/preserves/preserves/-/tree/main/implementations/javascript/packages/schema
+
 **Example.** Sending the schema
 
     version 1 .
@@ -111,13 +129,13 @@ conventions:
   - `UpperCamelCase` for *definition* names.
 
   - Either `lowerCamelCase` or `UpperCamelCase` for definition-unique
-    names for alternatives within a union definition.
+    names for alternatives within an alternation definition.
 
   - `lowerCamelCase` for *module* names (schema names, package names)
     and *field* or *variable* names.
 
 
-## Concrete (DSL) Syntax
+## The Preserves Schema Language
 
 In this section, we use an [ABNF][abnf]-like notation to define a
 textual syntax that is easy for people to read and write. Most of the
@@ -144,7 +162,6 @@ A bundle of schema files is a directory tree containing `.prs` files.
     Version = "version" "1"
     EmbeddedTypeName = "embeddedType" ("#f" / Ref)
     Include = "include" string
-    Definition = id "=" (OrPattern / AndPattern / Pattern)
 
 **Version specification.** Mandatory. Names the version of the schema
 language used in the file. This version of the specification is
@@ -160,10 +177,33 @@ definition in this or a neighbouring schema, it declares that embedded
 file as if it were textually inserted in place of this clause. The
 file path may be relative to the current file, or absolute.
 
-**Definition.** Each definition clause implicitly connects a pattern
-with a type name and a set of associated functions.
+### Definitions.
 
-### Union definitions.
+    Definition = id "=" (OrPattern / AndPattern / Pattern)
+
+Each definition clause connects a pattern over `Value`s with a
+host-language type name (derived from the supplied `id`) and set of
+associated functions.
+
+A definition may be
+
+  - an *alternation* of patterns, allowing for biased choice among alternatives;
+  - an *intersection* of patterns, allowing for composition and reuse of patterns; or
+  - the base case, an ordinary pattern.
+
+**Host-language types.** Each definition includes *bindings* that
+capture information from a parsed `Value` and expose it to programs in
+the host language. When more than one binding is present in a
+definition, a host-language record (product, structure, tuple) will be
+the result of a parse; otherwise, a simple value will result. When a
+definition involves *alternation*, a host-language representation of a
+sum over the types of each branch of the alternation will result. For
+example, a compiler targeting an object-oriented host language would
+produce a base class for each definition, with a field for each binding
+and a subclass for each variant alternative. A functional host language
+with algebraic data types would produce a labelled-sum-of-products type.
+
+### Alternation definitions.
 
     OrPattern = AltPattern "/" AltPattern *("/" AltPattern)
 
@@ -172,16 +212,20 @@ The right-hand-side of a definition may supply two or more
 result of the first successful alternative is the result of the
 entire parse.
 
-The type corresponding to an `OrPattern` is a union type, a variant
-type, or an algebraic sum type, depending on the host language.
+**Host-language types.** The type corresponding to an `OrPattern` is an
+algebraic sum type, a union type, a variant type, or a concrete subclass
+of an abstract superclass, depending on the host language.
 
-Each alternative with an `OrPattern` must have a definition-unique
-*name*. The name can either be given explicitly as `@name` (see
-discussion of `NamedPattern` below) or inferred. It can only be
-inferred from the label of a record pattern, from the name of a
-reference to another definition, or from the text of a "sufficiently
-identifierlike" literal pattern - one that matches a string, symbol,
-number or boolean:
+**Variant names.** Each alternative with an `OrPattern` must have a
+definition-unique *name*. The name is used to uniquely label the
+alternative's host-language representation (for example, a subclass, or
+a member of a tagged union type).
+
+A variant name can either be given explicitly as `@name` (see discussion
+of `NamedPattern` below) or inferred. It can only be inferred from the
+label of a record pattern, from the name of a reference to another
+definition, or from the text of a "sufficiently identifierlike" literal
+pattern - one that matches a string, symbol, number or boolean:
 
     AltPattern = "@" id SimplePattern
                / "<" id PatternSequence ">"
@@ -192,30 +236,48 @@ number or boolean:
 
     AndPattern = NamedPattern "&" NamedPattern *("&" NamedPattern)
 
-The right-hand-side of a definition may supply two or more patterns,
-the *intersection* of whose denotations is the denotation of the
-overall definition. When parsing, every pattern is tried; if all
-succeed, the resulting information is combined into a single record
-type.
+The right-hand-side of a definition may supply two or more patterns, the
+*intersection* of whose denotations is the denotation of the overall
+definition. When parsing, every pattern is tried: if all succeed, the
+resulting information is combined into a single type; otherwise, the
+overall parse fails.
 
 When serializing, the terms resulting from serializing at each pattern
 are *merged* together.
 
-#### Experimental.
+**Host-language types.** Compiling an intersection definition produces a
+host-language type that is effectively the algebraic product of the
+types of the parts of the intersection. Practically, this usually means
+a record (product, structure, tuple) type.
 
-Intersections are an experimental feature. They can be used to express
-*optional dictionary entries*:[^not-ideal-optional-encoding]
-
-    MyDict = {a: int, b: string} & @c MaybeC .
-    MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
-
-It is not yet clear whether they pull their weight. In particular, the
-semantics of serializing a value defined by intersection are not
-completely clear.
-
-[^not-ideal-optional-encoding]: This encoding is not ideal. It passes
-    responsibility for checking for invalid inputs up to the user,
-    rather than handling it completely at the Schema layer.
+{:.rationale}
+> #### Experimental.
+>
+> Intersections are an experimental feature. They can be used to express
+> *optional dictionary entries*:
+>
+>     MyDict = {a: int, b: string} & @c MaybeC .
+>     MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
+>
+> They can also be used to express something reminiscent of *inheritance*:
+>
+>     Type = @base BaseFields & @detail SubType .
+>     BaseFields = {a: int, b: string} .
+>     SubType = @base {}
+>             / @variantA { x: int }
+>             / @mid Mid .
+>     Mid = { y: symbol } & @detail SubSubType .
+>     SubSubType = @variantB { z: "type-b" }
+>                / @variantC { z: "type-c" }
+>
+> It is not yet clear whether they pull their weight.
+>
+> From the point of view of the user of the schema language, using
+> intersections to express optional values is cumbersome. Not only is it
+> verbose, requiring auxiliary definitions, but it leaves responsibility
+> for checking for invalid inputs up to the user, rather than handling
+> it completely at the Schema layer. A future Schema version will likely
+> include first-class support for optionality.
 
 ### Patterns.
 
@@ -223,13 +285,26 @@ completely clear.
 
 Patterns come in two kinds:
 
-  - the parsers for *simple patterns* yield a single host-language
-    value; for example, a string, an array, a pointer, and so on.
+  - The parsers for *simple patterns* yield a single host-language
+    value—for example, a string, an array, a number, or a pointer—or
+    even, in the case of `LiteralPattern`s, no host-language values at
+    all.[^no-values-at-all]
 
-  - the parsers for *compound patterns* yield zero or more *fields*
+  - The parsers for *compound patterns* yield zero or more *fields*
     which combine into an overall record type associated with a
     definition.
 
+[^no-values-at-all]: The case of a `LiteralPattern` yielding no
+    host-language values is interesting. All the information required to
+    reversibly store the result of a parse is already in the schema, so
+    nothing need be stored at runtime in host-language data type
+    instances. Concretely, a definition consisting only of a
+    `LiteralPattern` might correspond to a host-language unit type (the
+    empty tuple, the "void" value). Definitions consisting of
+    `CompoundPattern`s involving `LiteralPattern`s do not even need to
+    store this much: fields of unit type in a host-language record type
+    can simply be omitted without loss.
+
 #### Simple patterns
 
     SimplePattern = AnyPattern
diff --git a/preserves.css b/preserves.css
index cef042b..4e3bcb2 100644
--- a/preserves.css
+++ b/preserves.css
@@ -93,6 +93,16 @@ td {
   padding-right: 0.5rem;
 }
 
+blockquote {
+  padding: 0.5rem 1rem;
+  border-left: solid #4f81bd 2px;
+  margin-right: 0;
+}
+
+.rationale {
+  background-color: #e9f0f9;
+}
+
 /*---------------------------------------------------------------------------*/
 /* Rouge syntax classes */
 
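The **Host-language types** paragraphs added to `preserves-schema.md` in this change say that an alternation compiles to a sum (in an object-oriented host, a base class with one subclass per variant), that an intersection compiles to a product of its parts, that parsing an alternation is a biased choice, and that serializing an intersection merges the serialized parts. Below is a minimal, hand-written Python sketch of that mapping for the `MyDict`/`MaybeC` example from the new rationale block. It is not the output of the Python schema compiler linked above. It assumes that Preserves dictionaries are Python `dict`s, strings are `str`, integers are `int`, and symbols are instances of a hypothetical `Symbol` class defined in the sketch. It also assumes that fields are named after symbol dictionary keys, and it merges only dictionaries whose keys do not overlap.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a Preserves symbol (not a real library type)."""
    name: str


# MaybeC = @present {c: symbol} / @invalid {c: any} / @absent {} .
# Object-oriented rendering: a base class plus one subclass per named variant.
class MaybeC:
    pass


@dataclass(frozen=True)
class MaybeCPresent(MaybeC):
    c: Symbol


@dataclass(frozen=True)
class MaybeCInvalid(MaybeC):
    c: Any


@dataclass(frozen=True)
class MaybeCAbsent(MaybeC):
    pass  # the pattern has no bindings, so there is nothing to store


C = Symbol("c")


def parse_maybe_c(value: Any) -> Optional[MaybeC]:
    """Biased choice: try the alternatives in order; the first success wins."""
    if not isinstance(value, dict):
        return None
    if C in value and isinstance(value[C], Symbol):
        return MaybeCPresent(value[C])
    if C in value:
        return MaybeCInvalid(value[C])
    return MaybeCAbsent()  # the empty dictionary pattern {} matches any dict


def serialize_maybe_c(v: MaybeC) -> dict:
    if isinstance(v, (MaybeCPresent, MaybeCInvalid)):
        return {C: v.c}
    return {}


# MyDict = {a: int, b: string} & @c MaybeC .
# Intersection: one record type holding the bindings of every part.
@dataclass(frozen=True)
class MyDict:
    a: int
    b: str
    c: MaybeC


def parse_my_dict(value: Any) -> Optional[MyDict]:
    """Every part must parse; if any part fails, the whole parse fails."""
    if not isinstance(value, dict):
        return None
    a = value.get(Symbol("a"))
    b = value.get(Symbol("b"))
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(a, int) or isinstance(a, bool) or not isinstance(b, str):
        return None
    c = parse_maybe_c(value)
    if c is None:
        return None
    return MyDict(a, b, c)


def serialize_my_dict(v: MyDict) -> dict:
    """Serialize each part of the intersection, then merge the dictionaries."""
    merged = {Symbol("a"): v.a, Symbol("b"): v.b}
    merged.update(serialize_maybe_c(v.c))  # keys are disjoint in this example
    return merged
```

For instance, parsing the dictionary `{a: 1, b: "x"}` yields `MyDict(1, "x", MaybeCAbsent())`, and serializing that result gives back the original dictionary. The `@invalid {c: any}` branch shows the cost noted in the rationale block: a present-but-ill-typed `c` still parses, and rejecting it is left to the caller.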