Preserves stuff for the manual

This commit is contained in:
Tony Garnock-Jones 2022-02-11 22:08:59 +01:00
parent c7e2abe1fb
commit 3ef314bc21
3 changed files with 336 additions and 11 deletions


@ -12,13 +12,13 @@
- [System overview](./operation/index.md)
- [The System Bus: syndicate-server](./operation/system-bus.md)
- [Configuration language](./operation/scripting.md)
- [Services and service dependencies](./operation/service.md)
- [Built-in services and service classes](./operation/builtin/index.md)
- [Gatekeeper](./operation/builtin/gatekeeper.md)
- [TCP/IP and Unix-socket Transports](./operation/builtin/relay-listener.md)
- [Configuration watcher](./operation/builtin/config-watcher.md)
- [Daemons and external programs](./operation/builtin/daemon.md)
- [Configuration scripting language](./operation/scripting.md)
- [Configuration files and directories]()
- [The boot layer]()
- [Logging]()


@ -1,9 +1,334 @@
# Preserves
an S-expression-like language that is a syntactic superset of
JSON. Like JSON, Preserves is not specifically tied to any
particular programming language. Unlike JSON, Preserves has a
robust semantics, designed specifically to be a solid foundation
for networked communication.
Synit makes **extensive** use of *Preserves*, a programming-language-independent language for
data.
- [Preserves homepage](https://preserves.gitlab.io/)
- [Preserves specification](https://preserves.gitlab.io/preserves/preserves.html)
- [Preserves schema-language specification](https://preserves.gitlab.io/preserves/preserves-schema.html)
- [Source code](https://gitlab.com/preserves/preserves) for many (not all) of the implementations
- Implementations for
[Nim](https://git.sr.ht/~ehmry/preserves-nim),
[Python](https://pypi.org/project/preserves/),
[Racket](https://pkgs.racket-lang.org/package/preserves),
[Rust](https://docs.rs/preserves/latest/preserves/),
[Squeak Smalltalk](https://squeaksource.com/Preserves.html),
[TypeScript/JavaScript](https://www.npmjs.com/org/preserves)
The Preserves data language is in many ways comparable to JSON, XML, S-expressions, CBOR, ASN.1
BER, and so on. From the [specification
document](https://preserves.gitlab.io/preserves/preserves.html):
> Preserves supports *records* with user-defined *labels*, embedded *references*, and the usual
> suite of atomic and compound data types, including *binary* data as a distinct type from text
> strings.
## Why does Synit rely on Preserves?
There are five aspects of Preserves that make it particularly relevant to Synit:
- the core Preserves [data language](#grammar-of-values) has a robust *semantics*;
- Preserves values may have [capability references]() embedded within them;
- Preserves has a [schema language](#schemas) useful for specifying protocols among actors;
- a [canonical form](#canonical-form) exists for every Preserves value; and
- Preserves has a [query language](#preserves-path) for extracting portions of a Preserves value.
## Grammar of values
The main reason Preserves is useful for Synit is that it has *semantics*: the specification
defines a language-independent *equivalence relation* over Preserves
values.[^preserves-ordering-exists-too] This makes it a solid foundation for a multi-language,
multi-process, potentially distributed system like Synit.
[^dataspaces-need-data-with-semantics]
### Abstract syntax: Values
The *abstract syntax* of Preserves values is as follows (from the specification):
    Value    = Atom            Atom = Boolean
             | Compound             | Float
             | Embedded             | Double
                                    | SignedInteger
    Compound = Record               | String
             | Sequence             | ByteString
             | Set                  | Symbol
             | Dictionary
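As a rough illustration of how this grammar might map onto a host language, here is a minimal Rust sketch. It is not the data type used by any real Preserves implementation: `Float`, `Double` and `Embedded` are left out, integers are limited to `i64`, and the name `set_of` is hypothetical. The derived ordering is Rust's own, not the total order defined by the specification. The point is only that equality is structural, so two sets built in different orders are the same value.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified mirror of the abstract syntax above.
/// Float, Double and Embedded are omitted so that Eq and Ord can be derived.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Boolean(bool),
    SignedInteger(i64), // simplification: fixed-width instead of arbitrary precision
    String(String),
    ByteString(Vec<u8>),
    Symbol(String),
    Record(Box<Value>, Vec<Value>), // label, then fields
    Sequence(Vec<Value>),
    Set(BTreeSet<Value>),
    Dictionary(BTreeMap<Value, Value>),
}

/// Builds a Set value from elements given in any order (hypothetical helper).
pub fn set_of(items: Vec<Value>) -> Value {
    Value::Set(items.into_iter().collect())
}
```

With this sketch, `set_of(vec![a, b])` and `set_of(vec![b, a])` compare equal, while a `String` and a `Symbol` with the same text remain distinct values.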
### Concrete syntax
Because Preserves has semantics independent of its syntax, we are free to define *syntax*
appropriate for its use in different settings. Values can be automatically, *losslessly*
translated from one syntax to another. The core Preserves specification defines both a
*text-based*, human-readable, JSON-like syntax that is a syntactic superset of JSON, and a
completely equivalent compact *binary* syntax, crucial to the definition of [canonical
form](#canonical-form) for Preserves values.[^syrup]
Here are a few example values, written using the text syntax (see [the
specification](https://preserves.gitlab.io/preserves/preserves.html#textual-syntax) for the
grammar):
    Boolean    : #t, #f
    Float      : 1.0f, 10.4e3f, -100.6f
    Double     : 1.0, 10.4e3, -100.6
    Integer    : 1, 0, -100
    String     : "Hello, world!\n"
    ByteString : #"bin\x00str\x00", #[YmluAHN0cgA], #x"62696e0073747200"
    Symbol     : hello-world, |hello world|, =, !, hello?, ||, ...
    Record     : <label field1 field2 ...>
    Sequence   : [value1 value2 ...]
    Set        : #{value1 value2 ...}
    Dictionary : {key1: value1 key2: value2 ...: ...}
    Embedded   : #!value
Commas are optional in sequences, sets, and dictionaries.
### Canonical form
Every Preserves value can be serialized into a *canonical form* using the [binary
syntax](https://preserves.gitlab.io/preserves/preserves.html#compact-binary-syntax) along with
[a few simple rules](https://preserves.gitlab.io/preserves/canonical-binary.html) about
serialization ordering of elements in sets and keys in dictionaries.
Having a canonical form means that, for example, a SHA-512 (or other secure) digest of the
canonical serialization of a value can be used as a short, effectively unique name for the value.
For example, the value
```preserves
<sms-delivery <address international "31653131313">
              <address international "31655512345">
              <rfc3339 "2022-02-09T08:18:29.88847+01:00">
              "This is a test SMS message">
```
serializes canonically to
00000000: b4b3 0c73 6d73 2d64 656c 6976 6572 79b4 ...sms-delivery.
00000010: b307 6164 6472 6573 73b3 0d69 6e74 6572 ..address..inter
00000020: 6e61 7469 6f6e 616c b10b 3331 3635 3331 national..316531
00000030: 3331 3331 3384 b4b3 0761 6464 7265 7373 31313....address
00000040: b30d 696e 7465 726e 6174 696f 6e61 6cb1 ..international.
00000050: 0b33 3136 3535 3531 3233 3435 84b4 b307 .31655512345....
00000060: 7266 6333 3333 39b1 1f32 3032 322d 3032 rfc3339..2022-02
00000070: 2d30 3954 3038 3a31 383a 3239 2e38 3838 -09T08:18:29.888
00000080: 3437 2b30 313a 3030 84b1 1a54 6869 7320 47+01:00...This
00000090: 6973 2061 2074 6573 7420 534d 5320 6d65 is a test SMS me
000000a0: 7373 6167 6584 ssage.
which has SHA-512 hash
bfea9bd5ddf7781e34b6ca7e146ba2e442ef8ce04fd5ff912f889359945d0e2967a77a13
c86b13959dcce7e8ba3950d303832b825648609447b3d147677163ce
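
To make the byte layout in the hex dump above easier to follow, here is a minimal Rust sketch of an encoder for just the value kinds that appear in this example. The tag bytes (`b1`, `b3`, `b4`, `84`) and the single-byte lengths are read off the dump itself rather than taken from the full specification, and all function names are hypothetical. Sets and dictionaries, where the canonical ordering rules matter, are not covered, and computing the SHA-512 digest would need a hashing library that this sketch does not assume.

```rust
/// Just the value kinds that appear in the SMS example.
pub enum Value {
    Symbol(String),
    String(String),
    Record(Box<Value>, Vec<Value>), // label, then fields
}

// Tag bytes as inferred from the hex dump above (assumption, not the full spec).
const TAG_STRING: u8 = 0xb1;
const TAG_SYMBOL: u8 = 0xb3;
const TAG_RECORD: u8 = 0xb4;
const TAG_END: u8 = 0x84;

/// Writes a tag, a one-byte length, and the raw bytes.
fn put_counted(tag: u8, bytes: &[u8], out: &mut Vec<u8>) {
    // Every length in the dump fits in one byte; longer atoms are out of scope here.
    assert!(bytes.len() < 128, "sketch only handles short atoms");
    out.push(tag);
    out.push(bytes.len() as u8);
    out.extend_from_slice(bytes);
}

/// Appends the binary encoding of `v` to `out`.
pub fn encode(v: &Value, out: &mut Vec<u8>) {
    match v {
        Value::Symbol(s) => put_counted(TAG_SYMBOL, s.as_bytes(), out),
        Value::String(s) => put_counted(TAG_STRING, s.as_bytes(), out),
        Value::Record(label, fields) => {
            out.push(TAG_RECORD);
            encode(label, out);
            for field in fields {
                encode(field, out);
            }
            out.push(TAG_END);
        }
    }
}

pub fn sym(s: &str) -> Value {
    Value::Symbol(s.to_string())
}

pub fn text(s: &str) -> Value {
    Value::String(s.to_string())
}

pub fn rec(label: &str, fields: Vec<Value>) -> Value {
    Value::Record(Box::new(sym(label)), fields)
}

/// The example value from the text above.
pub fn sms_example() -> Value {
    let address = |number: &str| rec("address", vec![sym("international"), text(number)]);
    rec(
        "sms-delivery",
        vec![
            address("31653131313"),
            address("31655512345"),
            rec("rfc3339", vec![text("2022-02-09T08:18:29.88847+01:00")]),
            text("This is a test SMS message"),
        ],
    )
}
```

Encoding `sms_example()` with this sketch yields 0xa6 (166) bytes, matching the extent of the dump, beginning with `b4 b3 0c` followed by the text `sms-delivery`.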
## Schemas
Preserves comes with a schema language suitable for defining protocols among actors/programs in
Synit. Because the Preserves text syntax is a syntactic superset of JSON, its schemas can be
used for parsing JSON just as well as for native Preserves values. From the [schema
specification](https://preserves.gitlab.io/preserves/preserves-schema.html):
> A Preserves schema connects Preserves Values to host-language data
> structures. Each definition within a schema can be processed by a
> compiler to produce
>
> - a host-language *type definition*;
> - a partial *parsing* function from Values to instances of the
> produced type; and
> - a total *serialization* function from instances of the type to
> Values.
>
> Every parsed Value retains enough information to always be able to
> be serialized again, and every instance of a host-language data
> structure contains, by construction, enough information to be
> successfully serialized.
Instead of taking host-language data structure definitions as primary, in the way that systems
like [Serde](https://serde.rs/) do, Preserves schemas take *the shape of the serialized data*
as primary.
To see the difference, let's look at an example.
### Example: Book Outline
Systems like [Serde](https://serde.rs/) concentrate on defining (de)serializers for
host-language type definitions.
Serde starts from definitions like the following[^this-example-from-mdbook]. It generates
(de)serialization code for various different *data languages* (such as JSON, XML, CBOR, etc.)
in a single *programming language*: Rust.
```rust
pub struct BookOutline {
    pub sections: Vec<BookItem>,
}
pub enum BookItem {
    Chapter(Chapter),
    Separator,
    PartTitle(String),
}
pub struct Chapter {
    pub name: String,
    pub sub_items: Vec<BookItem>,
}
```
The (de)serializers are able to produce and understand values such as the following JSON
document, converting them to and from in-memory representations. The focus is on Rust:
interpreting the produced documents from other languages is out of scope for Serde.
```json
{
  "sections": [
    { "PartTitle": "Part I" },
    "Separator",
    {
      "Chapter": {
        "name": "Chapter One",
        "sub_items": []
      }
    },
    {
      "Chapter": {
        "name": "Chapter Two",
        "sub_items": []
      }
    }
  ]
}
```
By contrast, Preserves schemas focus on Preserves values[^including-json] only.
Each Preserves schema compiler generates type definitions and (de)serialization code for a
single *programming language* able to understand common *data*. The grammar of the data itself
is language-independent.
For example, a Preserves schema able to parse values compatible with those produced by Serde
for the type definitions above is the following:
```preserves
version 1 .
BookOutline = {
  "sections": @sections [BookItem ...],
} .
BookItem = @chapter { "Chapter": @value Chapter }
         / @separator "Separator"
         / @partTitle { "PartTitle": @value string } .
Chapter = {
  "name": @name string,
  "sub_items": @sub_items [BookItem ...],
} .
```
Using the Rust schema compiler, we see types such as the following, which are *similar to* but
not the *same* as the original Rust types above:
```rust
pub struct BookOutline {
    pub sections: std::vec::Vec<BookItem>
}
pub enum BookItem {
    Chapter { value: std::boxed::Box<Chapter> },
    Separator,
    PartTitle { value: std::string::String }
}
pub struct Chapter {
    pub name: std::string::String,
    pub sub_items: std::vec::Vec<BookItem>
}
```
Using the TypeScript schema compiler, we see
```typescript
export type BookOutline = {"sections": Array<BookItem>};
export type BookItem = (
    {"_variant": "chapter", "value": Chapter} |
    {"_variant": "separator"} |
    {"_variant": "partTitle", "value": string}
);
export type Chapter = {"name": string, "sub_items": Array<BookItem>};
```
Using the Racket schema compiler, we see
```racket
(struct BookOutline (sections))
(define (BookItem? p)
  (or (BookItem-chapter? p)
      (BookItem-separator? p)
      (BookItem-partTitle? p)))
(struct BookItem-chapter (value))
(struct BookItem-separator ())
(struct BookItem-partTitle (value))
(struct Chapter (name sub_items))
```
and so on.
### Example: Book Outline redux, using Records
The schema for book outlines above accepts Preserves (JSON) documents compatible with the
(de)serializers produced by Serde for a Rust-native type.
Instead, we might choose to define a Preserves-native data definition, and to work from
that:[^lose-compatibility]
```preserves
version 1 .
BookOutline = <book-outline @sections [BookItem ...]> .
BookItem = Chapter / =separator / @partTitle string .
Chapter = <chapter @name string @sub_items [BookItem ...]> .
```
The schema compilers produce **exactly the same** type definitions for this variation!
The differences are in the (de)serialization code only.
Here's the Preserves value equivalent to the example above, expressed using the Preserves-native schema:
```preserves
<book-outline [
  "Part I"
  separator
  <chapter "Chapter One" []>
  <chapter "Chapter Two" []>
]>
```
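
To give a feel for what the (de)serialization code for the record-based schema has to do, here is a hand-written Rust sketch. It is not the output of the Rust schema compiler: the `Value` enum is a minimal stand-in for a real Preserves value type, and the function names `parse_book_item` and `unparse_book_item` are hypothetical. It shows a partial parsing function, which returns `None` when the input does not match the schema, paired with a total serialization function.

```rust
/// Minimal stand-in for a Preserves value (assumption: a real program would use a library type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Symbol(String),
    String(String),
    Sequence(Vec<Value>),
    Record(Box<Value>, Vec<Value>), // label, then fields
}

#[derive(Debug, Clone, PartialEq)]
pub enum BookItem {
    Chapter { value: Box<Chapter> },
    Separator,
    PartTitle { value: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    pub name: String,
    pub sub_items: Vec<BookItem>,
}

/// Partial: returns None for values that do not match the BookItem schema.
pub fn parse_book_item(v: &Value) -> Option<BookItem> {
    match v {
        // `=separator` in the schema: the literal symbol `separator`.
        Value::Symbol(s) if s == "separator" => Some(BookItem::Separator),
        // `@partTitle string`: any string is a part title.
        Value::String(s) => Some(BookItem::PartTitle { value: s.clone() }),
        // `<chapter @name string @sub_items [BookItem ...]>`.
        Value::Record(label, fields) if **label == Value::Symbol("chapter".to_string()) => {
            match fields.as_slice() {
                [Value::String(name), Value::Sequence(items)] => {
                    // Fails as a whole if any nested item fails to parse.
                    let sub_items = items
                        .iter()
                        .map(parse_book_item)
                        .collect::<Option<Vec<_>>>()?;
                    Some(BookItem::Chapter {
                        value: Box::new(Chapter { name: name.clone(), sub_items }),
                    })
                }
                _ => None,
            }
        }
        _ => None,
    }
}

/// Total: every BookItem has a serialization.
pub fn unparse_book_item(item: &BookItem) -> Value {
    match item {
        BookItem::Separator => Value::Symbol("separator".to_string()),
        BookItem::PartTitle { value } => Value::String(value.clone()),
        BookItem::Chapter { value } => Value::Record(
            Box::new(Value::Symbol("chapter".to_string())),
            vec![
                Value::String(value.name.clone()),
                Value::Sequence(value.sub_items.iter().map(unparse_book_item).collect()),
            ],
        ),
    }
}
```

Parsing a value and serializing the result gives back the original value, while a value such as an unrelated symbol is rejected by the parser.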
## Preserves Path
---
#### Notes
[^preserves-ordering-exists-too]: The specification defines a *total order relation* over
Preserves values as well.
[^dataspaces-need-data-with-semantics]: In particular, *dataspaces* need the assertion data
they contain to have a sensible equivalence predicate in order to be useful at all. If you
can't reliably tell whether two values are the same or different, how are you supposed to
use them to look things up in anything database-like?
Languages like JSON, which [don't have a well-defined equivalence
relation](https://preserves.gitlab.io/preserves/why-not-json.html#json-syntax-doesnt-mean-anything),
aren't good enough. When programs communicate with each other, they need to be sure that
their peers will understand the information they receive exactly as it was sent.
[^syrup]: Besides the two core syntaxes, other serialization syntaxes are in use in other
systems. For example, the [Spritely](https://gitlab.com/spritely)
[Goblins](https://gitlab.com/spritely/goblins) actor library uses a serialization syntax
called [Syrup](https://github.com/ocapn/syrup#pseudo-specification), reminiscent of
[`bencode`](https://en.wikipedia.org/wiki/Bencode).
[^this-example-from-mdbook]: This example is a simplified form of the preprocessor type
definitions for
[mdBook](https://rust-lang.github.io/mdBook/for_developers/preprocessors.html), the system
used to render this manual. I use a real [Preserves schema
definition](https://git.syndicate-lang.org/synit/synit/src/branch/main/manual/book.prs) for
parsing and producing Serde's JSON representation of mdBook `Book` structures in order to
[preprocess the manual's source
code](https://git.syndicate-lang.org/synit/synit/src/branch/main/manual/mdbook-ditaa).
[^including-json]: Including JSON values, of course!
[^lose-compatibility]: By doing so, we of course lose compatibility with the Serde structures.


@ -11,17 +11,17 @@ It provides:
1. A **[root system bus](#the-root-system-bus)** service for use by other programs. In this way, it is
analogous to D-Bus.
2. A general-purpose **[service dependency tracking facility](./service.md)**.
2. A **[configuration language](./scripting.md)** suitable for programming
[dataspaces](../glossary.md#dataspace) with simple reactive behaviours.
3. A [**gatekeeper** service](./builtin/gatekeeper.md), for exposing
3. A general-purpose **[service dependency tracking facility](./service.md)**.
4. A [**gatekeeper** service](./builtin/gatekeeper.md), for exposing
[capabilities](../glossary.md#capability) to running objects as (potentially long-lived)
[macaroon](../glossary.md#macaroon)-style "sturdy references", plus TCP/IP- and
Unix-socket-based **[transports](./builtin/relay-listener.md)** for accessing capabilities
through the gatekeeper.
4. A limited **[configuration scripting language](./scripting.md)** suitable for
programming [dataspaces](../glossary.md#dataspace) with simple reactive behaviours.
5. An [`inotify`](https://en.wikipedia.org/wiki/Inotify)-based **[configuration
loader](./builtin/config-watcher.md)** which loads and executes configuration files written
in the scripting language.