Preserves stuff for the manual
parent
c7e2abe1fb
commit
3ef314bc21
|
@ -12,13 +12,13 @@
|
|||
|
||||
- [System overview](./operation/index.md)
|
||||
- [The System Bus: syndicate-server](./operation/system-bus.md)
|
||||
- [Configuration language](./operation/scripting.md)
|
||||
- [Services and service dependencies](./operation/service.md)
|
||||
- [Built-in services and service classes](./operation/builtin/index.md)
|
||||
- [Gatekeeper](./operation/builtin/gatekeeper.md)
|
||||
- [TCP/IP and Unix-socket Transports](./operation/builtin/relay-listener.md)
|
||||
- [Configuration watcher](./operation/builtin/config-watcher.md)
|
||||
- [Daemons and external programs](./operation/builtin/daemon.md)
|
||||
- [Configuration scripting language](./operation/scripting.md)
|
||||
- [Configuration files and directories]()
|
||||
- [The boot layer]()
|
||||
- [Logging]()
|
||||
|
|
|
@ -1,9 +1,334 @@
|
|||
# Preserves
|
||||
|
||||
an S-expression-like language that is a syntactic superset of
|
||||
JSON. Like JSON, Preserves is not specifically tied to any
|
||||
particular programming language. Unlike JSON, Preserves has a
|
||||
robust semantics, designed specifically to be a solid foundation
|
||||
for networked communication.
|
||||
Synit makes **extensive** use of *Preserves*, a programming-language-independent language for
|
||||
data.
|
||||
|
||||
- [Preserves homepage](https://preserves.gitlab.io/)
|
||||
- [Preserves specification](https://preserves.gitlab.io/preserves/preserves.html)
|
||||
- [Preserves schema-language specification](https://preserves.gitlab.io/preserves/preserves-schema.html)
|
||||
- [Source code](https://gitlab.com/preserves/preserves) for many (not all) of the implementations
|
||||
- Implementations for
|
||||
[Nim](https://git.sr.ht/~ehmry/preserves-nim),
|
||||
[Python](https://pypi.org/project/preserves/),
|
||||
[Racket](https://pkgs.racket-lang.org/package/preserves),
|
||||
[Rust](https://docs.rs/preserves/latest/preserves/),
|
||||
[Squeak Smalltalk](https://squeaksource.com/Preserves.html),
|
||||
[TypeScript/JavaScript](https://www.npmjs.com/org/preserves)
|
||||
|
||||
The Preserves data language is in many ways comparable to JSON, XML, S-expressions, CBOR, ASN.1
|
||||
BER, and so on. From the [specification
|
||||
document](https://preserves.gitlab.io/preserves/preserves.html):
|
||||
|
||||
> Preserves supports *records* with user-defined *labels*, embedded *references*, and the usual
|
||||
> suite of atomic and compound data types, including *binary* data as a distinct type from text
|
||||
> strings.
|
||||
|
||||
## Why does Synit rely on Preserves?
|
||||
|
||||
There are five aspects of Preserves that make it particularly relevant to Synit:
|
||||
|
||||
- the core Preserves [data language](#grammar-of-values) has a robust *semantics*;
|
||||
- Preserves values may have [capability references]() embedded within them;
|
||||
- Preserves has a [schema language](#schemas) useful for specifying protocols among actors;
|
||||
- a [canonical form](#canonical-form) exists for every Preserves value; and
|
||||
- Preserves has a [query language](#preserves-path) for extracting portions of a Preserves value.
|
||||
|
||||
## Grammar of values
|
||||
|
||||
The main reason Preserves is useful for Synit is that it has *semantics*: the specification
|
||||
defines a language-independent *equivalence relation* over Preserves
|
||||
values.[^preserves-ordering-exists-too] This makes it a solid foundation for a multi-language,
|
||||
multi-process, potentially distributed system like Synit.
|
||||
[^dataspaces-need-data-with-semantics]
|
||||
|
||||
### Abstract syntax: Values
|
||||
|
||||
The *abstract syntax* of Preserves values is as follows (from the specification):
|
||||
|
||||
       Value = Atom            Atom = Boolean
             | Compound             | Float
             | Embedded             | Double
                                    | SignedInteger
    Compound = Record               | String
             | Sequence             | ByteString
             | Set                  | Symbol
             | Dictionary
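
As a rough illustration of what a language-independent equivalence relation buys, the abstract syntax can be mirrored as a Rust type whose derived equality is purely structural. This is a minimal sketch, not the type used by the `preserves` crate: the names `Value` and `set_of` are hypothetical, `Float`, `Double` and `Embedded` are omitted, integers are limited to `i64`, and the derived `Ord` is only there so values can live in `BTreeSet`/`BTreeMap`; it is not claimed to match the total order defined by the specification.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical mirror of part of the abstract syntax above.
/// Float, Double and Embedded are left out to keep the sketch short.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Boolean(bool),
    SignedInteger(i64), // assumption: the real type is not limited to 64 bits
    String(String),
    ByteString(Vec<u8>),
    Symbol(String),
    Record(Box<Value>, Vec<Value>), // label, then fields
    Sequence(Vec<Value>),
    Set(BTreeSet<Value>),
    Dictionary(BTreeMap<Value, Value>),
}

/// Build a Set from any collection of values; insertion order is irrelevant
/// because the underlying `BTreeSet` stores its elements sorted.
pub fn set_of<I: IntoIterator<Item = Value>>(items: I) -> Value {
    Value::Set(items.into_iter().collect())
}
```

With this representation, two sets built in different orders compare equal, two sequences with the same elements in different orders do not, and the string `"x"` is distinct from the symbol `x`.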
|
||||
|
||||
### Concrete syntax
|
||||
|
||||
Because Preserves has semantics independent of its syntax, we are free to define *syntax*
|
||||
appropriate for its use in different settings. Values can be automatically, *losslessly*
|
||||
translated from one syntax to another. The core Preserves specification defines both a
|
||||
*text-based*, human-readable, JSON-like syntax, that is a syntactic superset of JSON, and a
|
||||
completely equivalent compact *binary* syntax, crucial to the definition of [canonical
|
||||
form](#canonical-form) for Preserves values.[^syrup]
|
||||
|
||||
Here are a few example values, written using the text syntax (see [the
|
||||
specification](https://preserves.gitlab.io/preserves/preserves.html#textual-syntax) for the
|
||||
grammar):
|
||||
|
||||
    Boolean    : #t, #f
    Float      : 1.0f, 10.4e3f, -100.6f
    Double     : 1.0, 10.4e3, -100.6
    Integer    : 1, 0, -100
    String     : "Hello, world!\n"
    ByteString : #"bin\x00str\x00", #[YmluAHN0cgA], #x"62696e0073747200"
    Symbol     : hello-world, |hello world|, =, !, hello?, ||, ...
    Record     : <label field1 field2 ...>
    Sequence   : [value1 value2 ...]
    Set        : #{value1 value2 ...}
    Dictionary : {key1: value1 key2: value2 ...: ...}
    Embedded   : #!value
|
||||
|
||||
Commas are optional in sequences, sets, and dictionaries.
|
||||
|
||||
### Canonical form
|
||||
|
||||
Every Preserves value can be serialized into a *canonical form* using the [binary
|
||||
syntax](https://preserves.gitlab.io/preserves/preserves.html#compact-binary-syntax) along with
|
||||
[a few simple rules](https://preserves.gitlab.io/preserves/canonical-binary.html) about
|
||||
serialization ordering of elements in sets and keys in dictionaries.
|
||||
|
||||
Having a canonical form means that, for example, a SHA-512 (or other secure) digest of the
|
||||
canonical serialization of a value can be used as a unique, short name for the value.
|
||||
|
||||
For example, the value
|
||||
|
||||
```preserves
<sms-delivery <address international "31653131313">
              <address international "31655512345">
              <rfc3339 "2022-02-09T08:18:29.88847+01:00">
              "This is a test SMS message">
```
|
||||
|
||||
serializes canonically to
|
||||
|
||||
    00000000: b4b3 0c73 6d73 2d64 656c 6976 6572 79b4  ...sms-delivery.
    00000010: b307 6164 6472 6573 73b3 0d69 6e74 6572  ..address..inter
    00000020: 6e61 7469 6f6e 616c b10b 3331 3635 3331  national..316531
    00000030: 3331 3331 3384 b4b3 0761 6464 7265 7373  31313....address
    00000040: b30d 696e 7465 726e 6174 696f 6e61 6cb1  ..international.
    00000050: 0b33 3136 3535 3531 3233 3435 84b4 b307  .31655512345....
    00000060: 7266 6333 3333 39b1 1f32 3032 322d 3032  rfc3339..2022-02
    00000070: 2d30 3954 3038 3a31 383a 3239 2e38 3838  -09T08:18:29.888
    00000080: 3437 2b30 313a 3030 84b1 1a54 6869 7320  47+01:00...This 
    00000090: 6973 2061 2074 6573 7420 534d 5320 6d65  is a test SMS me
    000000a0: 7373 6167 6584                           ssage.
|
||||
|
||||
which has SHA-512 hash
|
||||
|
||||
    bfea9bd5ddf7781e34b6ca7e146ba2e442ef8ce04fd5ff912f889359945d0e2967a77a13
    c86b13959dcce7e8ba3950d303832b825648609447b3d147677163ce
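
The byte layout of the hex dump above can be reproduced with a few lines of code. The following is a minimal sketch covering only the three kinds of value that occur in the example (records, symbols and strings). The tag bytes are read off the dump itself rather than taken from the specification, lengths are assumed to fit in a single byte, and all names (`Value`, `encode`, `sms_delivery_example`, and the helpers) are hypothetical.

```rust
/// Minimal value type: only the three kinds that occur in the example.
pub enum Value {
    Symbol(String),
    Str(String),
    Record(Box<Value>, Vec<Value>), // label, then fields
}

// Tag bytes as they appear in the hex dump (an observation, not a spec quote).
const TAG_END: u8 = 0x84;
const TAG_STRING: u8 = 0xb1;
const TAG_SYMBOL: u8 = 0xb3;
const TAG_RECORD: u8 = 0xb4;

/// Emit `tag`, a one-byte length, then the payload bytes.
fn put_len_prefixed(tag: u8, bytes: &[u8], out: &mut Vec<u8>) {
    // Assumption: payloads shorter than 128 bytes, as in the dump.
    assert!(bytes.len() < 128, "sketch only handles short payloads");
    out.push(tag);
    out.push(bytes.len() as u8);
    out.extend_from_slice(bytes);
}

/// Append the binary form of `v` to `out`.
pub fn encode(v: &Value, out: &mut Vec<u8>) {
    match v {
        Value::Symbol(s) => put_len_prefixed(TAG_SYMBOL, s.as_bytes(), out),
        Value::Str(s) => put_len_prefixed(TAG_STRING, s.as_bytes(), out),
        Value::Record(label, fields) => {
            // A record is: start tag, label, each field, end tag.
            out.push(TAG_RECORD);
            encode(label, out);
            for field in fields {
                encode(field, out);
            }
            out.push(TAG_END);
        }
    }
}

fn sym(s: &str) -> Value {
    Value::Symbol(s.to_string())
}

fn text(s: &str) -> Value {
    Value::Str(s.to_string())
}

fn rec(label: &str, fields: Vec<Value>) -> Value {
    Value::Record(Box::new(sym(label)), fields)
}

/// The `sms-delivery` record from the example above.
pub fn sms_delivery_example() -> Value {
    rec("sms-delivery", vec![
        rec("address", vec![sym("international"), text("31653131313")]),
        rec("address", vec![sym("international"), text("31655512345")]),
        rec("rfc3339", vec![text("2022-02-09T08:18:29.88847+01:00")]),
        text("This is a test SMS message"),
    ])
}
```

Encoding `sms_delivery_example()` yields 0xa6 bytes, matching the length of the dump. Records and sequences keep their element order, so no sorting is needed here; the canonical-form ordering rules only come into play for sets and dictionaries, which this sketch does not model.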
|
||||
|
||||
## Schemas
|
||||
|
||||
Preserves comes with a schema language suitable for defining protocols among actors/programs in
|
||||
Synit. Because Preserves is a superset of JSON, its schemas can be used for parsing JSON just
|
||||
as well as for native Preserves values. From the [schema
|
||||
specification](https://preserves.gitlab.io/preserves/preserves-schema.html):
|
||||
|
||||
> A Preserves schema connects Preserves Values to host-language data
|
||||
> structures. Each definition within a schema can be processed by a
|
||||
> compiler to produce
|
||||
>
|
||||
> - a host-language *type definition*;
|
||||
> - a partial *parsing* function from Values to instances of the
|
||||
> produced type; and
|
||||
> - a total *serialization* function from instances of the type to
|
||||
> Values.
|
||||
>
|
||||
> Every parsed Value retains enough information to always be able to
|
||||
> be serialized again, and every instance of a host-language data
|
||||
> structure contains, by construction, enough information to be
|
||||
> successfully serialized.
|
||||
|
||||
Instead of taking host-language data structure definitions as primary, in the way that systems
|
||||
like [Serde](https://serde.rs/) do, Preserves schemas take *the shape of the serialized data*
|
||||
as primary.
|
||||
|
||||
To see the difference, let's look at an example.
|
||||
|
||||
### Example: Book Outline
|
||||
|
||||
Systems like [Serde](https://serde.rs/) concentrate on defining (de)serializers for
|
||||
host-language type definitions.
|
||||
|
||||
Serde starts from definitions like the following[^this-example-from-mdbook]. It generates
|
||||
(de)serialization code for various different *data languages* (such as JSON, XML, CBOR, etc.)
|
||||
in a single *programming language*: Rust.
|
||||
|
||||
```rust
pub struct BookOutline {
    pub sections: Vec<BookItem>,
}
pub enum BookItem {
    Chapter(Chapter),
    Separator,
    PartTitle(String),
}
pub struct Chapter {
    pub name: String,
    pub sub_items: Vec<BookItem>,
}
```
|
||||
|
||||
The (de)serializers are able to produce and understand values such as the following JSON
|
||||
document, converting them to and from in-memory representations. The focus is on Rust:
|
||||
interpreting the produced documents from other languages is out-of-scope for Serde.
|
||||
|
||||
```json
{
  "sections": [
    { "PartTitle": "Part I" },
    "Separator",
    {
      "Chapter": {
        "name": "Chapter One",
        "sub_items": []
      }
    },
    {
      "Chapter": {
        "name": "Chapter Two",
        "sub_items": []
      }
    }
  ]
}
```
|
||||
|
||||
By contrast, Preserves schemas focus on Preserves values[^including-json] only.
|
||||
|
||||
Each Preserves schema compiler generates type definitions and (de)serialization code for a
|
||||
single *programming language* able to understand common *data*. The grammar of the data itself
|
||||
is language-independent.
|
||||
|
||||
For example, a Preserves schema able to parse values compatible with those produced by Serde
|
||||
for the type definitions above is the following:
|
||||
|
||||
```preserves
version 1 .
BookOutline = {
  "sections": @sections [BookItem ...],
} .
BookItem = @chapter { "Chapter": @value Chapter }
         / @separator "Separator"
         / @partTitle { "PartTitle": @value string } .
Chapter = {
  "name": @name string,
  "sub_items": @sub_items [BookItem ...],
} .
```
|
||||
|
||||
Using the Rust schema compiler, we see types such as the following, which are *similar to* but
|
||||
not the *same* as the original Rust types above:
|
||||
|
||||
```rust
pub struct BookOutline {
    pub sections: std::vec::Vec<BookItem>
}
pub enum BookItem {
    Chapter { value: std::boxed::Box<Chapter> },
    Separator,
    PartTitle { value: std::string::String }
}
pub struct Chapter {
    pub name: std::string::String,
    pub sub_items: std::vec::Vec<BookItem>
}
```
|
||||
|
||||
Using the TypeScript schema compiler, we see
|
||||
|
||||
```typescript
export type BookOutline = {"sections": Array<BookItem>};
export type BookItem = (
    {"_variant": "chapter", "value": Chapter} |
    {"_variant": "separator"} |
    {"_variant": "partTitle", "value": string}
);
export type Chapter = {"name": string, "sub_items": Array<BookItem>};
```
|
||||
|
||||
Using the Racket schema compiler, we see
|
||||
|
||||
```racket
(struct BookOutline (sections))
(define (BookItem? p)
  (or (BookItem-chapter? p)
      (BookItem-separator? p)
      (BookItem-partTitle? p)))
(struct BookItem-chapter (value))
(struct BookItem-separator ())
(struct BookItem-partTitle (value))
(struct Chapter (name sub_items))
```
|
||||
|
||||
and so on.
|
||||
|
||||
### Example: Book Outline redux, using Records
|
||||
|
||||
The schema for book outlines above accepts Preserves (JSON) documents compatible with the
|
||||
(de)serializers produced by Serde for a Rust-native type.
|
||||
|
||||
Instead, we might choose to define a Preserves-native data definition, and to work from
|
||||
that:[^lose-compatibility]
|
||||
|
||||
```preserves
version 1 .
BookOutline = <book-outline @sections [BookItem ...]> .
BookItem = Chapter / =separator / @partTitle string .
Chapter = <chapter @name string @sub_items [BookItem ...]> .
```
|
||||
|
||||
The schema compilers produce **exactly the same** type definitions for this variation!
|
||||
|
||||
The differences are in the (de)serialization code only.
|
||||
|
||||
Here's the Preserves value equivalent to the example above, expressed using the Preserves-native schema:
|
||||
|
||||
```preserves
<book-outline [
  "Part I"
  separator
  <chapter "Chapter One" []>
  <chapter "Chapter Two" []>
]>
```
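
To make the "partial parsing function" and "total serialization function" from the schema specification more concrete, here is a hand-written sketch for `BookItem` under the record-based schema. It is not the output of the Rust schema compiler: the `Value` type is a hypothetical stand-in for a real Preserves value type, and the function names `parse_book_item` and `unparse_book_item` are made up for illustration. Only the `BookItem` and `Chapter` shapes follow the generated types shown earlier.

```rust
/// Hypothetical, minimal stand-in for a Preserves value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Symbol(String),
    Str(String),
    Sequence(Vec<Value>),
    Record(Box<Value>, Vec<Value>), // label, then fields
}

#[derive(Debug, Clone, PartialEq)]
pub enum BookItem {
    Chapter { value: Box<Chapter> },
    Separator,
    PartTitle { value: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    pub name: String,
    pub sub_items: Vec<BookItem>,
}

/// Partial: returns `None` for values that do not match the schema.
pub fn parse_book_item(v: &Value) -> Option<BookItem> {
    match v {
        // `=separator` in the schema: the literal symbol `separator`.
        Value::Symbol(s) if s == "separator" => Some(BookItem::Separator),
        // `@partTitle string`: any string.
        Value::Str(s) => Some(BookItem::PartTitle { value: s.clone() }),
        // `<chapter @name string @sub_items [BookItem ...]>`
        Value::Record(label, fields) => {
            if **label != Value::Symbol("chapter".to_string()) {
                return None;
            }
            match fields.as_slice() {
                [Value::Str(name), Value::Sequence(items)] => {
                    // Fails as a whole if any sub-item fails to parse.
                    let sub_items = items
                        .iter()
                        .map(parse_book_item)
                        .collect::<Option<Vec<_>>>()?;
                    let chapter = Chapter { name: name.clone(), sub_items };
                    Some(BookItem::Chapter { value: Box::new(chapter) })
                }
                _ => None,
            }
        }
        _ => None,
    }
}

/// Total: every `BookItem` has a serialization.
pub fn unparse_book_item(item: &BookItem) -> Value {
    match item {
        BookItem::Separator => Value::Symbol("separator".to_string()),
        BookItem::PartTitle { value } => Value::Str(value.clone()),
        BookItem::Chapter { value } => Value::Record(
            Box::new(Value::Symbol("chapter".to_string())),
            vec![
                Value::Str(value.name.clone()),
                Value::Sequence(value.sub_items.iter().map(unparse_book_item).collect()),
            ],
        ),
    }
}
```

Parsing is partial because most values are not book items (an unknown symbol, say, yields `None`), while serialization is total because every `BookItem` is valid by construction. Switching to the Serde-compatible schema would change only the bodies of these two functions, not the `BookItem` and `Chapter` types.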
|
||||
|
||||
## Preserves Path
|
||||
|
||||
---
|
||||
|
||||
#### Notes
|
||||
|
||||
[^preserves-ordering-exists-too]: The specification defines a *total order relation* over
|
||||
Preserves values as well.
|
||||
|
||||
[^dataspaces-need-data-with-semantics]: In particular, *dataspaces* need the assertion data
|
||||
they contain to have a sensible equivalence predicate in order to be useful at all. If you
|
||||
can't reliably tell whether two values are the same or different, how are you supposed to
|
||||
use them to look things up in anything database-like?
|
||||
Languages like JSON, which [don't have a well-defined equivalence
|
||||
relation](https://preserves.gitlab.io/preserves/why-not-json.html#json-syntax-doesnt-mean-anything),
|
||||
aren't good enough. When programs communicate with each other, they need to be sure that
|
||||
their peers will understand the information they receive exactly as it was sent.
|
||||
|
||||
[^syrup]: Besides the two core syntaxes, other serialization syntaxes are in use in other
|
||||
systems. For example, the [Spritely](https://gitlab.com/spritely)
|
||||
[Goblins](https://gitlab.com/spritely/goblins) actor library uses a serialization syntax
|
||||
called [Syrup](https://github.com/ocapn/syrup#pseudo-specification), reminiscent of
|
||||
[`bencode`](https://en.wikipedia.org/wiki/Bencode).
|
||||
|
||||
[^this-example-from-mdbook]: This example is a simplified form of the preprocessor type
|
||||
definitions for
|
||||
[mdBook](https://rust-lang.github.io/mdBook/for_developers/preprocessors.html), the system
|
||||
used to render this manual. I use a real [Preserves schema
|
||||
definition](https://git.syndicate-lang.org/synit/synit/src/branch/main/manual/book.prs) for
|
||||
parsing and producing Serde's JSON representation of mdBook `Book` structures in order to
|
||||
[preprocess the manual's source
|
||||
code](https://git.syndicate-lang.org/synit/synit/src/branch/main/manual/mdbook-ditaa).
|
||||
|
||||
[^including-json]: Including JSON values, of course!
|
||||
|
||||
[^lose-compatibility]: By doing so, we of course lose compatibility with the Serde structures.
|
||||
|
|
|
@ -11,17 +11,17 @@ It provides:
|
|||
1. A **[root system bus](#the-root-system-bus)** service for use by other programs. In this way, it is
|
||||
analogous to D-Bus.
|
||||
|
||||
2. A general-purpose **[service dependency tracking facility](./service.md)**.
|
||||
2. A **[configuration language](./scripting.md)** suitable for programming
|
||||
[dataspaces](../glossary.md#dataspace) with simple reactive behaviours.
|
||||
|
||||
3. A [**gatekeeper** service](./builtin/gatekeeper.md), for exposing
|
||||
3. A general-purpose **[service dependency tracking facility](./service.md)**.
|
||||
|
||||
4. A [**gatekeeper** service](./builtin/gatekeeper.md), for exposing
|
||||
[capabilities](../glossary.md#capability) to running objects as (potentially long-lived)
|
||||
[macaroon](../glossary.md#macaroon)-style "sturdy references", plus TCP/IP- and
|
||||
Unix-socket-based **[transports](./builtin/relay-listener.md)** for accessing capabilities
|
||||
through the gatekeeper.
|
||||
|
||||
4. A limited **[configuration scripting language](./scripting.md)** suitable for
|
||||
programming [dataspaces](../glossary.md#dataspace) with simple reactive behaviours.
|
||||
|
||||
5. An [`inotify`](https://en.wikipedia.org/wiki/Inotify)-based **[configuration
|
||||
loader](./builtin/config-watcher.md)** which loads and executes configuration files written
|
||||
in the scripting language.
|
||||
|
|