Progress on protocol spec

Tony Garnock-Jones 2022-02-24 16:01:10 +01:00
parent bc5136c9e9
commit b16ea848b4
2 changed files with 186 additions and 55 deletions


@ -93,6 +93,9 @@ Often abbreviated **SAM**.
Source [entities](#entity) running within an [actor](#actor) publish [assertions](#assertion)
and send [messages](#message) to target entities, possibly in other actors.
Essential idea: state replication is more useful than message-passing. (Message-passing
protocols usually end up simulating it, badly, anyway.)
## System Layer
## System Dataspace
## Transport


@ -36,26 +36,31 @@ Transports must
This document focuses primarily on point-to-point transports, discussing multicast and
in-memory variations briefly toward the end.
## Roles and session establishment
## Roles and session lifecycle
The protocol is completely symmetric, aside from [certain conventions detailed
below](#well-known-oids) about the entities available for use immediately upon connection
establishment. It is *not* a client/server protocol.
To begin a session on a newly-established point-to-point link, a relay simply starts sending
packets. Each peer starts the session with an empty entity reference map ([see
below](#membranes)) and making no assertions in either the outbound (on behalf of local
**Session startup.** To begin a session on a newly-established point-to-point link, a relay
simply starts sending packets. Each peer starts the session with an empty entity reference map
([see below](#membranes)) and making no assertions in either the outbound (on behalf of local
entities) or inbound (on behalf of the remote peer) directions.
**Session teardown.** At the end of a session, terminated normally or abnormally, cleanly or
through involuntary transport disconnection, all published assertions are
retracted.[^automatic-when-implemented-with-sam] This is in keeping with the essence of the
[Syndicated Actor Model (SAM)](./glossary.md#syndicated-actor-model).
## Packet definitions
Packets exchanged by relays are [Preserves](./glossary.md#preserves) values defined using
Preserves [schema](./glossary.md#schema).
```preserves-schema
Packet = Turn / Error / Extension .
```
Packets exchanged by relays are [Preserves](./glossary.md#preserves) values defined using
Preserves [schema](./glossary.md#schema).
A packet may be a *turn*, an *error*, or an *extension*.
Packets are neither commands nor responses; they are *events*.
@ -108,33 +113,32 @@ A `Turn` is the most important packet variant. It directly reflects the
[SAM](./glossary.md#syndicated-actor-model) notion of a [turn](./glossary.md#turn).
**Handling.** Each `Turn` carries [events](./glossary.md#event) to be delivered to
[entities](./glossary.md#entity) residing at the receiving end of the transport.
[entities](./glossary.md#entity) residing in the scope at the receiving end of the transport.
The `assertion` fields of `Assert` events and the `body` fields of `Message` events may contain
any Preserves value, including embedded entity references. On the wire, these will always be
formatted [as described below](#capabilities-on-the-wire). Upon receipt of a `Turn`, embedded
references are first mapped to internal references. Prior to transmission, internal references
are mapped to their external form. [The mapping procedure to follow is detailed
below.](#membranes)
After reference rewriting is complete, the sequence of `TurnEvent`s is examined. The
Upon receipt of a `Turn`, the sequence of `TurnEvent`s is examined. The
[OID](./glossary.md#oid) in each `TurnEvent` selects an entity known to the recipient. Each
`Event` is either publication of an assertion, retraction of a previously-published assertion,
delivery of a single message, or a [synchronization](./glossary.md#synchronization) event.
In the case that the receiving party is structured internally using the SAM, it is important to
preserve turn boundaries. Since turn boundaries are a per-[actor](./glossary.md#actor) concept,
but a `Turn` mentions only entities, the receiver must map entities to actors, group
`TurnEvent`s into per-actor queues, and deliver those queues to each actor in a single SAM turn
for each actor.
The `assertion` fields of `Assert` events and the `body` fields of `Message` events may contain
any Preserves value, including embedded entity references. On the wire, these will always be
formatted [as described below](#capabilities-on-the-wire). As each `Assert` or `Message` is
processed, embedded references are mapped to internal references. Symmetrically, internal
references are mapped to their external form prior to transmission. [The mapping procedure to
follow is detailed below.](#membranes)
The `Handle`s used to refer to published assertions MUST be unique within the scope of the
transport connection.
**Turn boundaries.** In the case that the receiving party is structured internally using the
SAM, it is important to preserve turn boundaries. Since turn boundaries are a
per-[actor](./glossary.md#actor) concept, but a `Turn` mentions only entities, the receiver
must map entities to actors, group `TurnEvent`s into per-actor queues, and deliver those queues
to each actor in a single SAM turn for each actor.
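The grouping step can be sketched as follows. This is a minimal illustration, not the relay's actual code: `group_by_actor` and the `actor_of` routing table are hypothetical names, and events are stood in for by plain strings. It assumes each OID has already been resolved to the actor owning the selected entity.

```python
from collections import defaultdict

def group_by_actor(turn_events, actor_of):
    """Split one incoming Turn into per-actor queues, keeping arrival order."""
    queues = defaultdict(list)
    for oid, event in turn_events:
        # Hypothetical lookup: which actor owns the entity this OID selects?
        queues[actor_of[oid]].append((oid, event))
    return dict(queues)

# Assumed layout: OIDs 1 and 3 select entities in actor "A", OID 2 in actor "B".
actor_of = {1: "A", 2: "B", 3: "A"}
turn = [(1, "assert x"), (2, "message y"), (3, "retract x")]
queues = group_by_actor(turn, actor_of)
```

Each resulting queue would then be delivered to its actor as one SAM turn.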
**Uniqueness.** The `Handle`s used to refer to published assertions MUST be unique within the
scope of the transport connection.
## Capabilities on the wire
Packets sent and received on a point-to-point transport frequently include embedded references.
These references denote *capabilities* for interacting with some entity.
References embedded in `Turn` packets denote *capabilities* for interacting with some entity.
For example, assertion of a capability-bearing record could appear as the following `Event`:
@ -154,9 +158,8 @@ Oid = int .
```
The `mine` variant denotes capability references managed by the *sender* of a given packet; the
`yours` variant, the *receiver* of the packet. Accordingly, if a relay receives a packet
mentioning `#![0 555]`, it will later use `#![1 555]` if it needs to send a packet to refer to
that same entity.
`yours` variant, the *receiver* of the packet. A relay receiving a packet mentioning `#![0
555]` will use `#![1 555]` in later packets that refer to that same entity, and *vice versa*.
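As a small illustration of this change of perspective, the sketch below models a `WireRef` as a two-element list. The `flip` helper is hypothetical, and the tag values `0` for `mine` and `1` for `yours` are inferred from the `#![0 555]` / `#![1 555]` example rather than taken from the schema; attenuation is ignored.

```python
MINE, YOURS = 0, 1  # assumed variant tags, inferred from the example above

def flip(wire_ref):
    """Return the form the other side would use to name the same entity."""
    tag, oid = wire_ref
    return [YOURS if tag == MINE else MINE, oid]

received = [MINE, 555]      # the sender talks about an entity it manages
to_send = flip(received)    # the receiver refers back to it as "yours"
```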
### Attenuation of authority
@ -166,10 +169,10 @@ additional conditions on the receiver's use of its own capability, known as an
An attenuation is a chain of `Caveat`s.[^caveat-terminology-macaroon] A `Caveat` acts as a
function that, given a Preserves value representing an assertion or message body, yields either
a possibly-rewritten value, or no value at all. In the latter case, the value has been
*rejected*. In the former case, the rewritten value is used as input to the next `Caveat` in
the chain, or as the final assertion or message body for delivery to the entity backing the
capability.
a possibly-rewritten value, or no value at all.[^zero-or-more] In the latter case, the value
has been *rejected*. In the former case, the rewritten value is used as input to the next
`Caveat` in the chain, or as the final assertion or message body for delivery to the entity
backing the capability.
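Viewed as functions, a chain of caveats composes in the obvious way. The following is only a sketch: caveats are modelled as plain Python callables, `None` stands for "no value", and the `apply_chain` name and the two example caveats are invented for illustration. The list is assumed to be in application order already, which is a separate question from the order in which an attenuation is written down.

```python
def apply_chain(caveats, value):
    """Feed value through each caveat in turn; None means it was rejected."""
    for caveat in caveats:
        value = caveat(value)
        if value is None:
            return None        # rejected: nothing reaches the backing entity
    return value

# Two hypothetical caveats: one filters, one rewrites.
only_strings = lambda v: v if isinstance(v, str) else None
wrap = lambda v: ["wrapped", v]

accepted = apply_chain([only_strings, wrap], "hi")
rejected = apply_chain([only_strings, wrap], 3)
```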
The chain of `Caveat`s in an attenuation is written down in *reverse* order: newer `Caveat`s
are appended to the sequence, and each `Caveat`'s output is fed into the input of the next
@ -193,7 +196,7 @@ captured by the pattern are gathered together and used in instantiation of the `
has rejected the input, and other `alternatives` are tried until none remain, at which point
the whole `Caveat` has rejected the input and processing of the triggering event stops.
#### Patterns
### Patterns
A `Pattern` within a rewrite can be any of the following variants:
@ -201,54 +204,54 @@ A `Pattern` within a rewrite can be any of the following variants:
Pattern = PDiscard / PAtom / PEmbedded / PBind / PAnd / PNot / Lit / PCompound .
```
`PDiscard` matches any value:
**Wildcard.** `PDiscard` matches any value:
```preserves-schema
PDiscard = <_>.
```
`PAtom` requires that a matched value be a boolean, a single- or double-precision float, an
**Atomic type.** `PAtom` requires that a matched value be a boolean, a single- or double-precision float, an
integer, a string, a binary blob, or a symbol, respectively:
```preserves-schema
PAtom = =Boolean / =Float / =Double / =SignedInteger / =String / =ByteString / =Symbol .
```
`PEmbedded` requires that a matched value be an embedded capability:
**Embedded value.** `PEmbedded` requires that a matched value be an embedded capability:
```preserves-schema
PEmbedded = =Embedded .
```
`PBind` first *captures* the matched value, adding it to the bindings vector, and then applies
**Binding.** `PBind` first *captures* the matched value, adding it to the bindings vector, and then applies
the nested `pattern`. If the subpattern matches, the `PBind` succeeds; otherwise, it fails:
```preserves-schema
PBind = <bind @pattern Pattern>.
```
`PAnd` is a conjunction of patterns; every pattern in `patterns` must match for the `PAnd` to
**Conjunction.** `PAnd` is a conjunction of patterns; every pattern in `patterns` must match for the `PAnd` to
match:
```preserves-schema
PAnd = <and @patterns [Pattern ...]>.
```
`PNot` is a pattern negation: if `pattern` matches, the `PNot` fails to match, and *vice
**Negation.** `PNot` is a pattern negation: if `pattern` matches, the `PNot` fails to match, and *vice
versa*. It is an error for `pattern` to include any `PBind` subpatterns.
```preserves-schema
PNot = <not @pattern Pattern>.
```
`Lit` is an exact match pattern. If the matched value is exactly equal to `value` (according to
**Literal.** `Lit` is an exact match pattern. If the matched value is exactly equal to `value` (according to
Preserves' own built-in equivalence relation), the match succeeds; otherwise, it fails:
```preserves-schema
Lit = <lit @value any>.
```
Finally, `PCompound` patterns match compound data structures. The `rec` variant demands that a
**Compound.** Finally, `PCompound` patterns match compound data structures. The `rec` variant demands that a
matched value be a record, with label exactly equal to `label` and fields one-for-one matching
the `Pattern`s in `fields`; the `arr` variant demands a sequence, with each element matching
the corresponding element of `items`; and `dict` demands a dictionary having *at least* entries
@ -261,18 +264,20 @@ PCompound =
/ @dict <dict @entries { any: Pattern ...:... }> .
```
#### Bindings
### Bindings
Bindings resulting from matching are stored as a sequence of values.
Matching notionally produces a sequence of values, one for each `PBind` in the pattern.
During matching, when a `PBind` pattern is seen, the matcher *first* appends the matched value
to the binding sequence and *then* recurses on the nested subpattern. This makes binding
*indexes* appear in left-to-right order as a `Pattern` is read.
When a `PBind` pattern is seen, the matcher *first* appends the matched value to the binding
sequence and *then* recurses on the nested subpattern. This makes binding *indexes* appear in
left-to-right order as a `Pattern` is read.
For example, given the pattern `<bind <arr [<bind <_>>, <bind <_>>]>>` and the matched value
`[1 2]`, the resulting captured values will be, in order, `[1 2]`, `1`, and `2`.
**Example.** Given the pattern `<bind <arr [<bind <_>>, <bind <_>>]>>` and the matched value
`["a" "b"]`, the resulting captured values are, in order, `["a" "b"]`, `"a"`, and `"b"`; the
template `<ref 0>` will be instantiated to `["a" "b"]`, `<ref 1>` to `"a"`, and `<ref 2>` to
`"b"`.
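The capture-then-recurse rule can be checked with a small matcher. This is a sketch covering only the `PDiscard`, `PBind`, `Lit` and `arr` variants; the tuple encoding of patterns is a hypothetical stand-in for the schema-defined records, and requiring equal lengths for `arr` is an assumption.

```python
def match(pattern, value, bindings):
    """Return True if value matches pattern, appending captures to bindings."""
    tag = pattern[0]
    if tag == "_":                          # PDiscard: matches anything
        return True
    if tag == "bind":                       # PBind: capture first, then recurse
        bindings.append(value)
        return match(pattern[1], value, bindings)
    if tag == "lit":                        # Lit: exact equality
        return value == pattern[1]
    if tag == "arr":                        # PCompound, arr variant
        items = pattern[1]
        return (isinstance(value, list)
                and len(value) == len(items)   # assumed: lengths must agree
                and all(match(p, v, bindings) for p, v in zip(items, value)))
    raise ValueError(f"unsupported pattern: {tag!r}")

# <bind <arr [<bind <_>>, <bind <_>>]>> in the tuple encoding used here
pattern = ("bind", ("arr", [("bind", ("_",)), ("bind", ("_",))]))
bindings = []
matched = match(pattern, ["a", "b"], bindings)
```

Here `bindings[n]` plays the role of `<ref n>`. On a failed match the list may hold partial captures, so a caller would discard it.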
#### Templates
### Templates
A `Template` within a rewrite produces a concrete Preserves value when instantiated with a
vector of captured binding values. Template instantiation may fail, yielding no value.
@ -318,7 +323,7 @@ TCompound =
/ @dict <dict @entries { any: Template ...:... }> .
```
#### Validity of Caveats
### Validity of Caveats
The above definitions imply some *validity constraints* on `Caveat`s.
@ -335,12 +340,12 @@ Implementations MUST enforce these constraints (either statically or dynamically
## Membranes
In order to correctly map between embedded references on the wire and entity references local
to the relay, the relay maintains two stateful objects called *membranes*. A membrane is a
bidirectional mapping between [OID](./glossary.md#oid) and relay-internal entity pointer.
Every relay maintains two stateful objects called *membranes*. A membrane is a bidirectional
mapping between [OID](./glossary.md#oid) and relay-internal entity pointer. Membranes connect
embedded references on the wire to entity references local to the relay.
- The *import membrane* connects OIDs managed by the *remote* peer to local *relay entities*
which proxy access to an "imported" remote entity.
- The *import membrane* connects OIDs managed by the *remote* peer to local [relay
entities](#relay-entities) which proxy access to an "imported" remote entity.
- The *export membrane* connects OIDs managed by the *local* peer to any local "exported"
entities accessible to the peer.
@ -380,6 +385,7 @@ bidirectional mapping between [OID](./glossary.md#oid) and relay-internal entity
|
```
<!--
```ditaa one-sided-membrane
---------------------------------------+
@ -404,6 +410,7 @@ bidirectional mapping between [OID](./glossary.md#oid) and relay-internal entity
----------------------------------------+
```
-->
<!--
Each relay rewrites the embedded references in the messages it sends and receives. It maps back
@ -445,6 +452,100 @@ peers.
-->
Logically, a membrane's state can be represented as a set of `WireSymbol` structures: a
`WireSymbol` is a triple of an OID, a local reference pointer (its *ref*), and a reference
count. There is never more than one `WireSymbol` associated with an OID or a ref.
A `WireSymbol` exists only so long as some assertion mentioning its OID exists across the relay
link. When the last assertion mentioning an OID is retracted, its `WireSymbol` is deleted.
Assertions mentioning a particular OID can come from *either side* of the relay link:
initially, a local reference is sent to the peer in an assertion, but then the peer may assert
something *back*, either targeting or mentioning the same entity. Care must be taken not to
release an OID entry prematurely in such situations.
For example, at least the following contribute to a `WireSymbol`'s reference count:
- The initial entry mapping a local entity ref to a well-known OID for use at session startup
([see below](#well-known-oids)) contributes a permanent reference.
- Mention of an OID in a received *or sent* `TurnEvent` adds one to the OID's reference count
for the duration of processing of the event. For `Assert` events in either direction, the
duration of processing is until the assertion is later retracted. For received `Message`
events, the duration of processing is until the incoming message has been forwarded on to
the target ref.
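One way to picture this bookkeeping is the sketch below. The `Membrane` class and its `acquire`/`release` methods are hypothetical names, not part of the specification, and local refs are assumed to be hashable Python objects.

```python
from dataclasses import dataclass

@dataclass
class WireSymbol:
    oid: int
    ref: object
    count: int = 0

class Membrane:
    """Bidirectional OID <-> ref mapping with reference counts."""
    def __init__(self):
        self.by_oid = {}
        self.by_ref = {}

    def acquire(self, oid, ref):
        # Find or create the WireSymbol, then count one more mention.
        sym = self.by_oid.get(oid)
        if sym is None:
            sym = WireSymbol(oid, ref)
            self.by_oid[oid] = sym
            self.by_ref[ref] = sym
        sym.count += 1
        return sym

    def release(self, sym):
        # Drop one mention; delete the entry when nothing refers to it.
        sym.count -= 1
        if sym.count == 0:
            del self.by_oid[sym.oid]
            del self.by_ref[sym.ref]

m = Membrane()
entity = object()
a = m.acquire(7, entity)      # we assert something mentioning OID 7
b = m.acquire(7, entity)      # the peer asserts something back about it
m.release(a)                  # our assertion is retracted...
still_there = 7 in m.by_oid   # ...but the peer's mention keeps the entry alive
m.release(b)
gone = 7 not in m.by_oid
```

Counting mentions from both directions in the same `WireSymbol` is what prevents the premature release described above.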
**"Transient" references.** Embedded references in `Message` event bodies are special. Because
messages, unlike assertions, have no notion of lifetime—they are forwarded and forgotten—it is
not possible for a message to cause establishment of a long-lived entry in a membrane's
`WireSymbol` set. Therefore, messages MUST NOT embed any reference not previously known to the
peer (a "transient reference"). In other words, only after using an *assertion* to introduce a
reference, associating a conversational context with its lifetime, is it permitted to discuss
the reference using *messages*. A relay receiving a message bearing a transient reference MUST
terminate the session with an error. A relay about to send such a message SHOULD preemptively
refuse to do so.
### Rewriting embedded references upon receipt
When processing a `Value` *v* in a received `Assert` or `Message` event, embedded references in
*v* are decoded from their [on-the-wire `WireRef` form](#capabilities-on-the-wire) to in-memory
ref-pointer form.
The value is recursively traversed. As the relay comes across each embedded `WireRef`:
- If it is of `mine` variant, it refers to an entity exported by the remote, sending peer. Its
  OID is looked up in the import membrane.
  - If no `WireSymbol` exists in the import membrane, one is created, mapping the OID to a
    fresh [relay entity](#relay-entities), whose ref is substituted into *v*.
  - If a `WireSymbol` is already present, its associated ref is substituted into *v*.
- If it is of `yours` variant, it refers to an entity previously exported by the local,
  receiving peer. Its OID is looked up in the export membrane.
  - If no `WireSymbol` exists for the OID, one is created, associating the OID with a dummy
    inert entity ref. The dummy ref is substituted into *v*.
  - If a `WireSymbol` exists for the OID, and the `WireRef` is not
    [attenuated](#attenuation-of-authority), the associated ref is substituted into *v*. If
    the `WireRef` *is* attenuated, the associated ref is wrapped with the `Caveat`s from the
    `WireRef` before its substitution into *v*.
- In each case, the `WireSymbol` associated with the OID has its reference count incremented
  (if an `Assert` is being processed).
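The receive-side procedure might look roughly like this. It is a sketch under several assumptions: values are nested Python lists, each membrane is a plain dict from OID to a `[ref, count]` pair, and `WireRef`, `RelayEntity`, `Attenuated`, `INERT` and `ProtocolError` are illustrative names. Raising an error for an unknown OID in a `Message` is one reading of the transient-reference rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WireRef:
    variant: str               # "mine" or "yours", from the sender's viewpoint
    oid: int
    attenuation: tuple = ()    # Caveats; only meaningful for "yours"

class RelayEntity:             # local proxy for an imported remote entity
    def __init__(self, oid):
        self.oid = oid

@dataclass(frozen=True)
class Attenuated:              # a local ref wrapped with Caveats
    ref: object
    caveats: tuple

INERT = object()               # stand-in for the dummy inert entity ref

class ProtocolError(Exception):
    pass

def decode(v, imported, exported, is_assert):
    """Replace every WireRef in v with a local ref, updating the membranes."""
    if isinstance(v, list):
        return [decode(x, imported, exported, is_assert) for x in v]
    if not isinstance(v, WireRef):
        return v
    membrane = imported if v.variant == "mine" else exported
    sym = membrane.get(v.oid)
    if sym is None:
        if not is_assert:      # a Message may not introduce a new reference
            raise ProtocolError(f"transient reference to OID {v.oid}")
        fresh = RelayEntity(v.oid) if v.variant == "mine" else INERT
        sym = membrane[v.oid] = [fresh, 0]
        ref = fresh
    elif v.variant == "yours" and v.attenuation:
        ref = Attenuated(sym[0], v.attenuation)
    else:
        ref = sym[0]
    if is_assert:
        sym[1] += 1            # held until the assertion is retracted
    return ref

local = object()
imported, exported = {}, {5: [local, 1]}
out = decode(["hello", WireRef("mine", 9), WireRef("yours", 5)],
             imported, exported, is_assert=True)
```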
### Rewriting embedded references for transmission
When transmitting a `Value` *v* in an `Assert` or `Message` event, embedded references in *v*
are encoded from their in-memory ref-pointer form to [on-the-wire `WireRef`
form](#capabilities-on-the-wire).
The value is recursively traversed. As the relay comes across each embedded reference:
- The reference is first looked up in the export membrane. If an associated `WireSymbol` is
  present in the export membrane, its OID is substituted as a `mine`-variant `WireRef` into
  *v*.
- Otherwise, it is looked up in the import membrane. If *no* associated `WireSymbol` exists
  there, a fresh OID and `WireSymbol` are placed in the export membrane, and the new OID is
  substituted as a `mine`-variant `WireRef` into *v*.
- Otherwise, it refers to a previously-imported entity.
  - If the local entity reference has not been attenuated subsequent to its import, the OID it
    was imported under is substituted as a `yours`-variant `WireRef` into *v* with an empty
    attenuation.
  - If it has been attenuated, [the relay may choose whether to trust the remote party to
    enforce an attenuation request](#attenuation-of-authority). If it trusts the peer to
    honour attenuation requests, it substitutes a `yours`-variant `WireRef` with non-empty
    attenuation into *v*. Otherwise, a fresh OID and `WireSymbol` are placed in the export
    membrane, with ref denoting the attenuated local reference, and the new OID is
    substituted as a `mine`-variant `WireRef` into *v*.
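The decision procedure for a single outgoing reference can be sketched as below. The modelling is an assumption made for illustration: a local `Ref` pairs a target with an attenuation, the export membrane is a dict from `Ref` to OID, the import membrane is a dict from target to OID, and `encode_ref` and `trust_peer` are hypothetical names. Reference counting is left out.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    target: object             # the entity (or relay entity) pointed at
    attenuation: tuple = ()    # Caveats added locally, if any

def encode_ref(ref, exported, imported, next_oid, trust_peer=False):
    """Return (variant, oid, attenuation) for one embedded local reference."""
    if ref in exported:                        # already exported
        return ("mine", exported[ref], ())
    if ref.target not in imported:             # a local entity, first export
        exported[ref] = next(next_oid)
        return ("mine", exported[ref], ())
    if not ref.attenuation or trust_peer:      # previously imported
        return ("yours", imported[ref.target], ref.attenuation)
    exported[ref] = next(next_oid)             # enforce the attenuation locally
    return ("mine", exported[ref], ())

next_oid = itertools.count(1)
remote = object()                              # target of an imported reference
exported, imported = {}, {remote: 42}

local = Ref(object())
first = encode_ref(local, exported, imported, next_oid)
again = encode_ref(local, exported, imported, next_oid)
plain = encode_ref(Ref(remote), exported, imported, next_oid)
untrusted = encode_ref(Ref(remote, ("c1",)), exported, imported, next_oid)
trusted = encode_ref(Ref(remote, ("c2",)), exported, imported, next_oid,
                     trust_peer=True)
```

When the peer is not trusted, the attenuated reference is exported under its own fresh OID, so the caveats are enforced by the local relay instead of being delegated.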
## Relay entities
## Client and server roles
## Well-known OIDs
@ -453,6 +554,21 @@ OID 0, initial ref, initial oid
## Security considerations
((Tease out into Related Work section?))
OIDs are locally-meaningful only, so if the transport is secure, so is the reference. Can't
steal one and put it on a different transport: it's like taking fd 6 from another process and
trying to use fd 6 locally to mean what the other process means. Extensive related work and
prior art here.
http://www.erights.org/elib/distrib/captp/index.html
Relate terms here to captp terms:
- Hah, `NonceLocator` vs `Gatekeeper`
- well-known "positions" (??) (vs "OID"s?)
- OID = "index", "capability-list index", "c-list index"
- @cwebber says "c-list is the structure mapping descriptors to live-refs"
### Secrecy
### Privacy
@ -658,6 +774,10 @@ def instantiate(template, bindings):
IP over IP. A variation of the Syndicate Protocol like this gives [federated
dataspaces](https://syndicate-lang.org/about/history/#postdoc).
[^automatic-when-implemented-with-sam]: This process of assertion-retraction on termination is
largely automatic when relay actors are structured internally using the SAM: simply
terminating a SAM actor automatically retracts its published assertions.
[^no-extensions-yet]: This specification does not define any extensions, but future revisions
could, for example, use extensions to perform version-negotiation. Another potential future
use could be to propagate provenance information for tracing/debugging.
@ -680,3 +800,11 @@ def instantiate(template, bindings):
on [Macaroons](./glossary.md#macaroon), where it is used to describe a more general
mechanism. Future versions of this specification may opt to include some of this
generality.
[^zero-or-more]: TODO: It might be better to have a `Caveat` yield *zero or more* values? That
way they can act as filters. I've sometimes wanted the multiple-value case, though I've so
far been able to work around its lack. TODO: Perhaps it would also make sense to have a
`Caveat` map an *event* to zero or more *events*, rather than to values? Tricky corners
there include ensuring that carried authority isn't misused; macaroons are a very elegant
solution to this problem, of course, so maybe the macaroon design idea could be adapted to
this. For now, `Value`→`Option<Value>` is probably OK.