First complete (?) draft of the protocol spec

This commit is contained in:
Tony Garnock-Jones 2022-02-24 23:11:12 +01:00
parent b16ea848b4
commit 6b2540dc12
1 changed files with 161 additions and 22 deletions

@ -31,7 +31,7 @@ Transports must
- be able to carry [Preserves](./glossary.md#preserves) values back and forth,
- be reliable and in-order,
- have a well-defined session lifecycle (created → connected → disconnected), and
- assure confidentiality, integrity and replay-resistance.
- assure confidentiality, integrity, authenticity, and replay-resistance.
This document focuses primarily on point-to-point transports, discussing multicast and
in-memory variations briefly toward the end.
@ -545,46 +545,185 @@ The value is recursively traversed. As the relay comes across each embedded refe
## Relay entities
A relay entity is a local proxy for an entity at the other side of a relay link. It forwards
events delivered to it—`assert`, `retract`, `message` and `sync`—across the link to its
counterpart at the other end. It holds two pieces of state: a pointer to the relay link, and
the OID of the remote entity it represents. It packages all received events into `TurnEvent`s
which are then sent across the transport.
## Client and server roles
**Turn boundaries.** When the relay is structured internally using the SAM, it is important to
preserve turn boundaries. When all the relay entities of a given relay instance are managed by
a single actor, this will be natural: a single turn can deliver events to a group of entities
in the actor, so if the relay entity enqueues its `TurnEvent`s in a buffer which is flushed
into a `Turn` packet sent across the transport at the conclusion of the turn, the correct turn
boundaries will be preserved.
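As a rough illustration of the buffering strategy described above, the following Python sketch queues events during a turn and flushes them as one packet at the end. The names `RelayLink`, `RelayEntity`, `enqueue` and `end_of_turn`, and the tuple-based packet encoding, are assumptions made for this example and are not defined by the specification.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class RelayLink:
    """Hypothetical relay link: buffers events and flushes once per turn."""
    send_packet: Callable[[Any], None]  # writes one packet to the transport
    pending: List[Tuple[int, Any]] = field(default_factory=list)

    def enqueue(self, oid: int, event: Any) -> None:
        # Buffer one TurnEvent (target OID plus event) for the current turn.
        self.pending.append((oid, event))

    def end_of_turn(self) -> None:
        # Flush everything gathered during this turn as a single Turn packet,
        # so the remote side sees the same turn boundary.
        if self.pending:
            self.send_packet(("Turn", list(self.pending)))
            self.pending.clear()


@dataclass
class RelayEntity:
    """Hypothetical local proxy holding the two pieces of state named above."""
    link: RelayLink  # pointer to the relay link
    oid: int         # OID of the remote entity this proxy represents

    def message(self, body: Any) -> None:
        self.link.enqueue(self.oid, ("message", body))
```

Two relay entities sharing one `RelayLink` within the same actor would both call `enqueue` during a turn; the actor would then call `end_of_turn` once, yielding a single `Turn` packet.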
## Well-known OIDs
## <span id="well-known-oids">Client and server roles</span>
OID 0, initial ref, initial oid
While the protocol itself is symmetric, in many cases there will be one active ("client") and
one passive ("server") party during the establishment of a transport connection.
As an optional convention, a "server" MAY have a single entity exposed as *well-known OID* 0 at
the establishment of a connection, and a "client" MAY likewise expect OID 0 to resolve to some
pre-arranged entity. It is frequently useful for the pre-arranged entity to be a [gatekeeper
service](./builtin/gatekeeper.md), but direct exposure of a
[dataspace](./glossary.md#dataspace) or even some domain-specific object can also be useful.
Either or both parties to a connection may play one role, the other, neither, or both.
APIs for making use of relays in programs should permit programs to supply to a
newly-constructed relay an (optional) *initial ref*, to be exposed as well-known OID 0; an
(optional) *initial OID*, to denote a remote well-known OID and to be immediately proxied by a
local relay entity; or both.
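One possible shape for such an API is sketched below in Python. The class names `Relay` and `RemoteProxy`, the keyword arguments `initial_ref` and `initial_oid`, and the `exported`/`imported` dictionaries are all hypothetical; they only illustrate that each argument is optional and independent of the other.

```python
from typing import Any, Dict, Optional


class RemoteProxy:
    """Hypothetical local relay entity standing in for a remote OID."""

    def __init__(self, relay: "Relay", oid: int) -> None:
        self.relay = relay
        self.oid = oid


class Relay:
    """Hypothetical relay constructor taking an optional initial ref and OID."""

    WELL_KNOWN_OID = 0

    def __init__(self,
                 initial_ref: Optional[Any] = None,
                 initial_oid: Optional[int] = None) -> None:
        self.exported: Dict[int, Any] = {}          # OID -> local ref
        self.imported: Dict[int, RemoteProxy] = {}  # OID -> local proxy
        self.initial_proxy: Optional[RemoteProxy] = None
        if initial_ref is not None:
            # "Server" role: expose the given ref as well-known OID 0.
            self.exported[self.WELL_KNOWN_OID] = initial_ref
        if initial_oid is not None:
            # "Client" role: immediately proxy the remote well-known OID.
            self.initial_proxy = RemoteProxy(self, initial_oid)
            self.imported[initial_oid] = self.initial_proxy
```

Under these assumptions, `Relay(initial_ref=gatekeeper)` plays the server role, `Relay(initial_oid=0)` the client role, and supplying both or neither covers the remaining two cases.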
In the case of TCP/IP, the "client" role is often played by a `connect`ing party, and the
"server" by a `listen`ing party, but the opposite arrangement is also useful from time to time.
## Security considerations
((Tease out into Related Work section?))
The security considerations for this protocol fall into two categories: those having to do with
particular transports for relay instances, and those having to do with the protocol itself.
OIDs are locally-meaningful only, so if the transport is secure, so is the reference. Can't
steal one and put it on a different transport: it's like taking fd 6 from another process and
trying to use fd 6 locally to mean what the other process means. Extensive related work and
prior art here.
### Transport security
http://www.erights.org/elib/distrib/captp/index.html
The security of an instance of the protocol depends on the security characteristics of its
transport.
Relate terms here to captp terms:
- Hah, `NonceLocator` vs `Gatekeeper`
- well-known "positions" (??) (vs "OID"s?)
- OID = "index", "capability-list index", "c-list index"
- @cwebber says "c-list is the structure mapping descriptors to live-refs"
**Confidentiality.** Parties outwith the communicating peers must not be able to deduce the
contents of packets sent back and forth: some of the packets may contain secrets. For example,
a `Resolve` message sent to a [gatekeeper service](./builtin/gatekeeper.md) contains a "bearer
capability", which conveys authority to any holder able to present it to the gatekeeper.
### Secrecy
**Integrity.** Packets delivered to peers must be proof against tampering or other in-flight
damage.
### Privacy
**Authenticity.** Each packet delivered to a peer must have genuinely originated with another
party, and must have genuinely originated in the same session. Forgery of packets must be
prevented.
**Replay-resistance.** Each packet delivered to a peer must be delivered exactly once within
the context of the transport session. That is, replay of otherwise-authentic packets must not
be possible from outside the session.
### Protocol security
The protocol builds on, and directly reflects, the [object-capability security
model](./glossary.md#object-capability-model) of the SAM. Entities are accessed via unforgeable
references (OIDs). OIDs are meaningful only within the context of their transport session; in
this way, they are analogous to Unix file descriptors, which are small integers that
meaningfully denote objects only within the context of a single Unix process. If the transport
is secure, so is the reference.
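The file-descriptor analogy can be made concrete with a small Python sketch. The `Session` class and its `export` and `resolve` methods are hypothetical, and the real mapping rules are those of the membranes section; the point is only that the same small integer denotes different things, or nothing at all, in different sessions.

```python
from typing import Any, Dict


class Session:
    """Hypothetical per-transport-session OID table."""

    def __init__(self) -> None:
        self._refs: Dict[int, Any] = {}
        # Assumption: OID 0 is left free for the well-known-OID convention.
        self._next_oid = 1

    def export(self, ref: Any) -> int:
        # Allocate the next OID in this session and bind it to the ref.
        oid = self._next_oid
        self._next_oid += 1
        self._refs[oid] = ref
        return oid

    def resolve(self, oid: int) -> Any:
        # An OID only has meaning in the table of the session that issued it;
        # an unknown OID raises KeyError.
        return self._refs[oid]
```

An OID captured from one session and presented on another either resolves to an unrelated entity or fails to resolve, much as a file descriptor number copied from another process would.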
Entities can only obtain references to other entities by the [standard methods by which
"connectivity begets
connectivity"](http://www.erights.org/elib/capability/ode/ode-capabilities.html); namely:
- *By initial conditions.* The relevant initial conditions here are the state of the relays at
the moment a transport session is established, including any mappings from [well-known
OIDs](#well-known-oids) to their underlying refs.
- *By parenthood and by endowment.* No direct provision is made for creation of new entities
in this protocol, so these do not apply.
- *By introduction.* Transmission of OIDs in `Turn` packets, and the associated [rules for
managing the mappings between OIDs and references](#membranes), are the normal method by
which references pass from one entity to another.
While transport confidentiality is important for preserving secrecy of secrets such as bearer
capabilities, OIDs do not need this kind of protection. An attacker able to observe OIDs
communicated via a transport does not gain authority to deliver events to the denoted entity.
At most, the attacker may glean information on patterns of interconnectivity among entities
communicating across a transport link.
## Relation to CapTP
This protocol is *strikingly* similar to a family of protocols known as
[CapTP](http://www.erights.org/elib/distrib/captp/index.html) (see, for example,
[here](http://www.erights.org/elib/distrib/captp/index.html),
[here](https://spritelyproject.org/news/what-is-captp.html) and
[here](https://github.com/ocapn/ocapn)). This is no accident: the Syndicated Actor Model draws
heavily on the actor model, and has over the years been incrementally evolving to be closer and
closer to the actor model as it appears in the [E programming
language](http://www.erights.org/). However, the Syndicate protocol described in this document
was developed based on the needs of the Syndicated Actor Model, without particular reference to
CapTP. This makes it all the more striking that the similarities should be so strong. No doubt
I have been subconsciously as well as consciously influenced by E's design, but perhaps there
might also be a Platonic form awaiting discovery somewhere nearby.
For example:
- CapTP has the notion of a "c-list [capability list] index", cognate with our OID. A c-list
index is meaningful only within the context of a transport connection, just like an OID is.
A given c-list index maps to a "live-ref", an in-memory pointer to an object, in the same
way that an OID maps to a ref via a `WireSymbol`.
- CapTP has "[the four tables](http://www.erights.org/elib/distrib/captp/4tables.html)" at
each end of a connection; each of our relays has two [membranes](#membranes), each having
two unidirectional mapping tables.
- Syndicate [gatekeeper services](./builtin/gatekeeper.md) borrow the concept of a
[SturdyRef](http://wiki.erights.org/wiki/SturdyRef) directly from CapTP. However, the notion
of a gatekeeper entity at well-known OID 0 is an example of convergent evolution in action:
in the CapTP world, the [analogous
service](http://www.erights.org/elib/distrib/captp/NonceLocator.html) happens also to be
available at c-list index 0, by convention.
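To make the table structure in this comparison easier to picture, here is a minimal Python sketch of a relay holding two membranes, each with two unidirectional maps. Only the name `WireSymbol` comes from the text above; `Membrane`, `RelayTables`, `bind`, `by_oid`, `by_ref`, `exported` and `imported` are assumptions for illustration, and the sketch omits whatever lifecycle rules the membranes section specifies.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class WireSymbol:
    oid: int
    ref: Any  # in-memory reference; must be hashable in this sketch


class Membrane:
    """Two unidirectional maps: OID -> symbol and ref -> symbol."""

    def __init__(self) -> None:
        self.by_oid: Dict[int, WireSymbol] = {}
        self.by_ref: Dict[Any, WireSymbol] = {}

    def bind(self, oid: int, ref: Any) -> WireSymbol:
        # Record the association in both directions.
        sym = WireSymbol(oid, ref)
        self.by_oid[oid] = sym
        self.by_ref[ref] = sym
        return sym


class RelayTables:
    """Hypothetical per-relay state: one membrane for each direction."""

    def __init__(self) -> None:
        self.exported = Membrane()  # refs this side offers to its peer
        self.imported = Membrane()  # proxies for refs offered by the peer
```

Counting the maps gives four per relay, which is the correspondence with CapTP's four tables suggested above.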
A notable difference is that this protocol completely lacks support for the promises/futures of
CapTP. CapTP c-list indices are just one part of a framework of
[descriptors](http://www.erights.org/elib/distrib/captp/index.html) (*desc*s) denoting various
kinds of remote object and eventual remote-procedure-call (RPC) result. The SAM handles RPC in
a different, more low-level way.
## Specific transport mappings
TCP/IP
For now, this document focuses on `SOCK_STREAM`-like transports: reliable, in-order,
bidirectional, connection-oriented, full-duplex byte streams. While these transports naturally
have a certain level of integrity assurance and replay-resistance associated with them, special
care should be taken in the case of non-cryptographic transport protocols like plain TCP/IP.
TLS TCP/IP
To use such a transport for this protocol, establish a connection and begin transmitting
[`Packet`s](#packet-definitions) encoded as Preserves values using either the Preserves [text
syntax](https://preserves.gitlab.io/preserves/preserves.html#textual-syntax) or the Preserves
[binary syntax](https://preserves.gitlab.io/preserves/preserves.html#compact-binary-syntax).
The session starts with the first packet and ends with transport disconnection. A peer MUST
disconnect the transport upon syntax error. A responding server MUST support the binary syntax,
and MAY also support the text syntax. It can autodetect the syntax variant by following [the
rules in the
specification](https://preserves.gitlab.io/preserves/preserves.html#appendix-autodetection-of-textual-or-binary-syntax):
in short, the first byte of a valid binary-syntax Preserves document is guaranteed not to be
interpretable as the start of a valid UTF-8 sequence.
WebSockets
`Packet`s encoded in either binary or text syntax are self-delimiting. However, peers using
text syntax MAY choose to insert whitespace (e.g. newline) after each transmitted packet.
Some domain-specific details are also relevant:
## Other kinds of medium
- **Unix-domain sockets.** An additional layer of authentication checks can be made based on
process-ID and user-ID credentials associated with each Unix-domain socket.
- **TCP/IP sockets.** Plain TCP/IP sockets offer only weak message integrity and
replay-resistance guarantees, and offer no authenticity or confidentiality guarantees at
all. Plain TCP/IP sockets SHOULD NOT be used; consider using TLS sockets instead.
- **TLS atop TCP/IP.** An additional layer of authentication checks can be made based on the
signatures and certificates exchanged during TLS setup.
> TODO: concretely develop some recommendations for ordinary use of TLS certificates,
> including referencing a domain name in a `SturdyRef`, checking the presented certificate,
> and requiring SNI at the server end.
- **WebSockets atop HTTP 1.x.** These suffer similar flaws to plain TCP/IP sockets and SHOULD NOT
be used.
- **WebSockets atop HTTPS 1.x.** Similar considerations to the use of TLS sockets apply
regarding authentication checks. WebSocket messages are self-delimiting; peers MUST place
exactly one `Packet` in each WebSocket message. Since (a) WebSockets are established after a
standard HTTP(S) message header exchange, (b) every HTTP(S) request header starts with an
ASCII letter, and (c) every `Packet` in text syntax begins with the ASCII "`<`" character,
it is possible to autodetect use of a WebSocket protocol multiplexed on a server socket that
is also able to handle plain Preserves binary and/or text syntax for `Packet`s: any ASCII
character between "`A`" and "`Z`" or "`a`" and "`z`" must be HTTP, an ASCII "`<`" must be
Preserves text syntax, and any byte with the high bit set must be Preserves binary syntax.
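The first-byte autodetection rule described in the last bullet can be sketched in Python as follows. The function name `classify_first_byte` and the returned labels are hypothetical; the byte ranges are taken directly from the text above, and any other byte is reported as unknown rather than guessed at.

```python
def classify_first_byte(b: int) -> str:
    """Classify a connection by the first byte received on a shared socket."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte value")
    if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
        # ASCII letter: start of an HTTP(S) request header, so WebSocket.
        return "http"
    if b == 0x3C:
        # ASCII "<": a Packet in Preserves text syntax.
        return "preserves-text"
    if b & 0x80:
        # High bit set: a Packet in Preserves binary syntax.
        return "preserves-binary"
    # Anything else is not covered by the rule stated in the text.
    return "unknown"
```

A server would peek at the first byte of a new connection, call this function, and hand the socket to the matching handler; how to treat the `unknown` case (for example, leading whitespace) is left open here.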
Multicast/broadcast, in-memory
## Appendix: Complete schema of the protocol