# FEDS: FEderated DataSpaces
An adaptation to the modern Syndicated Actor Model of the prototype dataspace federation developed as part of the FRµIT project in 2019.
## Protocol variations
So far, I have explored three alternative protocols for federating assertions among nodes connected in a spanning tree. Each implementation is roughly 40 lines of Racket code.
All the protocol variations considered here have the same core structure. Each node has exactly one internal dataspace, called a fragment, speaking a custom protocol. Each node has some entities that link to adjacent nodes' fragments, and some entities that proxy "local" access to the node's own fragment. Local access makes a fragment appear like a normal dataspace, with all the federation machinery hidden away in its internal operation. Links operate in quite different ways for each protocol variation.
In every case, the protocols establish a rooted, directed tree of nodes, not a DAG or a general graph. Each node has exactly one "upstream" node. The connection between a node and its upstream is called an uplink.
For simplicity of presentation, I'll discuss ideas as if only assertions exist, without messages. Implementing support for messages is reasonably straightforward given solutions for assertions.
## Common concepts: relay object, assertion relay
Two frequently-occurring concepts are the notions of relay object and assertion relay, related but distinct.
A relay object is an entity which takes each received assertion matching some pattern, performs some simple, optional transformation on it such as adding or removing an envelope, and asserts the result in turn at some designated target entity. Relaying may be, but is usually not, conditional (on e.g. some property of the assertion).
An assertion relay is like a relay object, but for use with an existing dataspace. It is an assertion of interest - an `Observe` record - that matches some (dataspace) pattern, transforms the bindings into a new value, and asserts the new value at some target entity.
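
To make both notions concrete, here is a minimal sketch in plain Racket, not using the actual Syndicate library: entities are modelled as one-argument procedures, the toy dataspace is just a list of subscriptions, and all names (`make-relay`, `observe!`, `assert!`) are illustrative rather than the real API.

```racket
#lang racket
;; Minimal sketch of "relay object" and "assertion relay".
;; Entities are modelled as one-argument procedures that receive assertions;
;; the toy dataspace is a list of (predicate . handler) subscriptions.
;; All names here are illustrative, not the real Syndicate API.

;; A relay object: receives an assertion, transforms it, asserts it at `target`.
(define (make-relay transform target)
  (lambda (assertion)
    (target (transform assertion))))

;; A toy dataspace holding subscriptions (assertion relays).
(define subscriptions '())   ;; list of (cons matches? handler)

;; An assertion relay: an "assertion of interest" pairing a pattern with a
;; relay that transforms each match and asserts the result at `target`.
(define (observe! matches? transform target)
  (set! subscriptions
        (cons (cons matches? (make-relay transform target)) subscriptions)))

;; Asserting a value delivers it to every matching assertion relay.
(define (assert! v)
  (for ([sub (in-list subscriptions)])
    (when ((car sub) v) ((cdr sub) v))))

;; Example: relay every <content v> assertion onward as a <link #f v> envelope.
(observe! (lambda (v) (and (pair? v) (eq? (car v) 'content)))
          (lambda (v) (list 'link #f (cadr v)))
          (lambda (v) (printf "received: ~a\n" v)))
(assert! '(content hello))   ;; prints: received: (link #f hello)
```
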
The "copy-up" protocol
This protocol works by copying all non-subscription assertions upward in the tree. The root of the federation tree then has a copy of every non-subscription assertion made anywhere in any fragment connected to the tree. Subscription assertions are aggregated before being relayed up the tree. Information flows back down the tree "out of band" by the normal Syndicate assertion flow mechanism: aggregated distributor entity references are embedded in `Observe` records and targeted by `<at ...>` records.

```
Create = <create @kind LinkKind @reply #:#:any> .
LinkKind = =uplink / =local .
LinkEnvelope = <link @id any @body any> .
Content = <content @body any> .
FromUplink = <from-uplink @distributor #:any @body any> .
At = <at @observer #:any @body any> .
```

Each local access entity (which, remember, is acting like a plain dataspace for "local" consumers unaware of federation) takes assertions v and places them in the local fragment wrapped in a `<link #f v>` envelope.
An uplink is an assertion relay from the local fragment's `<content v>` assertions to the remote fragment as `<link f v>`, where f is the entity reference for the local fragment.
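
As a rough illustration of the previous two paragraphs (plain Racket, illustrative names only; not the actual implementation), the local-access wrapping and the uplink relay are both simple envelope rewrites:

```racket
#lang racket
;; Copy-up sketch: the two envelope rewrites performed by a local access entity
;; and by an uplink.  `fragment-ref` stands in for the entity reference of the
;; local fragment; all names are illustrative.

;; Local access: wrap an assertion v for insertion into the local fragment.
(define (local-access->fragment v)
  `(link #f ,v))

;; Uplink: relay the fragment's <content v> assertions upstream,
;; re-wrapped as <link f v> where f identifies the local fragment.
(define (uplink-relay fragment-ref content-assertion)
  (match content-assertion
    [`(content ,v) `(link ,fragment-ref ,v)]))

(local-access->fragment '(temperature kitchen 21))
;; => '(link #f (temperature kitchen 21))
(uplink-relay 'fragment-a '(content (temperature kitchen 21)))
;; => '(link fragment-a (temperature kitchen 21))
```
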
Internal to each fragment, every `<link _ v>` where v is not an observation record `<Observe pat e>` results in `<content v>` being asserted directly back into the fragment.
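
A sketch of that rule, again in plain Racket with illustrative names (the `observe` struct is a stand-in for the real record type):

```racket
#lang racket
;; Copy-up sketch: inside a fragment, <link _ v> re-asserts <content v>,
;; unless v is itself an observation record (those are handled separately).
;; The `observe` struct is an illustrative stand-in for the real record type.

(struct observe (pat entity) #:transparent)

(define (link->content link-assertion)
  (match link-assertion
    [`(link ,_ ,(? observe? _)) #f]          ;; observation: no <content ...>
    [`(link ,_ ,v)              `(content ,v)]))

(link->content '(link #f (temperature kitchen 21)))
;; => '(content (temperature kitchen 21))
(link->content `(link fragment-a ,(observe 'some-pattern 'some-entity)))
;; => #f
```
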
Observation records are handled differently: if `<link _ <Observe pat _>>` is asserted into a fragment, a "distributor" relay object d is created which takes each assertion x it receives and asserts a `<from-uplink d x>` into the fragment. A `<content <Observe pat d>>` assertion makes the distributor visible to the uplink. Then, for each specific observer e of the pattern pat, an assertion relay takes `<from-uplink d x>` and asserts x indirectly at e by way of placing an `<at e x>` record in the fragment.
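
The distributor wiring might look roughly like this sketch (plain Racket; `assert-into-fragment!` and the other names are illustrative stand-ins, not the real implementation):

```racket
#lang racket
;; Copy-up sketch: distributor wiring for an uplinked observation.
;; `assert-into-fragment!` stands in for asserting into the local fragment;
;; here it just records what would have been asserted.  All names are
;; illustrative, not the real implementation.

(define fragment-log '())
(define (assert-into-fragment! v)
  (set! fragment-log (cons v fragment-log)))

;; A distributor d is a relay object: each assertion x it receives becomes
;; <from-uplink d x> in the fragment.
(define (make-distributor)
  (define d (gensym 'distributor))
  (values d (lambda (x) (assert-into-fragment! `(from-uplink ,d ,x)))))

;; For each concrete local observer e of the pattern, <from-uplink d x>
;; is relayed onward as <at e x>.
(define (observer-relay d e from-uplink-assertion)
  (match from-uplink-assertion
    [`(from-uplink ,(== d) ,x) (assert-into-fragment! `(at ,e ,x))]
    [_ (void)]))

;; Wiring triggered by an uplinked <link _ <Observe pat _>>:
(define-values (d deliver!) (make-distributor))
(assert-into-fragment! `(content (observe some-pattern ,d)))  ;; visible to uplink
(deliver! '(temperature kitchen 21))                          ;; a match arrives
(observer-relay d 'local-observer (car fragment-log))         ;; forward to observer
(reverse fragment-log)
;; => a <content (observe ...)>, then <from-uplink d ...>, then <at local-observer ...>
```
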
The purpose of the `<at ...>` records is deduplication. Every fragment has an assertion relay responding to every `<at e x>` with an assertion of x at e.
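
A minimal sketch of that relay, with entities modelled as one-argument procedures (illustrative only):

```racket
#lang racket
;; Copy-up sketch: the fragment-wide <at e x> relay.  Because the fragment is a
;; deduplicating dataspace, many identical <at e x> records collapse into one,
;; so the observer e sees x asserted only once.  Entities are modelled as
;; one-argument procedures; names are illustrative.

(define (at-relay at-assertion)
  (match at-assertion
    [`(at ,e ,x) (e x)]))

(at-relay `(at ,(lambda (x) (printf "observer saw: ~a\n" x))
               (temperature kitchen 21)))
;; prints: observer saw: (temperature kitchen 21)
```
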
Finally, for each `<link f <Observe pat e>>` in a fragment, an `Observe` assertion is placed into the fragment with pattern `<group <rec link> {0: <bind <_>> 1: pat}>` and observer entity e'. The entity e' is a relay object that expects assertions containing the bindings `[v ...]` from pat plus an extra binding i for the ID of the asserting fragment. If i is `#f`, or is not equal to f, `<at e [v ...]>` is then asserted.
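
A sketch of the relay object e' and its forwarding rule (plain Racket; the names and the flat list representation of the bindings are illustrative assumptions):

```racket
#lang racket
;; Copy-up sketch: the relay object e' installed for each <link f <Observe pat e>>.
;; It receives the extra binding i (the id of the link the match arrived on)
;; together with the bindings [v ...] of pat, and forwards matches that did not
;; just come back down the same link f.  Names are illustrative.

(define (make-link-observer f e assert-into-fragment!)
  (lambda (i vs)                       ;; i: link id; vs: bindings of pat
    (unless (and i (equal? i f))       ;; i = #f means a locally-originated match
      (assert-into-fragment! `(at ,e ,vs)))))

(define forwarded '())
(define e-prime
  (make-link-observer 'uplink-f 'observer-e
                      (lambda (v) (set! forwarded (cons v forwarded)))))
(e-prime #f        '(21))   ;; local match: forwarded
(e-prime 'uplink-g '(22))   ;; match from a different link: forwarded
(e-prime 'uplink-f '(23))   ;; echo back down the same link: suppressed
(reverse forwarded)
;; => '((at observer-e (21)) (at observer-e (22)))
```
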
The "observer-flood" protocol
This protocol floods the tree with every individual subscription. Replica subscriptions then react locally to matching assertions. Any assertion matching a given subscription causes assertion of an `<at ...>` record, one for each subscriber. These are relayed hop-by-hop toward the target entity. No subscription aggregation happens at all: if multiple entities subscribe to a given pattern, multiple respondent assertions will traverse the network for each match for each entity.
This protocol variation is roughly equivalent to, and is directly inspired by, the FRμIT federation experiments I did in 2019. It's much simpler, though!

```
CreateUplink = <uplink @downlinkFragment #:any @reply #:#:any> .
CreateDownlink = <downlink @reply #:#:any> .
From = <from @fragment #:any @body any> .
At = <at @observer #:any @item any> .
```

Each local access entity takes assertions v and places them in the local fragment f wrapped in a `<from f v>` envelope.
Uplinks are symmetrical (once established): consider the half-uplink from fragment a to fragment b, where a symmetrical half-uplink from b to a also exists. Assertions `<from f <Observe pat e>>` result, when f≠b, in `<from a <Observe pat e>>` being asserted at b, and also in mirroring of assertions `<at e x>` from b as assertions `<at e x>` at a.
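
A sketch of one half-uplink's two duties, subscription forwarding and response mirroring (plain Racket, illustrative names only):

```racket
#lang racket
;; Observer-flood sketch: one half-uplink from fragment a to fragment b.
;; Subscriptions seen at a flow up to b, relabelled as coming from a, unless
;; they themselves arrived from b; <at ...> responses at b are mirrored back
;; to a verbatim.  The `observe` struct and all names are illustrative.

(struct observe (pat entity) #:transparent)

;; Forward a subscription from a to b, rewriting its origin label.
(define (forward-subscription a b from-assertion)
  (match from-assertion
    [`(from ,f ,(? observe? obs))
     (if (equal? f b) #f `(from ,a ,obs))]
    [_ #f]))

;; Mirror a response from b back into a, unchanged.
(define (mirror-response at-assertion)
  (match at-assertion
    [`(at ,e ,x) `(at ,e ,x)]))

(forward-subscription 'a 'b `(from a ,(observe 'pat 'e)))
;; => a (from a ...) envelope, relayed to b
(forward-subscription 'a 'b `(from b ,(observe 'pat 'e)))
;; => #f  (it came from b, so it is not reflected back)
```
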
Within each fragment f, `<from f <Observe _ e>>` causes establishment of a relay of `<at e x>` to x at e.
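
A minimal sketch of that per-observer relay (illustrative; entities are modelled as one-argument procedures):

```racket
#lang racket
;; Observer-flood sketch: within fragment f, a locally-registered observer
;; (asserted as <from f <Observe _ e>>) gets its own relay turning <at e x>
;; into a direct assertion of x at e.  Entities are one-argument procedures;
;; names are illustrative.

(define (make-at-relay e)
  (lambda (at-assertion)
    (match at-assertion
      [`(at ,(== e) ,x) (e x)]
      [_ (void)])))

(define seen '())
(define observer (lambda (x) (set! seen (cons x seen))))
(define relay (make-at-relay observer))
(relay `(at ,observer (temperature kitchen 21)))         ;; addressed to us
(relay '(at some-other-observer (temperature attic 15))) ;; ignored
seen
;; => '((temperature kitchen 21))
```
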
Finally, at fragment f, `<from _ <Observe pat e>>` causes assertion of `<Observe <group <rec from> {0: f 1: pat}> e'>`. The entity e' takes all assertions x and asserts `<at e x>` in f.
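
A sketch of the responder e' reacting to a flooded subscription (plain Racket; the pattern is modelled as a simple predicate, which is an illustrative simplification):

```racket
#lang racket
;; Observer-flood sketch: reacting to a flooded subscription at fragment f.
;; The responder e' matches locally-originated assertions <from f x> against
;; the subscription's pattern and answers with <at e x>.  Names are illustrative.

(define (make-responder f pat-matches? e assert-into-fragment!)
  (lambda (assertion)
    (match assertion
      [`(from ,(== f) ,x)
       (when (pat-matches? x)
         (assert-into-fragment! `(at ,e ,x)))]
      [_ (void)])))

(define responses '())
(define e-prime
  (make-responder 'fragment-f
                  (lambda (x) (and (pair? x) (eq? (car x) 'temperature)))
                  'remote-observer
                  (lambda (v) (set! responses (cons v responses)))))
(e-prime '(from fragment-f (temperature kitchen 21)))  ;; local match: answered
(e-prime '(from fragment-g (temperature attic 15)))    ;; not local: ignored
(reverse responses)
;; => '((at remote-observer (temperature kitchen 21)))
```
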
The "aggregate-subs" protocol
As in the "observer-flood" protocol, subscriptions are copied across the whole tree. The important difference is that just the subscription patterns are replicated. Subscription entities are treated separately, as a fragment-local concern.
Each subscription is labelled with a "next-hop" fragment. Each local assertion (from a local access entity pointing to this particular fragment) that matches the subscription's pattern is labelled with the pattern, plus the fragment it originated with (a "previous-hop" label), and copied to all "next-hop" fragments for the matching pattern. For a given "next-hop", an assertion is only copied if its "previous-hop" label differs from that "next-hop". This, plus the strict tree structure of the federated network, avoids routing loops.

```
CreateUplink = <uplink @downlinkFragment #:any @reply #:#:any> .
CreateDownlink = <downlink @reply #:#:any> .
Local = <local @item any> .
At = <at @fragment #:any @item any> .
Sub = <sub @fragment #:any @pat any> .
Pub = <pub @fragment #:any @pat any @vs [any ...]> .
```

Each local access entity takes assertions v and places them in the local fragment wrapped in a `<local v>` envelope.
Uplinks are, as in the "observer-flood" protocol, symmetrical. Consider again a half-uplink from fragment a to fragment b. Every `<sub f pat>` assertion from a results in assertion of `<at b <sub a pat>>` if f≠b. Notice that there is no mention of entities in these `sub` records.
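
A sketch of that propagation rule for one half-uplink (plain Racket, illustrative names):

```racket
#lang racket
;; Aggregate-subs sketch: propagating subscription patterns across a half-uplink
;; from fragment a to fragment b.  The next-hop label is rewritten to a, and
;; subscriptions that already came from b are not reflected back.
;; Names are illustrative.

(define (propagate-sub a b sub-assertion)
  (match sub-assertion
    [`(sub ,f ,pat)
     (if (equal? f b)
         #f                         ;; b already knows; avoid a routing loop
         `(at ,b (sub ,a ,pat)))]))

(propagate-sub 'a 'b '(sub a (temperature _ _)))
;; => '(at b (sub a (temperature _ _)))
(propagate-sub 'a 'b '(sub b (temperature _ _)))
;; => #f
```
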
Within a fragment f, each `<sub _ pat>` assertion causes assertion of `<Observe <group <rec local> {0: pat}> e>`, where e is a relay object taking the bindings `[v ...]` and asserting `<pub f pat [v ...]>` at f.
Separately, each `<sub t pat>` at f, when t≠f, establishes an assertion relay from `<pub a pat x>` at f to `<at t <pub f pat x>>` whenever a≠t.
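
A sketch covering the two relays described in the previous two paragraphs, the local-match-to-`pub` step and the `pub` forwarding step (plain Racket; the names and the flat representation of bindings are illustrative):

```racket
#lang racket
;; Aggregate-subs sketch: the two fragment-internal relays driven by <sub ...>.
;; (1) any subscription pattern makes locally-matching assertions visible as
;;     <pub f pat vs>;
;; (2) a subscription whose next hop t is another fragment forwards each
;;     <pub a pat x> whose previous hop a differs from t as <at t <pub f pat x>>.
;; Names are illustrative.

;; (1) Local matching: the bindings vs of pat become <pub f pat vs> at f.
(define (local-match->pub f pat vs)
  `(pub ,f ,pat ,vs))

;; (2) Forwarding towards next hop t, skipping what t itself sent us.
(define (forward-pub f t pat pub-assertion)
  (match pub-assertion
    [`(pub ,a ,(== pat) ,x)
     (if (equal? a t)
         #f                                   ;; would go straight back; drop it
         `(at ,t (pub ,f ,pat ,x)))]
    [_ #f]))

(local-match->pub 'fragment-f '(temperature _ _) '(kitchen 21))
;; => '(pub fragment-f (temperature _ _) (kitchen 21))
(forward-pub 'fragment-f 'fragment-t '(temperature _ _)
             '(pub fragment-f (temperature _ _) (kitchen 21)))
;; => '(at fragment-t (pub fragment-f (temperature _ _) (kitchen 21)))
(forward-pub 'fragment-f 'fragment-t '(temperature _ _)
             '(pub fragment-t (temperature _ _) (attic 15)))
;; => #f
```
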
Local observers at f, `<local <Observe pat e>>`, trigger assertion of `<sub f pat>` at f, plus establishment of an assertion relay sending each `<pub _ pat x>` at f directly as x to e.
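
A sketch of the local-observer wiring (plain Racket, illustrative names; the pattern comparison is simplified to plain equality):

```racket
#lang racket
;; Aggregate-subs sketch: wiring for a local observer <local <Observe pat e>>
;; at fragment f: register the bare pattern as <sub f pat>, and deliver every
;; matching <pub _ pat x> directly to e.  Names are illustrative.

(define (local-observer->sub f pat)
  `(sub ,f ,pat))

(define (make-pub-delivery pat e)
  (lambda (pub-assertion)
    (match pub-assertion
      [`(pub ,_ ,(== pat) ,x) (e x)]
      [_ (void)])))

(local-observer->sub 'fragment-f '(temperature _ _))
;; => '(sub fragment-f (temperature _ _))
(define deliver
  (make-pub-delivery '(temperature _ _)
                     (lambda (x) (printf "match: ~a\n" x))))
(deliver '(pub fragment-g (temperature _ _) (kitchen 21)))
;; prints: match: (kitchen 21)
```
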
Finally, an assertion relay at every fragment f sends each `<at t x>` directly as x to t. The purpose of `<at ...>` in this protocol is local deduplication of assertions before sending them on to a potentially-distant peer, even though that peer is itself a deduplicating dataspace. This saves on network traffic.
## Discussion
An interesting consideration is the `sync` operation from the Syndicated Actor Model. ISTM that synchronizing to a federated dataspace should reach all the leaves of the spanning tree and come back before completing, making it in principle quite expensive! Idea: share the costs by having an epoch counter constantly ticking away, so local sync operations can wait for the epoch counter to increment, say, twice, or something.
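
One very rough way to picture the epoch-counter idea (plain Racket, purely illustrative and not part of any of the implementations): the counter ticks each time a full round trip to the leaves completes, and a local sync simply waits for it to advance twice, so at least one complete round trip is guaranteed to have started after the sync was requested.

```racket
#lang racket
;; Discussion sketch: amortising sync over an epoch counter.  The root
;; increments `epoch` each time a round trip to all leaves completes; a local
;; sync waits until the counter has advanced by two.  Purely illustrative.

(define epoch 0)
(define (round-trip-completed!)     ;; called by the federation machinery
  (set! epoch (add1 epoch)))

(define (sync-via-epochs k)
  (define target (+ epoch 2))
  (lambda ()                        ;; poll; a real version would block or subscribe
    (when (>= epoch target) (k))))

(define done? #f)
(define poll (sync-via-epochs (lambda () (set! done? #t))))
(round-trip-completed!) (poll)      ;; only one tick seen: not yet
(round-trip-completed!) (poll)      ;; two ticks seen: sync completes
done?
;; => #t
```
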
The federation implementation from the FRμIT project of 2019 is hundreds of lines long. What makes for such a difference? The answer is that the 2019 code was implementing an early form of the Syndicate wire protocol in addition to providing federation. The current implementation has disentangled these two considerations into a design where federation can be elegantly added to a network of Syndicated actors without much difficulty and without revision of the core Syndicate protocol or the Syndicated actor model itself.
Subscription aggregation is done naively in the current protocols. The structure of dataspace implementations could usefully be reflected in the federation protocol, perhaps treating each of the "skeleton", "constant positions" and "binding positions" of subscriptions separately when aggregating. This starts to move toward a routing-protocol-like system. Imagine some structure to the subscriptions that happens to reflect locality of connection among nodes in the spanning tree: nodes "downstream" of a given node tend to have subscriptions with a "common prefix", and the "common prefix" gets shorter as you travel up the tree. If subscriptions were aggregated based on common prefix, you would end up with something analogous to IP subnet aggregation and comparable to IP routing protocols.
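
As a toy illustration of the subnet-aggregation analogy (plain Racket, purely speculative and not part of the current protocols): patterns are modelled as lists of path components, and a node advertises upstream only the longest common prefix of its downstream subscriptions, terminated by a wildcard.

```racket
#lang racket
;; Discussion sketch: aggregating subscriptions by common prefix, by analogy
;; with IP subnet aggregation.  Patterns are modelled as lists of path
;; components; a node forwards one wildcard-terminated prefix upstream instead
;; of each individual downstream subscription.  Purely illustrative.

(define (common-prefix a b)
  (if (and (pair? a) (pair? b) (equal? (car a) (car b)))
      (cons (car a) (common-prefix (cdr a) (cdr b)))
      '()))

;; Collapse a set of downstream subscription patterns into one upstream
;; advertisement: their longest common prefix followed by a wildcard.
(define (aggregate-upstream patterns)
  (define prefix (foldl common-prefix (car patterns) (cdr patterns)))
  (append prefix '(_)))

(aggregate-upstream '((sensor kitchen temperature)
                      (sensor kitchen humidity)
                      (sensor hallway temperature)))
;; => '(sensor _)
```
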