28 KiB
Efficient, Imperative Dataspaces for Conversational Concurrency
Tony Garnock-Jones tonyg@leastfixedpoint.com
20 October 2018; revised 21 June 2019 and 4-5 April 2024
Abstract. The dataspace model of Conversational Concurrency [is great], but implementing it efficiently has been difficult until now. Existing approaches use a complex data structure that depends for its efficiency on sophisticated run-time support. This paper presents a new approach to implementation of the dataspace model that gives three benefits. First, it avoids the complexity and run-time support requirements of previous approaches, bringing dataspaces to a wider range of environments. Second, it unlocks new types of conversational interaction among concurrent components. Third, it dramatically improves performance. Key to the new technique is a syntactic treatment of assertions of interest, contrasting with the semantic treatment of assertion sets used by the earlier approach.
Constructing assertions
Imagine a language for constructing data with embedded function calls and variable references. Imagine that it is a fragment of a larger language.
c ∈ assertions C ::= e | x(c, ...)
v ∈ values V ::= a | x(v, ...)
e ∈ expressions E ::= a | x | e e ...
x ∈ identifiers X
a ∈ atoms A = numbers ∪ strings ∪ ...
Here are some examples of assertions in C
, along with suggested
interpretations:
present("Alice") Alice is present in the chat room
speak("Alice", "Hello!") Alice says "Hello!"
Assertions of interest
In the dataspace model, "subscriptions" go hand in hand with
assertions of interest in subscribed-to data. The model includes two
special constructors for discussing interests. The first, observe
,
is interpreted as a declaration of interest in assertions matching the
pattern given as its sole argument. The second, discard
, is
interpreted as a "don't care" when part of a pattern within observe
.
observe(present(discard())) Interest in the presence of any user
observe(speak("Alice", discard())) Interest in every time Alice speaks
We extend the dataspace model with an additional special constructor
which allows interested parties to declare the portions of matching
assertions that they specifically wish to examine further: capturing
positions. The capture
constructor signals that the interested party
will treat specially the corresponding portion of a matching
assertion.1
observe(present(capture())) Interest in each present user
observe(speak("Alice", capture())) Interest in the things Alice says
There is an important difference between observe(present(discard()))
and observe(present(capture()))
. The former declares that the
interested party cares only about whether any user at all is present,
while the latter declares an interest in the identities of the
specific users that are present. Similarly, observe(speak("Alice", discard()))
declares an interest in receiving a notification each
time Alice speaks, but no interest in the content of each utterance,
while observe(speak("Alice", capture()))
declares interest in
learning the things Alice says.
To drive this point home, the following patterns both result in a notification event for each utterance by any user. The first results in notifications carrying the name of the speaker along with the content of their speech. The second results in notifications carrying only the name of the speaker.
observe(speak(capture(), capture())) Interest in who says what
observe(speak(capture(), discard())) Interest in who speaks
Patterns
Imagine now an enriched version of our language that can construct patterns over data, including captures and "don't care" positions.
p ∈ patterns P ::= v | x(p, ...) | $x | _
Syntactic patterns can be translated into assertions of interest
directly. Binding subpatterns $x
are translated into capture()
,
and "don't care" patterns _
into discard()
.
Indexing assertions and patterns
There are two kinds of change in a running dataspace model program. First, assertions can be added to and removed from the dataspace. When this happens, interested facets must be informed of relevant changes. Second, facets and their event handlers can be added to and removed from the dataspace. When this happens, new handlers must be informed of preexisting matching assertions.
To efficiently respond to these two kinds of change, we maintain a special index. Every time an event handler within a facet is created, we augment the index using a data structure called a skeleton. Each skeleton contains information gleaned from static analysis of the pattern associated with the event handler. The index also records every assertion added to the dataspace, so as to correctly initialize event handlers added later.
Skeletons
A skeleton is comprised of three pieces: a shape, describing the positions of statically-known constructors in matching assertions; a constant map, which places restrictions on fields within constructors; and a capture map, which specifies locations of captured positions.
Each time an assertion is added or removed, it is conceptually checked against each handler's skeleton. First, the overall shape is checked. If the assertion passes this check, the constant map is checked. If all the constants match, the capture map is used to prepare an argument vector, and the event handler's callback is invoked.
k ∈ skeletons K = S × (H ⟼ V+1) × [H]
s ∈ shapes S = (H ⟼ L)
ℓ ∈ classes L = X -- label
h ∈ paths H = [𝐍]
Shapes retain only statically-known constructors in a pattern:
shape :: P → S
shape p = shape' [] p
where
shape' :: H → P → S
shape h v = ∅
shape h x(p₀, ..., pᵢ) = (h ⟼ x)
∪ (shape (h++[0]) p₀) ∪ ...
∪ (shape (h++[i]) pᵢ)
shape h $x = ∅
shape h _ = ∅
A constant map filters potential matches by placing constraints on
contained fields. The paths H
in the map denote positions to be
checked; the predicates V+1
denote either a particular value that must
exist at that position, or a simple check that the term in question
merely has a value in that position.
constantmap :: P → (H ⟼ V+1)
constantmap p = cmap [] p
where
cmap :: H → P → (H ⟼ V+1)
cmap h v = (h ⟼ inl v)
cmap h x(p₀, ..., pᵢ) = (cmap (h++[0]) p₀) ∪ ...
∪ (cmap (h++[i]) pᵢ)
cmap h $x = ∅
cmap h _ = (h ⟼ inr ())
It will be useful to separate value-check operations from existence-check operations.
constantchecks :: (H ⟼ V+1) → (H ⟼ V) × 𝒫(H)
constantchecks m = ( { h ⟼ v | h ⟼ inl v ∈ m },
{ h | h ⟼ inr () ∈ m } )
Finally, a capture map extracts all capturing positions in a pattern.
capturemap :: P → [H]
capturemap p = vmap [] p
where
vmap :: H → P → [H]
vmap h v = []
vmap h x(p₀, ..., pᵢ) = (vmap (h++[0]) p₀) ++ ...
++ (vmap (h++[i]) pᵢ)
vmap h $x = [h]
vmap h _ = []
The index
The index incorporates every active event handler and every active assertion in the dataspace.
Overview and structures
An index contains a bag of all currently-asserted assertion-values, as well as the root of a trie-like structure. Information from each indexed event handler's skeleton's shape is laid out along edges connecting trie nodes.
Every node contains a "continuation", which embodies information from a skeleton's constant map and capture map alongside handler callback functions and caches of currently-asserted values.
Index = Bag(V) × Node
Node = Continuation × (Move ⟼ L ⟼ Node)
Move = 𝐍 × H
Continuation = 𝒫(V) × ([H]×𝒫(H) ⟼ [V] ⟼ Leaf)
Leaf = 𝒫(V) × ([H] ⟼ Handler)
Handler = Bag([V]) × 𝒫(Callback)
Callback = EventType → [V] → V
EventType ::= "+" | "-" | "!"
Bag(τ) = τ ⟼ 𝐍 -- bag of τ values
To use an index in the context of a single assertion—be it a new
addition, a removal, or a message to be delivered—follow a path from
the root Node
of the index along move- and class-labelled edges,
collecting Continuations
as you go. This yields a complete set of
event handlers that may match the assertion being considered. Further
investigating each collected Continuation
by analyzing its constant
maps yields a set of matching Leaf
s. Finally, each Leaf
specifies
a set of captured positions in the assertion to extract and pass to
the contained callbacks.
At every Continuation
, Leaf
and Handler
object, the index
maintains a set of currently-asserted values that conform to the
constraints implied by the object's position in the overall index.
Most of the components in an index are mutable: the Bag(V)
and
initial class-to-Node
map in the
root; the assertion-value cache set in each Continuation
or Leaf
object; the map from move to class to Node
within each
Node
; the map from path list to value-list to Leaf
in each
Continuation
; the map from path list to Handler
in each Leaf
;
and the Bag([V])
in every Handler
. This reflects the fact that the
index directly reflects the current state of the dataspace it is
indexing.
From pattern shapes to tries of moves
Definition. A visit of a tree is a sequence of paths to nodes
within the tree. It may be described in two equivalent ways: as a
sequence of rooted (absolute) paths, [H]
, or as a sequence of moves,
relative paths, [Move]
.
w ∈ absolute visits [H]
w̅ ∈ relative visits [Move]
Definition. A move or relative path h̅ ∈ Move
consists of zero
or more steps rootward from a position in a tree, followed by a path
from that position leafward in the tree. We define operators ⊕
and ⊖
for applying a move to an existing path and computing a move from one
path to another, respectively:
⊕ :: H → Move → H
hₒ ⊕ (n, h) = dropRight n hₒ ++ h
⊖ :: H → H → Move
h ⊖ hₒ = (|hₒ| - |h'|, dropLeft |h'| h)
where
h' = longestCommonPrefix hₒ h
The first relative path in a relative visit is interpreted with respect to the root of the tree. Relative and absolute visits are interconvertible:
absToRel :: [H] → [Move]
absToRel hs = rel [] hs
where
rel hₒ [] = []
rel hₒ [h, h₁, ...] = [h ⊖ hₒ] ++ rel h [h₁, ...]
relToAbs :: [Move] → [H]
relToAbs hs = abs [] hs
where
abs hₒ [] = []
abs hₒ [h̅, h̅₁, ...] = [hₒ ⊕ h̅] ++ abs (hₒ ⊕ h̅) [h̅₁, ...]
Definition. The shapeVisit
function converts a shape into a
sequence of Move × L
pairs. The Move
s in the sequence are a
visit of the nodes in the domain of the input shape.
shapeVisit :: S → [Move × L]
shapeVisit s = zip (absToRel (map fst s')) (map snd s')
where
s' = sort lexLt s
The utility sort :: ∀A ∀B . (A → A → 2) → (A ⟼ B) → [(A, B)]
produces a sorted sequence from a finite map and a "less-than"
predicate, which in this case is lexLt
, the lexicographic ordering on
paths.
Implementation note. The type S = (H ⟼ L)
is an (unordered) map, but
could equally well be a sequence of pairs S = [H × L]
with the side
condition that the H
s must be unique. With that representation,
shape
can be adjusted to produce output in lexicographically-sorted
order, obviating the need for sort
in shapeVisit
.
Implementation note. In the special case of visiting a shape derived
from a pattern, a move (n, h)
will always have either |h| = 0
(if it
is the first move in the visit) or |h| = 1
(if not). This allows
representation of moves in indexes as 𝐍 × (1 + 𝐍)
instead of the fully
general 𝐍 × H
.
Lemma. Every shape produced by shape p
for a pattern p
includes
a mapping for all and only the interior nodes of the tree embodied by
p
. That is, every non-leaf node in p
has a path in the domain of
shape p
.
Proof. By induction on p
and examination of shape
. ∎
Lemma. Every relative path contained in a nonempty visit produced by
shapeVisit (shape p)
has a leafward path of length one, except the
first such relative path, which always equals (0, [])
.
Proof. By properties of the lexicographic ordering of paths and the
lemma above. The first path in shapeVisit
's result will always be the
relative path to the root node, (0, [])
, since that is the smallest
possible path by the ordering. Subsequent paths will always be to an
immediate child of the current node or of one of its ancestors. If it
were not so, a contradiction would arise: since every interior node is
represented, every immediate child with children of its own must appear,
and lexicographic ordering requires that such nodes appear before their
own children, so "skipping" a generation is not possible. ∎
Adding and removing event handlers
Every event handler is a pair of a skeleton and a callback function.
Adding or removing an event handler proceeds in two stages. First, the
index is extended to incorporate a path computed from the skeleton's
shape into the Node
-based trie. Second, the capture map and callback
are installed into or removed from the Continuation
within the
Node
at the end of that path.
Because (statically-known) shapes are finite and not particularly numerous in any given program, the implementation assumes that it is never necessary to remove shapes from the index. Instead, it limits itself to removal of handler functions, capture maps, and constant maps. This assumption will have to be revisited in future broker-like cases where handlers are dynamically installed.
Example. Let our pattern be
p = x(y(3, 4), $v, z(_, w(), _), _)
The skeleton of the pattern is then
k = (shape p, constantmap p, capturemap p)
shape p = ([] ⟼ x)
∪ ([0] ⟼ y)
∪ ([2] ⟼ z)
∪ ([2, 1] ⟼ w)
constantmap p = ([0, 0] ⟼ inl 3)
∪ ([0, 1] ⟼ inl 4)
∪ ([2, 0] ⟼ inr ())
∪ ([2, 2] ⟼ inr ())
∪ ([3] ⟼ inr ())
capturemap p = [[1]]
The shape-visit of p
is thus
shapeVisit (shape p) = [((0, []), x),
((0, [0]), y),
((1, [2]), z),
((0, [1]), w)]
Definition. The partial project
function extracts the subvalue at
a given path h
from an overall value v
.
project :: V → H ⇀ V
project v [] = v
project x(v₀, ..., vᵢ) (n:h) = project vₙ h, if 0 ≤ n ≤ i
Definition. The projectMany
partial function projects a sequence
of subvalues.
projectMany :: V → [H] ⇀ [V]
projectMany v [h, ...] = [project v h, ...]
Definition. The classof
partial function extracts the constructor
label x
from a record value. It is undefined for non-record values.
classof :: V ⇀ L
classof x(v₀, ..., vᵢ) = x
Definition. The extend
procedure augments an index with shape
information s
, where ∃p . s = shape p
, by imperatively updating
the index structure. It returns the Continuation
associated with the
final Node
visited in the path described by s
.
extend :: Index → S → Continuation
extend (_, root) s = visit [] root (shapeVisit s)
where
visit :: H → Node → [Move × L] → Continuation
visit h (cont, moveTable) [] = cont
visit h (cont, moveTable) ([(h̅, ℓ)] ++ moves) =
if h̅ not in moveTable then
moveTable[h̅] := {}
let classTable = moveTable[h̅]
if ℓ not in classTable then
let vs = { v | v ∈ fst cont,
classof (project v (h ⊕ h̅)) = ℓ }
classTable[ℓ] := ((vs, {}), {})
visit (h ⊕ h̅) classTable[ℓ] moves
Definition. The addHandler
procedure installs into an index an
event handler callback f
expecting values matching and captured by
the given skeleton k
. It then invokes f
once for each distinct
sequence of captured values matching existing assertions in the
index.2
addHandler :: Index → K → Callback → 1
addHandler index (s, constantMap, captureMap) f =
let (cache, table) = extend index s
let (unsortedConstants, checks) = constantchecks constantMap
let constants = sort lexLt unsortedConstants
let constLocs = map fst constants
let constKey = (constLocs, checks)
if constKey not in table then
table[constKey] := {}
for v in cache
if ∀h ∈ checks, project v h is defined and
∃key . key = projectMany v constLocs then
if key not in table[constKey] then
table[constKey][key] := ({}, {})
let (leafcache, _leaftable) = table[constKey][key]
leafcache += v
let constVals = map snd constants
if constVals not in table[constKey] then
table[constKey][constVals] := ({}, {})
let (leafcache, leaftable) = table[constKey][constVals]
if captureMap not in leaftable then
let bag = empty_bag
for v in leafcache
if ∃seq . seq = projectMany v captureMap then
bag[seq] += 1
leaftable[captureMap] := (bag, {})
let (bag, f_table) = leaftable[captureMap]
f_table += f
for (seq ⟼ _) in bag
f "+" seq
()
Definition. The removeHandler
procedure removes an event handler
from an index.
removeHandler :: Index → K → Callback → 1
removeHandler index (s, constantMap, captureMap) f =
let (_, table) = extend index s
let (unsortedConstants, checks) = constantchecks constantMap
let constants = sort lexLt unsortedConstants
let constLocs = map fst constants
let constKey = (constLocs, checks)
if constKey not in table then
return
let constVals = map snd constants
if constVals not in table[constKey] then
return
let (leafcache, leaftable) = table[constKey][constVals]
if captureMap not in leaftable then
return
let (bag, f_table) = leaftable[captureMap]
if f not in f_table then
return
f_table -= f
if f_table = {} then
delete leaftable[captureMap]
if leafcache = {} and leaftable = {} then
delete table[constKey][constVals]
if table[constKey] = {} then
delete table[constKey]
Adding assertions, removing assertions and sending messages
All three operations depend on a single traversal procedure, parameterized with different update procedures.
Definition. The modify
procedure traverses an index trie,
following the structure of v
and updating cached assertion sets
according to the given update procedures. The update procedures act
by side-effect; in particular, the m_handler
procedure may choose
to invoke the callback passed to it.
Operation = { AddAssertion, RemoveAssertion, SendMessage }
modify :: Index →
Operation →
V →
(Continuation → V → 1) →
(Leaf → V → 1) →
(Handler → [V] → 1) →
1
modify (_, root) operation v m_cont m_leaf m_handler =
visit root [v]
where
visit :: Node → [V] → 1
visit (cont, moveTable) vs =
visit-cont cont
for ((n, h) ⟼ classTable) in moveTable
let (v' : vs') = dropLeft vs n in
if ∃v . v = project v' h and
∃ℓ . ℓ = classof v and
∃next . (ℓ ⟼ next) ∈ classTable then
visit next (v : v' : vs')
visit-cont :: Continuation → 1
visit-cont cont@(_, table) =
m_cont cont v
for ((constLocs, checks) ⟼ constVals) in table
if ∀h ∈ checks, project v h is defined and
∃consts . consts = projectMany v constLocs then
if operation = AddAssertion and consts not in constVals then
constVals[consts] := ({}, {})
if consts in constVals then
let leaf@(leafcache, leaftable) = constVals[consts]
m_leaf leaf v
for (captureMap ⟼ handler) in leaftable
if ∃vs . vs = projectMany v captureMap then
m_handler handler vs
if operation = RemoveAssertion and leafcache = {} and leaftable = {} then
delete constVals[consts]
if constVals = {} then
delete table[(constLocs, checks)]
Definition. The procedure adjustAssertion
updates the copy-count
associated with v
in the given index, invoking callbacks as a
side-effect if this changes the observable contents of the
dataspace.
adjustAssertion :: Index → V → 𝐍 → 1
adjustAssertion index@(cache, root) v delta =
let was_present = v in cache
cache[v] += delta
let is_present = v in cache
if not was_present and is_present then
modify index AddAssertion v add_cont add_leaf add_handler
if was_present and not is_present then
modify index RemoveAssertion v del_cont del_leaf del_handler
where
add_cont (cache, _) v = cache += v
add_leaf (leafcache, _) v = leafcache += v
add_handler (bag, f_table) vs =
let was_present = vs in bag
bag[vs] += 1
if not was_present then
for f in f_table
f "+" vs
del_cont (cache, _) v = cache -= v
del_leaf (leafcache, _) v = leafcache -= v
del_handler (bag, f_table) vs =
bag[vs] -= 1
if vs not in bag then
for f in f_table
f "-" vs
Definition. The procedures addAssertion
and removeAssertion
install and remove an assertion v
into the given index,
respectively.
addAssertion :: Index → V → 1
addAssertion index v = adjustAssertion index v 1
removeAssertion :: Index → V → 1
removeAssertion index v = adjustAssertion index v -1
Care must be taken when applying entire patches to ensure that added
assertions are processed before removed assertions; otherwise, actors
will observe glitching in certain cases. For example, consider an
endpoint with a wildcard subscription [_]
and a separate endpoint
asserting [3]
. If a patch atomically replaces [3]
with [4]
, then
if the removal is processed first, it will briefly appear to the [_]
endpoint as if no assertions remain, whereas if the addition is
processed first, no glitch will be detected.
Definition. The procedure sendMessage
delivers a message v
to
event handlers in the given index.
sendMessage :: Index → V → 1
sendMessage (_, root) v =
modify root SendMessage v send_cont send_leaf send_handler
where
send_cont _ _ = ()
send_leaf _ _ = ()
send (_, f_table) vs =
for f in f_table
f "!" vs
Variations
Exact arity matching
The initial version of this design had
k ∈ skeletons K = S × (H ⟼ V) × [H]
ℓ ∈ classes L = X × 𝐍 -- label and arity
which provided for exact arity matching instead of extensible "at-least"
arity matching. Constant maps contained V
rather than V+1
because
the arity check as part of the class obviated the need to check a
position for mere existence.
Matching atom classes
Skeleton constant map predicates V+1
can be changed to include any
other kind of predicate besides equal-to-expected-value V
and simple
existence 1
, such as string?
/int?
/etc.
Potential future optimizations
JIT compilation of shapes, constant checks, captures
TODO
Static analysis of messages and assertions
Static analysis of expressions under (send! ...)
and (assert ...)
could cut out even more structural overhead.
For example, given interests
observe(message("actor1", capture()))
observe(message("actor2", capture()))
and a message
:: message(x, y)
static analysis could directly connect the sending site to a
hash-table lookup with x
as the key, and invocation of the resulting
handlers with y
as the argument. There would be no need to perform a
lookup based on the message
constructor at runtime.
Similarly, given an interest
observe(present(capture()))
and an assertion
present(user)
static analysis could directly invoke handlers with user
as the
argument, without needing to at runtime find the set of handlers
interested in the present
constructor.
TODO
-
describe the cleanup function associated with a handler in the real implementation
relay.rkt
uses it. When an inner actor asserts interest in an inbound assertion-set, the relay process pivots into the outer dataspace's context, and adds a new endpoint that relays events to the inner dataspace. The cleanup function attached to that endpoint retracts (from the inner dataspace) any matching assertions left over at the time the endpoint is removed.- that appears to be it! Nowhere else in the code is a
skeleton-interest
constructed with a non-#f
cleanup function.
-
figure out and describe scoped assertions / visibility-restrictions
- (partial/sketchy answer:) It's to deal with the fact that multiple endpoints may overlap. Within a single dataspace, an assertion matching both endpoints will trigger each of them. When relaying, the relay maintains an endpoint in the outer space for each in the inner space. When both outer endpoints are triggered, if they were to naively relay the matching assertion, the problem isn't so much that they'd double up (because dataspaces deduplicate!), the problem is that they don't have enough information to reconstruct the triggering outer assertion perfectly! So a visibility-restriction causes an assertion to only trigger inner endpoints that capture at most the captures of the outer endpoint. One of the outer endpoints will trigger its "matching" inner endpoint, but not the inner endpoint of the other endpoint, even though you might expect the relayed assertion to do so.
- There's also a need for
(opaque-placeholder)
s to frustrate constant-matching against literal(discard)
in cases of visibility restriction. See commit937bb7a
and test casetest/core/nesting-confusion-2.rkt
. - HOWEVER see notes from 2019-06-18 in the googledoc Journal as
well as in my notebook. See also commit
5923bdd
from imperative-syndicate and e806f4b from syndicate-js. The opaque-placeholders make the distributed (= non-zero-latency) case of visibility-restriction handling problematic in general, though relaxing the constraint from exact match of captured positions to at-most-match of captured positions allows at least theduring
special case to work in a programmer-unsurprising way.
-
there's more to say about the implementation of the dataspace itself, not just the index structures. For example, the care that must be taken regarding
cleanup-changes
and abandoning work during exception handling.
-
The implemented system affords a single argument to
capture
that restricts matches according to the nested subpattern. What is written here ascapture()
corresponds tocapture(discard())
in the full implementation. The pattern language syntax is analogously extended. ↩︎ -
Because we store sets of function values, we rely on the general availability of a closure equivalence relation. Pointer-equality of closures (
eq?
) suffices. ↩︎