syndicate-rkt/syndicate/HOWITWORKS.md

24 KiB
Raw Permalink Blame History

Efficient, Imperative Dataspaces for Conversational Concurrency

Tony Garnock-Jones tonyg@leastfixedpoint.com
20 October 2018; revised 21 June 2019

Abstract. The dataspace model of Conversational Concurrency [is great], but implementing it efficiently has been difficult until now. Existing approaches use a complex data structure that depends for its efficiency on sophisticated run-time support. This paper presents a new approach to implementation of the dataspace model that gives three benefits. First, it avoids the complexity and run-time support requirements of previous approaches, bringing dataspaces to a wider range of environments. Second, it unlocks new types of conversational interaction among concurrent components. Third, it dramatically improves performance. Key to the new technique is a syntactic treatment of assertions of interest, contrasting with the semantic treatment of assertion sets used by the earlier approach.

Constructing assertions

Imagine a language for constructing data with embedded function calls and variable references. Imagine that it is a fragment of a larger language.

c ∈ assertions  C ::= e | x(c, ...)
v ∈ values      V ::= a | x(v, ...)
e ∈ expressions E ::= a | x | e e ...
x ∈ identifiers X
a ∈ atoms       A = numbers  strings  ...

Here are some examples of assertions in c, along with suggested interpretations:

present("Alice")           Alice is present in the chat room
speak("Alice", "Hello!")   Alice says "Hello!"

Assertions of interest

In the dataspace model, "subscriptions" go hand in hand with assertions of interest in subscribed-to data. The model includes two special constructors for discussing interests. The first, observe, is interpreted as a declaration of interest in assertions matching the pattern given as its sole argument. The second, discard, is interpreted as a "don't care" when part of a pattern within observe.

observe(present(discard()))         Interest in the presence of any user
observe(speak("Alice", discard()))  Interest in every time Alice speaks

We extend the dataspace model with an additional special constructor which allows interested parties to declare the portions of matching assertions that they specifically wish to examine further: capturing positions. The capture constructor signals that the interested party will treat specially the corresponding portion of a matching assertion.1

observe(present(capture()))         Interest in each present user
observe(speak("Alice", capture()))  Interest in the things Alice says

There is an important difference between observe(present(discard())) and observe(present(capture())). The former declares that the interested party cares only about whether any user at all is present, while the latter declares an interest in the identities of the specific users that are present. Similarly, observe(speak("Alice", discard())) declares an interest in receiving a notification each time Alice speaks, but no interest in the content of each utterance, while observe(speak("Alice", capture())) declares interest in learning the things Alice says.

To drive this point home, the following patterns both result in a notification event for each utterance by any user. The first results in notifications carrying the name of the speaker along with the content of their speech. The second results in notifications carrying only the name of the speaker.

observe(speak(capture(), capture()))  Interest in who says what
observe(speak(capture(), discard()))  Interest in who speaks

Patterns

Imagine now an enriched version of our language that can construct patterns over data, including captures and "don't care" positions.

p ∈ patterns    P ::= e | x(p, ...) | $x | _

Syntactic patterns can be translated into assertions of interest directly. Binding subpatterns $x are translated into capture(), and "don't care" patterns _ into discard().

Indexing assertions and patterns

There are two kinds of change in a running dataspace model program. First, assertions can be added to and removed from the dataspace. When this happens, interested facets must be informed of relevant changes. Second, facets and their event handlers can be added to and removed from the dataspace. When this happens, new handlers must be informed of preexisting matching assertions.

To efficiently respond to these two kinds of change, we maintain a special index. Every time an event handler within a facet is created, we augment the index using a data structure called a skeleton. Each skeleton contains information gleaned from static analysis of the pattern associated with the event handler. The index also records every assertion added to the dataspace, so as to correctly initialize event handlers added later.

Skeletons

A skeleton is comprised of three pieces: a shape, describing the positions and arities of statically-known constructors in matching assertions; a constant map, which places restrictions on fields within constructors; and a capture map, which specifies locations of captured positions.

Each time an assertion is added or removed, it is conceptually checked against each handler's skeleton. First, the overall shape is checked. If the assertion passes this check, the constant map is checked. If all the constants match, the capture map is used to prepare an argument vector, and the event handler's callback is invoked.

k ∈ skeletons   K   = S × [H×E] × [H]
s ∈ shapes      S ::= * | x(s, ...)
h ∈ paths       H   = [𝐍]

Shapes retain only statically-known constructors and arities in a pattern:

shape :: P -> S
shape e          = *
shape x(p, ...)  = x(shape p, ...)
shape $x         = *
shape _          = *

A constant map extracts all non-capturing, non-discard positions in a pattern. The expressions in the map are evaluated at the time the corresponding event handler is installed; that is, at facet creation time. They are not subsequently reevaluated; if any expression depends on a dataflow variable, and that variable changes, the entire handler is removed, reevaluated, and reinstalled.

constantmap :: P -> [(H, E)]
constantmap p  = cmap [] p
  where
    cmap :: H -> P -> [(H, E)]
    cmap h e                 = [(h, e)]
    cmap h x(p_0, ..., p_i)  = (cmap (h++[0]) p_0) ++
                               ... ++
                               (cmap (h++[i]) p_i)
    cmap h $x                = []
    cmap h _                 = []

Finally, a capture map extracts all capturing positions in a pattern:

capturemap :: P -> [H]
capturemap p  = vmap [] p
  where
    vmap :: H -> P -> [H]
    vmap h e                = []
    vmap h x(p_0, ..., p_i) = (vmap (h++[0]) p_0) ++
                              ... +
                              (vmap (h++[i]) p_i)
    vmap h $x               = [h]
    vmap h _                = []

The index

The index incorporates every active event handler and every active assertion in the dataspace.

Overview and structures

An index is a pair of a bag of all currently-asserted assertion-values, plus the root node of a trie-like structure. Information from each indexed event handler's skeleton's shape is laid out along edges connecting trie nodes.

Every node contains a "continuation", which embodies information from a skeleton's constant map and capture map alongside handler callback functions and caches of currently-asserted values.

       Index = Bag(V) × Node
        Node = Continuation × (Selector ⟼ Class ⟼ Node)
    Selector = 𝐍 × 𝐍		-- pop-count and index
       Class = X × 𝐍		-- label and arity

Continuation = 𝒫(V) × ([H] ⟼ [V] ⟼ Leaf)
        Leaf = 𝒫(V) × ([H] ⟼ Handler)

     Handler = Bag([V]) × 𝒫(Callback)

    Callback = EventType -> [V] -> V
 EventType ::= "+" | "-" | "!"

      Bag(τ) = τ ⟼ 𝐍		-- bag of τ values

To use an index in the context of a single assertion—be it a new addition, a removal, or a message to be delivered—follow a path from the root Node of the index along Selector/Class-labelled edges, collecting Continuations as you go. This yields a complete set of event handlers that may match the assertion being considered. Further investigating each collected Continuation by analyzing its constant maps yields a set of matching Leafs. Finally, each Leaf specifies a set of captured positions in the assertion to extract and pass to the contained callbacks.

At every Continuation, Leaf and Handler object, the index maintains a set of currently-asserted values that conform to the constraints implied by the object's position in the overall index.

Most of the components in an index are mutable: the Bag(V) in the root; the assertion-value cache set in each Continuation or Leaf object; the map from Selector to Class to Node within each Node; the map from path list to value-list to Leaf in each Continuation; the map from path list to Handler in each Leaf; and the Bag([V]) in every Handler. This reflects the fact that the index directly reflects the current state of the dataspace it is indexing.

Adding and removing event handlers

Every event handler is a pair of a skeleton and a callback function.

Adding or removing an event handler proceeds in two stages. First, the index is extended to incorporate a path computed from the skeleton's shape into the Node-based trie. Second, the capture map and callback are installed into or removed from the Continuation within the Node at the end of that path.

Because (statically-known) shapes are finite and not particularly numerous in any given program, the implementation assumes that it is never necessary to remove shapes from the index. Instead, it limits itself to removal of handler functions, capture maps, and constant maps. This assumption will have to be revisited in future broker-like cases where handlers are dynamically installed.

Definition. The project function extracts the subvalue at a given path h from an overall value v.

project :: V -> H -> V
project v [] = v
project x(v_0, ..., v_i) (n:h) = project v_n h

Definition. The projectMany function projects a sequence of subvalues.

projectMany :: V -> [H] -> V
projectMany v [h_0, ...] = [project v h_0, ...]

Definition. The classof function extracts the constructor label x and its arity i from a value v, yielding () if v is not a record.

classof :: V -> 1 + Class
classof a                = ()
classof x(v_0, ..., v_i) = (x,i)

Definition. The extend procedure augments an index with shape information s, by imperatively updating the index structure. It returns the Continuation associated with the deepest Node visited in the path described by s.

extend :: Node -> S -> Continuation
extend node s =
    let (_, (cont, _)) = walk-node [] node 0 0 s
    cont
  where

    walk-edge :: H -> Node -> 𝐍 -> 𝐍 -> [S] -> (𝐍,Node)
    walk-edge h node n_pop n_index [] =
      (n_pop + 1, node)
    walk-edge h node n_pop n_index (s:shapes) =
      let (n_pop', node') = walk-node h node n_pop n_index s
      let n_index' = n_index + 1
      let h' = (dropRight h 1) ++ [n_index']
      walk-edge h' node' n_pop' n_index' shapes

    walk-node :: H -> Node -> 𝐍 -> 𝐍 -> S -> (𝐍,Node)
    walk-node h node n_pop n_index * =
      (n_pop, node)
    walk-node h node n_pop n_index x(s_0, ... s_i) =
      let (cont, edges) = node
      let selector = (n_pop,n_index)
      let class = (x,i)
      if selector not in edges then
          edges[selector] := {}
      let table = edges[selector]
      if class not in table then
          let (outercache, constmap) = cont
          let innercache =
            { v | v ∈ outercache,
                  classof (project v h) = class }
          table[class] := ((innercache, {}), {})
      let node' = table[class]
      walk-edge (h ++ [0]) node' 0 0 [s_0, ..., s_i]

Definition. The addHandler procedure installs into an index an event handler callback f expecting values matching and captured by the given skeleton k. It then invokes f once for each distinct sequence of captured values matching existing assertions in the index.2

addHandler :: Index -> (S × [H×V] × [H]) -> Callback -> 1
addHandler index (s, constantMap, captureMap) f =
  let (_, root) = index
  let (cache, table) = extend root s
  let constLocs = [h | (h,v) ∈ constantMap]
  if constLocs not in table then
      table[constLocs] := {}
      for v in cache
          let key = projectMany v constLocs
          if key not in table[constLocs] then
              table[constLocs][key] := ({}, {})
          let (leafcache, _leaftable) = table[constLocs][key]
          leafcache += v
  let constVals = [v | (h,v) ∈ constantMap]
  if constVals not in table[constLocs] then
      table[constLocs][constVals] := ({}, {})
  let (leafcache, leaftable) = table[constLocs][constVals]
  if captureMap not in leaftable then
      let bag = empty_bag
      for v in leafcache
          bag[projectMany v captureMap] += 1
      leaftable[captureMap] := (bag, {})
  let (bag, f_table) = leaftable[captureMap]
  f_table += f
  for seq in bag
      f "+" seq
  ()

Definition. The removeHandler procedure removes an event handler from an index.

removeHandler :: Index -> (S × [H×V] × [H]) -> Callback -> 1
removeHandler index (s, constantMap, captureMap) f =
  let (_, root) = index
  let (cache, table) = extend root s
  let constLocs = [h | (h,v) ∈ constantMap]
  if constLocs not in table then
      return
  let constVals = [v | (h,v) ∈ constantMap]
  if constVals not in table[constLocs] then
      return
  let (leafcache, leaftable) = table[constLocs][constVals]
  if captureMap not in leaftable then
      return
  let (bag, f_table) = leaftable[captureMap]
  if f not in f_table then
      return
  f_table -= f
  if f_table = {} then
      delete leaftable[captureMap]
  if leafcache = {} and leaftable = {} then
      delete table[constLocs][constVals]
  if table[constLocs] = {} then
      delete table[constLocs]

Adding assertions, removing assertions and sending messages

All three operations depend on a single traversal procedure, parameterized with different update procedures.

Definition. The modify procedure traverses an index trie, following the structure of v and updating cached assertion sets according to the given update procedures. The update procedures act by side-effect; in particular, the m_handler procedure may choose to invoke the callback passed to it.

Operation = { AddAssertion, RemoveAssertion, SendMessage }

modify :: Node ->
          Operation ->
          V ->
          (Continuation -> V -> 1) ->
          (Leaf -> V -> 1) ->
          (Handler -> [V] -> 1) ->
          1
modify node operation v m_cont m_leaf m_handler =
  walk-node node [outermost(v)]
  where
    walk-node :: Node -> [V] -> 1
    walk-node (cont, edges) vs =
      walk-cont cont
      for sel@(n_pop, n_index) in edges
          let vs' = dropLeft vs n_pop
          let (x(v_0, ...) : _) = vs'
          let v' = v_{n_index}
          if classof v' in edges[sel] then
              walk-node edges[sel][classof v'] (v':vs')

    walk-cont :: Continuation -> 1
    walk-cont cont@(cache, table) =
      m_cont cont v
      for constLocs in table
        let consts = projectMany v constLocs
        if operation = AddAssertion and consts not in table[constLocs] then
            table[constLocs][consts] := ({}, {})
        if consts in table[constLocs] then
            let leaf@(leafcache, leaftable) =
                  table[constLocs][consts]
            m_leaf leaf v
            for captureMap in leaftable
                let handler = leaftable[captureMap]
                let vs = projectMany v captureMap
                m_handler handler vs
            if operation = RemoveAssertion and leafcache = {} and leaftable = {} then
                delete table[constLocs][consts]
                if table[constLocs] = {} then
                    delete table[constLocs]

The outermost constructor applied to v at the top of modify is necessary because every path in the trie structure embodied in each node is a sequence of zero or more (move, check) pairs. Each "move" pops zero or more items from the stack and then pushes a sub-structure of the topmost stack element onto the stack; the "check" then examines the class of the new top element, abandoning the search if it does not match. Without some outermost constructor, there would be no possible "move", and the trie would have to be expressed as a single optional check followed by zero or more (move, check) pairs.

Definition. The procedure adjustAssertion updates the copy-count associated with v in the given index, invoking callbacks as a side-effect if this changes the observable contents of the dataspace.

adjustAssertion :: Index -> V -> 𝐍 -> 1
adjustAssertion (cache, root) v delta =
    let was_present = v in cache
    cache[v] += delta
    let is_present = v in cache
    if not was_present and is_present then
        modify root AddAssertion v add_cont add_leaf add_handler
    if was_present and not is_present then
        modify root RemoveAssertion v del_cont del_leaf del_handler
  where
    add_cont (cache, _) v = cache += v
    add_leaf (leafcache, _) v = leafcache += v
    add_handler (bag, f_table) vs =
      let was_present = vs in bag
      bag[vs] += 1
      if not was_present then
          for f in f_table
              f "+" vs

    del_cont (cache, _) v = cache -= v
    del_leaf (leafcache, _) v = leafcache -= v
    del_handler (bag, f_table) vs =
      bag[vs] -= 1
      if vs not in bag then
          for f in f_table
              f "-" vs

Definition. The procedures addAssertion and removeAssertion install and remove an assertion v into the given index, respectively.

addAssertion :: Index -> V -> 1
addAssertion index v = adjustAssertion index v 1

removeAssertion :: Index -> V -> 1
removeAssertion index v = adjustAssertion index v -1

Care must be taken when applying entire patches to ensure that added assertions are processed before removed assertions; otherwise, actors will observe glitching in certain cases. For example, consider an endpoint with a wildcard subscription [_] and a separate endpoint asserting [3]. If a patch atomically replaces [3] with [4], then if the removal is processed first, it will briefly appear to the [_] endpoint as if no assertions remain, whereas if the addition is processed first, no glitch will be detected.

Definition. The procedure sendMessage delivers a message v to event handlers in the given index.

sendMessage :: Index -> V -> 1
sendMessage (_, root) v =
    modify root SendMessage v send_cont send_leaf send_handler
  where
    send_cont _ _ = ()
    send_leaf _ _ = ()
    send (_, f_table) vs =
      for f in f_table
          f "!" vs

Potential future optimizations

Static analysis of messages and assertions

Static analysis of expressions under (send! ...) and (assert ...) could cut out even more structural overhead.

For example, given interests

observe(message("actor1", capture()))
observe(message("actor2", capture()))

and a message

:: message(x, y)

static analysis could directly connect the sending site to a hash-table lookup with x as the key, and invocation of the resulting handlers with y as the argument. There would be no need to perform a lookup based on the message constructor at runtime.

Similarly, given an interest

observe(present(capture()))

and an assertion

present(user)

static analysis could directly invoke handlers with user as the argument, without needing to at runtime find the set of handlers interested in the present constructor.


TODO

  • describe the cleanup function associated with a handler in the real implementation

    • relay.rkt uses it. When an inner actor asserts interest in an inbound assertion-set, the relay process pivots into the outer dataspace's context, and adds a new endpoint that relays events to the inner dataspace. The cleanup function attached to that endpoint retracts (from the inner dataspace) any matching assertions left over at the time the endpoint is removed.
    • that appears to be it! Nowhere else in the code is a skeleton-interest constructed with a non-#f cleanup function.
  • figure out and describe scoped assertions / visibility-restrictions

    • (partial/sketchy answer:) It's to deal with the fact that multiple endpoints may overlap. Within a single dataspace, an assertion matching both endpoints will trigger each of them. When relaying, the relay maintains an endpoint in the outer space for each in the inner space. When both outer endpoints are triggered, if they were to naively relay the matching assertion, the problem isn't so much that they'd double up (because dataspaces deduplicate!), the problem is that they don't have enough information to reconstruct the triggering outer assertion perfectly! So a visibility-restriction causes an assertion to only trigger inner endpoints that capture at most the captures of the outer endpoint. One of the outer endpoints will trigger its "matching" inner endpoint, but not the inner endpoint of the other endpoint, even though you might expect the relayed assertion to do so.
    • There's also a need for (opaque-placeholder)s to frustrate constant-matching against literal (discard) in cases of visibility restriction. See commit 937bb7a and test case test/core/nesting-confusion-2.rkt.
    • HOWEVER see notes from 2019-06-18 in the googledoc Journal as well as in my notebook. See also commit 5923bdd from imperative-syndicate and e806f4b from syndicate-js. The opaque-placeholders make the distributed (= non-zero-latency) case of visibility-restriction handling problematic in general, though relaxing the constraint from exact match of captured positions to at-most-match of captured positions allows at least the during special case to work in a programmer-unsurprising way.
  • there's more to say about the implementation of the dataspace itself, not just the index structures. For example, the care that must be taken regarding cleanup-changes and abandoning work during exception handling.


  1. The implemented system affords a single argument to capture that restricts matches according to the nested subpattern. What is written here as capture() corresponds to capture(discard()) in the full implementation. The pattern language syntax is analogously extended. ↩︎

  2. Because we store sets of function values, we rely on the general availability of a closure equivalence relation. Pointer-equality of closures (eq?) suffices. ↩︎