From 0c701809e1984f421b74acb1cda94ae350634012 Mon Sep 17 00:00:00 2001 From: Tony Garnock-Jones Date: Sun, 14 Oct 2018 21:25:13 +0100 Subject: [PATCH] Describe how the efficient index structure works --- syndicate/HOWITWORKS.md | 693 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 693 insertions(+) create mode 100644 syndicate/HOWITWORKS.md diff --git a/syndicate/HOWITWORKS.md b/syndicate/HOWITWORKS.md new file mode 100644 index 0000000..5a1b1e6 --- /dev/null +++ b/syndicate/HOWITWORKS.md @@ -0,0 +1,693 @@ +# Efficient, Imperative Dataspaces for Conversational Concurrency + +Tony Garnock-Jones +14 October 2018 + +

Abstract. The dataspace + model of Conversational Concurrency [is great], but implementing it + efficiently has been difficult until now. Existing approaches use a + complex data structure that depends for its efficiency on + sophisticated run-time support. This paper presents a new approach + to implementation of the dataspace model that gives three benefits. + First, it avoids the complexity and run-time support requirements of + previous approaches, bringing dataspaces to a wider range of + environments. Second, it unlocks new types of conversational + interaction among concurrent components. Third, it dramatically + improves performance. Key to the new technique is a syntactic + treatment of assertions of interest, contrasting with the semantic + treatment of assertion sets used by the earlier approach.

+ +## Constructing assertions + +Imagine a language for constructing data with embedded function calls +and variable references. Imagine that it is a fragment of a larger +language. + + c ∈ assertions C ::= e | x(c, ...) + v ∈ values V ::= a | x(v, ...) + e ∈ expressions E ::= a | x | e e ... + x ∈ identifiers X + a ∈ atoms A = numbers ∪ strings ∪ ... + +Here are some examples of assertions in `c`, along with suggested +interpretations: + + present("Alice") Alice is present in the chat room + speak("Alice", "Hello!") Alice says "Hello!" + +## Assertions of interest + +In the dataspace model, "subscriptions" go hand in hand with +assertions of *interest* in subscribed-to data. The model includes two +special constructors for discussing interests. The first, `observe`, +is interpreted as a declaration of interest in assertions matching the +pattern given as its sole argument. The second, `discard`, is +interpreted as a "don't care" when part of a pattern within `observe`. + + observe(present(discard())) Interest in the presence of any user + observe(speak("Alice", discard())) Interest in every time Alice speaks + +We extend the dataspace model with an additional special constructor +which allows interested parties to declare the portions of matching +assertions that they specifically wish to examine further: *capturing* +positions. The `capture` constructor signals that the interested party +will treat specially the corresponding portion of a matching +assertion.[^capture-subpattern] + + observe(present(capture())) Interest in each present user + observe(speak("Alice", capture())) Interest in the things Alice says + + [^capture-subpattern]: The implemented system affords a single + argument to `capture` that restricts matches according to the + nested subpattern. What is written here as `capture()` corresponds + to `capture(discard())` in the full implementation. The pattern + language syntax is analogously extended. + +There is an important difference between `observe(present(discard()))` +and `observe(present(capture()))`. The former declares that the +interested party cares only about whether any user at all is present, +while the latter declares an interest in the identities of the +specific users that are present. Similarly, `observe(speak("Alice", +discard()))` declares an interest in receiving a notification each +time Alice speaks, but no interest in the *content* of each utterance, +while `observe(speak("Alice", capture()))` declares interest in +learning the things Alice says. + +To drive this point home, the following patterns both result in a +notification event for each utterance by any user. The first results +in notifications carrying the name of the speaker along with the +content of their speech. The second results in notifications carrying +only the name of the speaker. + + observe(speak(capture(), capture())) Interest in who says what + observe(speak(capture(), discard())) Interest in who speaks + +## Patterns + +Imagine now an enriched version of our language that can construct +patterns over data, including captures and "don't care" positions. + + p ∈ patterns P ::= e | x(p, ...) | $x | _ + +Syntactic patterns can be translated into assertions of interest +directly. Binding subpatterns `$x` are translated into `capture()`, +and "don't care" patterns `_` into `discard()`. + +## Indexing assertions and patterns + +There are two kinds of change in a running dataspace model program. +First, assertions can be added to and removed from the dataspace. When +this happens, interested facets must be informed of relevant changes. +Second, facets and their event handlers can be added to and removed +from the dataspace. When this happens, new handlers must be informed +of preexisting matching assertions. + +To efficiently respond to these two kinds of change, we maintain a +special index. Every time an event handler within a facet is created, +we augment the index using a data structure called a *skeleton*. Each +skeleton contains information gleaned from static analysis of the +pattern associated with the event handler. The index also records +every assertion added to the dataspace, so as to correctly initialize +event handlers added later. + +### Skeletons + +A skeleton is comprised of three pieces: a *shape*, describing the +positions and arities of statically-known constructors in matching +assertions; a *constant map*, which places restrictions on fields +within constructors; and a *capture map*, which specifies locations of +captured positions. + +Each time an assertion is added or removed, it is conceptually checked +against each handler's skeleton. First, the overall shape is checked. +If the assertion passes this check, the constant map is checked. If +all the constants match, the capture map is used to prepare an +argument vector, and the event handler's callback is invoked. + + k ∈ skeletons K = S × [H×E] × [H] + s ∈ shapes S ::= * | x(s, ...) + h ∈ paths H = [𝐍] + +Shapes retain only statically-known constructors and arities in a +pattern: + + shape :: P -> S + shape e = * + shape x(p, ...) = x(shape p, ...) + shape $x = * + shape _ = * + +A constant map extracts all non-capturing, non-discard positions in a +pattern. The expressions in the map are evaluated at the time the +corresponding event handler is installed; that is, at facet creation +time. They are not subsequently reevaluated; if any expression depends +on a dataflow variable, and that variable changes, the entire handler +is removed, reevaluated, and reinstalled. + + constantmap :: P -> [(H, E)] + constantmap p = cmap [0] p + where + cmap :: H -> P -> [(H, E)] + cmap h e = [(h, e)] + cmap h x(p_0, ..., p_i) = (cmap (h++[0]) p_0) ++ + ... ++ + (cmap (h++[i]) p_i) + cmap h $x = [] + cmap h _ = [] + +Finally, a capture map extracts all capturing positions in a pattern: + + capturemap :: P -> [H] + capturemap p = vmap [0] p + where + vmap :: H -> P -> [H] + vmap h e = [] + vmap h x(p_0, ..., p_i) = (vmap (h++[0]) p_0) ++ + ... + + (vmap (h++[i]) p_i) + vmap h $x = [h] + vmap h _ = [] + +### The index + +The index incorporates every active event handler and every active +assertion in the dataspace. + +#### Overview and structures + +An index is a pair of a bag of all currently-asserted +assertion-values, plus the root node of a trie-like structure. +Information from each indexed event handler's skeleton's shape is laid +out along edges connecting trie nodes. + +Every node contains a "continuation", which embodies information from +a skeleton's constant map and capture map, as well as handler callback +functions and caches of currently-asserted values. + + Index = Bag(V) × Node + Node = Continuation × (Selector ⟼ Class ⟼ Node) + Selector = 𝐍 × 𝐍 -- pop-count and index + Class = X × 𝐍 -- label and arity + + Continuation = 𝒫(V) × ([H] ⟼ [V] ⟼ Leaf) + Leaf = 𝒫(V) × ([H] ⟼ Handler) + + Handler = Bag([V]) × 𝒫(EventType -> [V] -> V) + EventType ::= "+" | "-" | "!" + + Bag(τ) = τ ⟼ 𝐍 -- bag of τ values + +To use an index in the context of a single assertion—be it a new +addition, a removal, or a message to be delivered—follow a path from +the root `Node` of the index along `Selector`/`Class`-labelled edges, +collecting `Continuations` as you go. This yields a complete set of +event handlers that may match the assertion being considered. Further +investigating each collected `Continuation` by analyzing its constant +maps yields a set of matching `Leaf`s. Finally, each `Leaf` specifies +a set of captured positions in the assertion to extract and pass to +the contained callbacks. + +At every `Continuation`, `Leaf` and `Handler` object, the index +maintains a set of currently-asserted values that conform to the +constraints implied by the object's position in the overall index. + +Most of the components in an index are *mutable*: the `Bag(V)` in the +root; the assertion-value cache set in each `Continuation` or `Leaf` +object; the map from `Selector` to `Class` to `Node` within each +`Node`; the map from path list to value-list to `Leaf` in each +`Continuation`; the map from path list to `Handler` in each `Leaf`; +and the `Bag([V])` in every `Handler`. This reflects the fact that the +index directly reflects the current state of the dataspace it is +indexing. + +#### Adding and removing event handlers + +Every event handler is a pair of a skeleton and a callback function. + +Adding or removing an event handler proceeds in two stages. First, the +index is extended to incorporate a path computed from the skeleton's +shape into the `Node`-based trie. Second, the capture map and callback +are installed into or removed from the `Continuation` within the +`Node` at the end of that path. + +Because (statically-known) shapes are finite and not particularly +numerous in any given program, the implementation assumes that it is +never necessary to remove shapes from the index. Instead, it limits +itself to removal of handler functions, capture maps, and constant +maps. This assumption will have to be revisited in future broker-like +cases where handlers are dynamically installed. + +**Definition.** The `project` function extracts the subvalue at a + given path `h` from an overall value `v`. + + project :: V -> H -> V + project v h = go dummy(v) h -- TODO: gross + where + go v [] = v + go x(v_0, ... v_i) (n:h) = v_n @ h + +**Definition.** The `projectMany` function projects a sequence of + subvalues. + + projectMany :: V -> [H] -> V + projectMany v [h_0, ...] = [project v h_0, ...] + +**Definition.** The `classof` function extracts the constructor label + `x` and its arity `i` from a value `v`, yielding `()` if `v` is not + a record. + + classof :: V -> 1 + Class + classof a = () + classof x(v_0, ..., v_i) = (x,i) + +**Definition.** The `extend` procedure augments an index with shape + information `s`, by imperatively updating the index structure. It + returns the `Continuation` associated with the deepest `Node` + visited in the path described by `s`. + + extend :: Node -> S -> Continuation + extend node s = + let (_, (cont, _)) = walk-edge [0] node 0 0 [s] + cont + where + + walk-edge :: H -> Node -> 𝐍 -> 𝐍 -> [S] -> (𝐍,Node) + walk-edge h node n_pop n_index [] = + (n_pop + 1, node) + walk-edge h node n_pop n_index (s:shapes) = + let (n_pop', node') = walk-node h node n_pop n_index s + let n_index' = n_index + 1 + let h' = update-path h 1 n_index' + walk-edge h' node' n_pop' n_index' shapes + + walk-node :: H -> Node -> 𝐍 -> 𝐍 -> S -> (𝐍,Node) + walk-node h node n_pop n_index * = + (n_pop, node) + walk-node h node n_pop n_index x(s_0, ... s_i) = + let (cont, edges) = node + let selector = (n_pop,n_index) + let class = (x,i) + if selector not in edges then + edges[selector] := {} + let table = edges[selector] + if class not in edges[selector] then + let (outercache, constmap) = cont + let innercache = + { v | v ∈ outercache, + classof (project v h) = class } + edges[selector][class] := ((innercache, {}), {}) + let node' = edges[selector][class] + let h' = update-path h n_pop 0 + walk-edge h' node' 0 0 [s_0, ..., s_i] + + update-path :: H -> 𝐍 -> 𝐍 -> H + update-path h n_pop n_index = + (dropRight h n_pop) ++ [n_index] + +**Definition.** The `addHandler` procedure installs into an index an + event handler callback `f` expecting values matching and captured by + the given skeleton `k`. It then invokes `f` once for each distinct + sequence of captured values matching existing assertions in the + index.[^function-pointer-equality] + + addHandler :: Index -> (S × [H×V] × [H]) -> ([V] -> V) -> 1 + addHandler index k f = + let (s, constantMap, captureMap) = k + let (_, root) = index + let (cache, table) = extend root s + let constLocs = [h | (h,v) ∈ constantMap] + let constVals = [v | (h,v) ∈ constantMap] + if constLocs not in table then + table[constLocs] := {} + if constVals not in table[constLocs] then + let leafcache = + { v | v ∈ cache, + projectMany v constLocs = constVals } + table[constLocs][constVals] := (leafcache, {}) + let (leafcache, leaftable) = table[constLocs][constVals] + if captureMap not in leaftable then + let bag = empty_bag + for v in leafcache + bag[projectMany v captureMap] += 1 + leaftable[captureMap] := (bag, {}) + let (bag, f_table) = leaftable[captureMap] + f_table += f + for seq in bag + f "+" seq + () + + [^function-pointer-equality]: Because we store *sets* of function + values, we rely on the general availability of a closure + equivalence relation. Pointer-equality of closures (`eq?`) + suffices. + +**Definition.** The `removeHandler` procedure removes an event handler + from an index. + + removeHandler :: Index -> (S × [H×V] × [H]) -> ([V] -> V) -> 1 + removeHandler index k f = + let (s, constantMap, captureMap) = k + let (_, root) = index + let (cache, table) = extend root s + let constLocs = [h | (h,v) ∈ constantMap] + let constVals = [v | (h,v) ∈ constantMap] + if constLocs not in table then + return + if constVals not in table[constLocs] then + return + let (leafcache, leaftable) = table[constLocs][constVals] + if captureMap not in leaftable then + return + let (bag, f_table) = leaftable[captureMap] + if f not in f_table then + return + f_table -= f + if f_table = {} then + delete leaftable[captureMap] + if leaftable = {} then + delete table[constLocs][constVals] + if table[constLocs] = {} then + delete table[constLocs] + +#### Adding assertions, removing assertions and sending messages + +All three operations depend on a single traversal procedure, +parameterized with different update procedures. + +**Definition.** The `modify` procedure traverses an index trie, + following the structure of `v` and updating cached assertion sets + according to the given update procedures. The update procedures act + by side-effect; in particular, the `m_handler` procedure may choose + to invoke the callback passed to it. + + modify :: Node -> + V -> + (Continuation -> V -> 1) -> + (Leaf -> V -> 1) -> + (Handler -> [V] -> 1) -> + 1 + modify node v m_cont m_leaf m_handler = + walk-node node [dummy(v)] -- TODO: gross + where + walk-node :: Node -> [V] -> 1 + walk-node (cont, edges) vs = + walk-cont cont + for sel@(n_pop, n_index) in edges + let vs' = dropLeft vs n_pop + let (x(v_0, ...) : _) = vs' + let v' = v_{n_index} + if classof v' in edges[sel] then + walk-node edges[sel][classof v'] (v':vs') + + walk-cont :: Continuation -> 1 + walk-cont cont@(cache, table) = + m_cont cont v + for constLocs in table + let consts = projectMany v constLocs + if consts in table[constLocs] then + let leaf@(leafcache, leaftable) = + table[constLocs][consts] + m_leaf leaf v + for captureMap in leaftable + let handler = leaftable[captureMap] + let vs = projectMany v captureMap + m_handler handler vs + +**Definition.** The procedure `adjustAssertion` updates the copy-count + associated with `v` in the given index, invoking callbacks as a + side-effect if this changes the observable contents of the + dataspace. + + adjustAssertion :: Index -> V -> 𝐍 -> 1 + adjustAssertion (cache, root) v delta = + let was_present = v in cache + cache[v] += delta + let is_present = v in cache + if not was_present and is_present then + modify root v add_cont add_leaf add_handler + if was_present and not is_present then + modify root v del_cont del_leaf del_handler + where + add_cont (cache, _) v = cache += v + add_leaf (leafcache, _) v = leafcache += v + add_handler (bag, f_table) vs = + let was_present = vs in bag + bag[vs] += 1 + if not was_present then + for f in f_table + f "+" vs + + del_cont (cache, _) v = cache -= v + del_leaf (leafcache, _) v = leafcache -= v + del_handler (bag, f_table) vs = + bag[vs] -= 1 + for f in f_table + f "-" vs + +**Definition.** The procedures `addAssertion` and `removeAssertion` + install and remove an assertion `v` into the given index, + respectively. + + addAssertion :: Index -> V -> 1 + addAssertion index v = adjustAssertion index v 1 + + removeAssertion :: Index -> V -> 1 + removeAssertion index v = adjustAssertion index v -1 + +**Definition.** The procedure `sendMessage` delivers a message `v` to + event handlers in the given index. + + sendMessage :: Index -> V -> 1 + sendMessage (_, root) v = + modify root v send_cont send_leaf send_handler + where + send_cont _ _ = () + send_leaf _ _ = () + send (_, f_table) vs = + for f in f_table + f "!" vs + +## Potential future optimizations + +### Static analysis of messages and assertions + +Static analysis of expressions under `(send! ...)` and `(assert ...)` +could cut out even more structural overhead. + +For example, given interests + + observe(message("actor1", capture())) + observe(message("actor2", capture())) + +and a message + + :: message(x, y) + +static analysis could directly connect the sending site to a +hash-table lookup with `x` as the key, and invocation of the resulting +handlers with `y` as the argument. There would be no need to perform a +lookup based on the `message` constructor at runtime. + +Similarly, given an interest + + observe(present(capture())) + +and an assertion + + present(user) + +static analysis could directly invoke handlers with `user` as the +argument, without needing to at runtime find the set of handlers +interested in the `present` constructor. + +--- + +# TODO + + - describe the cleanup function associated with a handler in the real implementation + - figure out and describe scoped assertions / visibility-restrictions + - check pop/index logic to ensure no off-by-ones + - rearrange to avoid the `dummy(...)` wrappers + +