Describe how the efficient index structure works

This commit is contained in:
Tony Garnock-Jones 2018-10-14 21:25:13 +01:00
parent d497d9e6d1
commit 0c701809e1
1 changed files with 693 additions and 0 deletions

693
syndicate/HOWITWORKS.md Normal file
View File

@ -0,0 +1,693 @@
# Efficient, Imperative Dataspaces for Conversational Concurrency
Tony Garnock-Jones <tonyg@leastfixedpoint.com>
14 October 2018
<p style="font-size:90%"><strong>Abstract.</strong> The dataspace
model of Conversational Concurrency [is great], but implementing it
efficiently has been difficult until now. Existing approaches use a
complex data structure that depends for its efficiency on
sophisticated run-time support. This paper presents a new approach
to implementation of the dataspace model that gives three benefits.
First, it avoids the complexity and run-time support requirements of
previous approaches, bringing dataspaces to a wider range of
environments. Second, it unlocks new types of conversational
interaction among concurrent components. Third, it dramatically
improves performance. Key to the new technique is a syntactic
treatment of assertions of interest, contrasting with the semantic
treatment of assertion sets used by the earlier approach.</p>
## Constructing assertions
Imagine a language for constructing data with embedded function calls
and variable references. Imagine that it is a fragment of a larger
language.
c ∈ assertions C ::= e | x(c, ...)
v ∈ values V ::= a | x(v, ...)
e ∈ expressions E ::= a | x | e e ...
x ∈ identifiers X
a ∈ atoms A = numbers strings ...
Here are some examples of assertions in `c`, along with suggested
interpretations:
present("Alice") Alice is present in the chat room
speak("Alice", "Hello!") Alice says "Hello!"
## Assertions of interest
In the dataspace model, "subscriptions" go hand in hand with
assertions of *interest* in subscribed-to data. The model includes two
special constructors for discussing interests. The first, `observe`,
is interpreted as a declaration of interest in assertions matching the
pattern given as its sole argument. The second, `discard`, is
interpreted as a "don't care" when part of a pattern within `observe`.
observe(present(discard())) Interest in the presence of any user
observe(speak("Alice", discard())) Interest in every time Alice speaks
We extend the dataspace model with an additional special constructor
which allows interested parties to declare the portions of matching
assertions that they specifically wish to examine further: *capturing*
positions. The `capture` constructor signals that the interested party
will treat specially the corresponding portion of a matching
assertion.[^capture-subpattern]
observe(present(capture())) Interest in each present user
observe(speak("Alice", capture())) Interest in the things Alice says
[^capture-subpattern]: The implemented system affords a single
argument to `capture` that restricts matches according to the
nested subpattern. What is written here as `capture()` corresponds
to `capture(discard())` in the full implementation. The pattern
language syntax is analogously extended.
There is an important difference between `observe(present(discard()))`
and `observe(present(capture()))`. The former declares that the
interested party cares only about whether any user at all is present,
while the latter declares an interest in the identities of the
specific users that are present. Similarly, `observe(speak("Alice",
discard()))` declares an interest in receiving a notification each
time Alice speaks, but no interest in the *content* of each utterance,
while `observe(speak("Alice", capture()))` declares interest in
learning the things Alice says.
To drive this point home, the following patterns both result in a
notification event for each utterance by any user. The first results
in notifications carrying the name of the speaker along with the
content of their speech. The second results in notifications carrying
only the name of the speaker.
observe(speak(capture(), capture())) Interest in who says what
observe(speak(capture(), discard())) Interest in who speaks
## Patterns
Imagine now an enriched version of our language that can construct
patterns over data, including captures and "don't care" positions.
p ∈ patterns P ::= e | x(p, ...) | $x | _
Syntactic patterns can be translated into assertions of interest
directly. Binding subpatterns `$x` are translated into `capture()`,
and "don't care" patterns `_` into `discard()`.
## Indexing assertions and patterns
There are two kinds of change in a running dataspace model program.
First, assertions can be added to and removed from the dataspace. When
this happens, interested facets must be informed of relevant changes.
Second, facets and their event handlers can be added to and removed
from the dataspace. When this happens, new handlers must be informed
of preexisting matching assertions.
To efficiently respond to these two kinds of change, we maintain a
special index. Every time an event handler within a facet is created,
we augment the index using a data structure called a *skeleton*. Each
skeleton contains information gleaned from static analysis of the
pattern associated with the event handler. The index also records
every assertion added to the dataspace, so as to correctly initialize
event handlers added later.
### Skeletons
A skeleton is comprised of three pieces: a *shape*, describing the
positions and arities of statically-known constructors in matching
assertions; a *constant map*, which places restrictions on fields
within constructors; and a *capture map*, which specifies locations of
captured positions.
Each time an assertion is added or removed, it is conceptually checked
against each handler's skeleton. First, the overall shape is checked.
If the assertion passes this check, the constant map is checked. If
all the constants match, the capture map is used to prepare an
argument vector, and the event handler's callback is invoked.
k ∈ skeletons K = S × [H×E] × [H]
s ∈ shapes S ::= * | x(s, ...)
h ∈ paths H = [𝐍]
Shapes retain only statically-known constructors and arities in a
pattern:
shape :: P -> S
shape e = *
shape x(p, ...) = x(shape p, ...)
shape $x = *
shape _ = *
A constant map extracts all non-capturing, non-discard positions in a
pattern. The expressions in the map are evaluated at the time the
corresponding event handler is installed; that is, at facet creation
time. They are not subsequently reevaluated; if any expression depends
on a dataflow variable, and that variable changes, the entire handler
is removed, reevaluated, and reinstalled.
constantmap :: P -> [(H, E)]
constantmap p = cmap [0] p
where
cmap :: H -> P -> [(H, E)]
cmap h e = [(h, e)]
cmap h x(p_0, ..., p_i) = (cmap (h++[0]) p_0) ++
... ++
(cmap (h++[i]) p_i)
cmap h $x = []
cmap h _ = []
Finally, a capture map extracts all capturing positions in a pattern:
capturemap :: P -> [H]
capturemap p = vmap [0] p
where
vmap :: H -> P -> [H]
vmap h e = []
vmap h x(p_0, ..., p_i) = (vmap (h++[0]) p_0) ++
... +
(vmap (h++[i]) p_i)
vmap h $x = [h]
vmap h _ = []
### The index
The index incorporates every active event handler and every active
assertion in the dataspace.
#### Overview and structures
An index is a pair of a bag of all currently-asserted
assertion-values, plus the root node of a trie-like structure.
Information from each indexed event handler's skeleton's shape is laid
out along edges connecting trie nodes.
Every node contains a "continuation", which embodies information from
a skeleton's constant map and capture map, as well as handler callback
functions and caches of currently-asserted values.
Index = Bag(V) × Node
Node = Continuation × (Selector ⟼ Class ⟼ Node)
Selector = 𝐍 × 𝐍 -- pop-count and index
Class = X × 𝐍 -- label and arity
Continuation = 𝒫(V) × ([H] ⟼ [V] ⟼ Leaf)
Leaf = 𝒫(V) × ([H] ⟼ Handler)
Handler = Bag([V]) × 𝒫(EventType -> [V] -> V)
EventType ::= "+" | "-" | "!"
Bag(τ) = τ ⟼ 𝐍 -- bag of τ values
To use an index in the context of a single assertion—be it a new
addition, a removal, or a message to be delivered—follow a path from
the root `Node` of the index along `Selector`/`Class`-labelled edges,
collecting `Continuations` as you go. This yields a complete set of
event handlers that may match the assertion being considered. Further
investigating each collected `Continuation` by analyzing its constant
maps yields a set of matching `Leaf`s. Finally, each `Leaf` specifies
a set of captured positions in the assertion to extract and pass to
the contained callbacks.
At every `Continuation`, `Leaf` and `Handler` object, the index
maintains a set of currently-asserted values that conform to the
constraints implied by the object's position in the overall index.
Most of the components in an index are *mutable*: the `Bag(V)` in the
root; the assertion-value cache set in each `Continuation` or `Leaf`
object; the map from `Selector` to `Class` to `Node` within each
`Node`; the map from path list to value-list to `Leaf` in each
`Continuation`; the map from path list to `Handler` in each `Leaf`;
and the `Bag([V])` in every `Handler`. This reflects the fact that the
index directly reflects the current state of the dataspace it is
indexing.
#### Adding and removing event handlers
Every event handler is a pair of a skeleton and a callback function.
Adding or removing an event handler proceeds in two stages. First, the
index is extended to incorporate a path computed from the skeleton's
shape into the `Node`-based trie. Second, the capture map and callback
are installed into or removed from the `Continuation` within the
`Node` at the end of that path.
Because (statically-known) shapes are finite and not particularly
numerous in any given program, the implementation assumes that it is
never necessary to remove shapes from the index. Instead, it limits
itself to removal of handler functions, capture maps, and constant
maps. This assumption will have to be revisited in future broker-like
cases where handlers are dynamically installed.
**Definition.** The `project` function extracts the subvalue at a
given path `h` from an overall value `v`.
project :: V -> H -> V
project v h = go dummy(v) h -- TODO: gross
where
go v [] = v
go x(v_0, ... v_i) (n:h) = v_n @ h
**Definition.** The `projectMany` function projects a sequence of
subvalues.
projectMany :: V -> [H] -> V
projectMany v [h_0, ...] = [project v h_0, ...]
**Definition.** The `classof` function extracts the constructor label
`x` and its arity `i` from a value `v`, yielding `()` if `v` is not
a record.
classof :: V -> 1 + Class
classof a = ()
classof x(v_0, ..., v_i) = (x,i)
**Definition.** The `extend` procedure augments an index with shape
information `s`, by imperatively updating the index structure. It
returns the `Continuation` associated with the deepest `Node`
visited in the path described by `s`.
extend :: Node -> S -> Continuation
extend node s =
let (_, (cont, _)) = walk-edge [0] node 0 0 [s]
cont
where
walk-edge :: H -> Node -> 𝐍 -> 𝐍 -> [S] -> (𝐍,Node)
walk-edge h node n_pop n_index [] =
(n_pop + 1, node)
walk-edge h node n_pop n_index (s:shapes) =
let (n_pop', node') = walk-node h node n_pop n_index s
let n_index' = n_index + 1
let h' = update-path h 1 n_index'
walk-edge h' node' n_pop' n_index' shapes
walk-node :: H -> Node -> 𝐍 -> 𝐍 -> S -> (𝐍,Node)
walk-node h node n_pop n_index * =
(n_pop, node)
walk-node h node n_pop n_index x(s_0, ... s_i) =
let (cont, edges) = node
let selector = (n_pop,n_index)
let class = (x,i)
if selector not in edges then
edges[selector] := {}
let table = edges[selector]
if class not in edges[selector] then
let (outercache, constmap) = cont
let innercache =
{ v | v ∈ outercache,
classof (project v h) = class }
edges[selector][class] := ((innercache, {}), {})
let node' = edges[selector][class]
let h' = update-path h n_pop 0
walk-edge h' node' 0 0 [s_0, ..., s_i]
update-path :: H -> 𝐍 -> 𝐍 -> H
update-path h n_pop n_index =
(dropRight h n_pop) ++ [n_index]
**Definition.** The `addHandler` procedure installs into an index an
event handler callback `f` expecting values matching and captured by
the given skeleton `k`. It then invokes `f` once for each distinct
sequence of captured values matching existing assertions in the
index.[^function-pointer-equality]
addHandler :: Index -> (S × [H×V] × [H]) -> ([V] -> V) -> 1
addHandler index k f =
let (s, constantMap, captureMap) = k
let (_, root) = index
let (cache, table) = extend root s
let constLocs = [h | (h,v) ∈ constantMap]
let constVals = [v | (h,v) ∈ constantMap]
if constLocs not in table then
table[constLocs] := {}
if constVals not in table[constLocs] then
let leafcache =
{ v | v ∈ cache,
projectMany v constLocs = constVals }
table[constLocs][constVals] := (leafcache, {})
let (leafcache, leaftable) = table[constLocs][constVals]
if captureMap not in leaftable then
let bag = empty_bag
for v in leafcache
bag[projectMany v captureMap] += 1
leaftable[captureMap] := (bag, {})
let (bag, f_table) = leaftable[captureMap]
f_table += f
for seq in bag
f "+" seq
()
[^function-pointer-equality]: Because we store *sets* of function
values, we rely on the general availability of a closure
equivalence relation. Pointer-equality of closures (`eq?`)
suffices.
**Definition.** The `removeHandler` procedure removes an event handler
from an index.
removeHandler :: Index -> (S × [H×V] × [H]) -> ([V] -> V) -> 1
removeHandler index k f =
let (s, constantMap, captureMap) = k
let (_, root) = index
let (cache, table) = extend root s
let constLocs = [h | (h,v) ∈ constantMap]
let constVals = [v | (h,v) ∈ constantMap]
if constLocs not in table then
return
if constVals not in table[constLocs] then
return
let (leafcache, leaftable) = table[constLocs][constVals]
if captureMap not in leaftable then
return
let (bag, f_table) = leaftable[captureMap]
if f not in f_table then
return
f_table -= f
if f_table = {} then
delete leaftable[captureMap]
if leaftable = {} then
delete table[constLocs][constVals]
if table[constLocs] = {} then
delete table[constLocs]
#### Adding assertions, removing assertions and sending messages
All three operations depend on a single traversal procedure,
parameterized with different update procedures.
**Definition.** The `modify` procedure traverses an index trie,
following the structure of `v` and updating cached assertion sets
according to the given update procedures. The update procedures act
by side-effect; in particular, the `m_handler` procedure may choose
to invoke the callback passed to it.
modify :: Node ->
V ->
(Continuation -> V -> 1) ->
(Leaf -> V -> 1) ->
(Handler -> [V] -> 1) ->
1
modify node v m_cont m_leaf m_handler =
walk-node node [dummy(v)] -- TODO: gross
where
walk-node :: Node -> [V] -> 1
walk-node (cont, edges) vs =
walk-cont cont
for sel@(n_pop, n_index) in edges
let vs' = dropLeft vs n_pop
let (x(v_0, ...) : _) = vs'
let v' = v_{n_index}
if classof v' in edges[sel] then
walk-node edges[sel][classof v'] (v':vs')
walk-cont :: Continuation -> 1
walk-cont cont@(cache, table) =
m_cont cont v
for constLocs in table
let consts = projectMany v constLocs
if consts in table[constLocs] then
let leaf@(leafcache, leaftable) =
table[constLocs][consts]
m_leaf leaf v
for captureMap in leaftable
let handler = leaftable[captureMap]
let vs = projectMany v captureMap
m_handler handler vs
**Definition.** The procedure `adjustAssertion` updates the copy-count
associated with `v` in the given index, invoking callbacks as a
side-effect if this changes the observable contents of the
dataspace.
adjustAssertion :: Index -> V -> 𝐍 -> 1
adjustAssertion (cache, root) v delta =
let was_present = v in cache
cache[v] += delta
let is_present = v in cache
if not was_present and is_present then
modify root v add_cont add_leaf add_handler
if was_present and not is_present then
modify root v del_cont del_leaf del_handler
where
add_cont (cache, _) v = cache += v
add_leaf (leafcache, _) v = leafcache += v
add_handler (bag, f_table) vs =
let was_present = vs in bag
bag[vs] += 1
if not was_present then
for f in f_table
f "+" vs
del_cont (cache, _) v = cache -= v
del_leaf (leafcache, _) v = leafcache -= v
del_handler (bag, f_table) vs =
bag[vs] -= 1
for f in f_table
f "-" vs
**Definition.** The procedures `addAssertion` and `removeAssertion`
install and remove an assertion `v` into the given index,
respectively.
addAssertion :: Index -> V -> 1
addAssertion index v = adjustAssertion index v 1
removeAssertion :: Index -> V -> 1
removeAssertion index v = adjustAssertion index v -1
**Definition.** The procedure `sendMessage` delivers a message `v` to
event handlers in the given index.
sendMessage :: Index -> V -> 1
sendMessage (_, root) v =
modify root v send_cont send_leaf send_handler
where
send_cont _ _ = ()
send_leaf _ _ = ()
send (_, f_table) vs =
for f in f_table
f "!" vs
## Potential future optimizations
### Static analysis of messages and assertions
Static analysis of expressions under `(send! ...)` and `(assert ...)`
could cut out even more structural overhead.
For example, given interests
observe(message("actor1", capture()))
observe(message("actor2", capture()))
and a message
:: message(x, y)
static analysis could directly connect the sending site to a
hash-table lookup with `x` as the key, and invocation of the resulting
handlers with `y` as the argument. There would be no need to perform a
lookup based on the `message` constructor at runtime.
Similarly, given an interest
observe(present(capture()))
and an assertion
present(user)
static analysis could directly invoke handlers with `user` as the
argument, without needing to at runtime find the set of handlers
interested in the `present` constructor.
---
# TODO
- describe the cleanup function associated with a handler in the real implementation
- figure out and describe scoped assertions / visibility-restrictions
- check pop/index logic to ensure no off-by-ones
- rearrange to avoid the `dummy(...)` wrappers
<!--
# Example expansions
Starting at the top.
> (message-struct speak (who what))
> (assertion-struct present (who))
> (syntax->datum (expand-once '(on (message (speak $who $what)) (void))))
'(add-endpoint!
(current-facet)
""
#t
(lambda ()
(if #t
(values
(observe (speak (capture (discard)) (capture (discard))))
(skeleton-interest
(list struct:speak #f #f)
'()
(list)
'((0 0) (0 1))
(capture-facet-context
(lambda (op who what)
(when (eq? op '!)
(schedule-script!
#:priority
*normal-priority*
(current-actor)
(lambda () (begin/void-default (void)))))))
#f))
(values (void) #f))))
> (syntax->datum (expand-once '(on (asserted (present $who)) (void))))
'(add-endpoint!
(current-facet)
""
#t
(lambda ()
(if #t
(values
(observe (present (capture (discard))))
(skeleton-interest
(list struct:present #f)
'()
(list)
'((0 0))
(capture-facet-context
(lambda (op who)
(when (eq? op '+)
(schedule-script!
#:priority
*normal-priority*
(current-actor)
(lambda () (begin/void-default (void)))))))
#f))
(values (void) #f))))
> (syntax->datum (expand-once '(on (message `(speak ,$who ,$what)) (void))))
'(add-endpoint!
(current-facet)
""
#t
(lambda ()
(if #t
(values
(observe (list `speak (capture (discard)) (capture (discard))))
(skeleton-interest
(list 'list #f #f #f)
'((0 0))
(list `speak)
'((0 1) (0 2))
(capture-facet-context
(lambda (op who what)
(when (eq? op '!)
(schedule-script!
#:priority
*normal-priority*
(current-actor)
(lambda () (begin/void-default (void)))))))
#f))
(values (void) #f))))
> (syntax->datum (expand-once '(on (asserted `(present ,$who)) (void))))
'(add-endpoint!
(current-facet)
""
#t
(lambda ()
(if #t
(values
(observe (list `present (capture (discard))))
(skeleton-interest
(list 'list #f #f)
'((0 0))
(list `present)
'((0 1))
(capture-facet-context
(lambda (op who)
(when (eq? op '+)
(schedule-script!
#:priority
*normal-priority*
(current-actor)
(lambda () (begin/void-default (void)))))))
#f))
(values (void) #f))))
> (struct arp-query (protocol protocol-address interface-name link-address) #:prefab)
> (syntax->datum (expand-once '(on (asserted (observe (arp-query $protocol $protocol-address interface-name _))) (void))))
'(add-endpoint!
(current-facet)
""
#t
(lambda ()
(if #t
(values
(observe
(observe
(arp-query
(capture (discard))
(capture (discard))
interface-name
(discard))))
(skeleton-interest
(list struct:observe (list struct:arp-query #f #f #f #f))
'((0 0 2))
(list interface-name)
'((0 0 0) (0 0 1))
(capture-facet-context
(lambda (op protocol protocol-address)
(when (eq? op '+)
(schedule-script!
#:priority
*normal-priority*
(current-actor)
(lambda () (begin/void-default (void)))))))
#f))
(values (void) #f))))
> (struct tabular-layout (row col) #:prefab)
> (assertion-struct layout-solution (container-id spec size rectangle))
> (syntax->datum (expand-once '(on (asserted (observe (layout-solution container-id (tabular-layout $row $col) $size _))) (void))))
'(add-endpoint!
(current-facet)
""
#t
(lambda ()
(if #t
(values
(observe
(observe
(layout-solution
container-id
(tabular-layout (capture (discard)) (capture (discard)))
(capture (discard))
(discard))))
(skeleton-interest
(list
struct:observe
(list
struct:layout-solution
#f
(list struct:tabular-layout #f #f)
#f
#f))
'((0 0 0))
(list container-id)
'((0 0 1 0) (0 0 1 1) (0 0 2))
(capture-facet-context
(lambda (op row col size)
(when (eq? op '+)
(schedule-script!
#:priority
*normal-priority*
(current-actor)
(lambda () (begin/void-default (void)))))))
#f))
(values (void) #f))))
-->