103 lines
3.7 KiB
Markdown
103 lines
3.7 KiB
Markdown
|
# TreeTrie
|
||
|
|
||
|
An implementation of a trie map that can hold (patterns over) ordered
|
||
|
trees in its keys.
|
||
|
|
||
|
Uses djb's "critbit" structure for each branch node.
|
||
|
|
||
|
There are two types to represent:
|
||
|
|
||
|
Trie = Ok
|
||
|
| Tl Trie
|
||
|
| Br Trie Branch
|
||
|
|
||
|
The first `Trie` argument to `Br` is the wildcard case.
|
||
|
|
||
|
Branch = Mt
|
||
|
| Lf Trie Atom
|
||
|
| Nd Int Branch Branch
|
||
|
|
||
|
The `Branch` type is the "critbit" type, extended with a value slot to
|
||
|
make it into a map. We put the `Atom` last in a `Lf`, because we might
|
||
|
want to use 32-bit pointers internal to our structures, but the hosted
|
||
|
atoms might need 64-bit pointers.
|
||
|
|
||
|
The empty `Trie`, which we'll write `empty` is a cyclic structure (!):
|
||
|
|
||
|
empty = Br empty Mt
|
||
|
|
||
|
It's the only such structure, so we might want to actually represent
|
||
|
it as a distinct constant instead.
|
||
|
|
||
|
Because `Ok` and `Mt` are nullary constants, we can represent them
|
||
|
with special bit-patterns. Let's choose `NULL` for this. They don't
|
||
|
need to be distinct, since they inhabit different types.
|
||
|
|
||
|
This leaves us with four cases to represent, two in each type. While
|
||
|
we could represent these using a single bit, we will instead use two
|
||
|
bits, for debuggability.
|
||
|
|
||
|
33222222222211111111110000000000
|
||
|
10987654321098765432109876543210
|
||
|
|--------------------------------|
|
||
|
|
||
|
|--------------------------------|
|
||
|
| Trie pointer 00| Br case
|
||
|
|--------------------------------|
|
||
|
| Branch pointer 00|
|
||
|
|--------------------------------|
|
||
|
|
||
|
|--------------------------------|
|
||
|
| Trie pointer 01| Tl case
|
||
|
|--------------------------------|
|
||
|
|
||
|
|--------------------------------|
|
||
|
| Trie pointer 10| Lf case
|
||
|
|--------------------------------|
|
||
|
| Atom pointer |
|
||
|
| (may be 64 bits long) |
|
||
|
|--------------------------------|
|
||
|
|
||
|
|--------------------------------|
|
||
|
| Int 11| Nd case
|
||
|
|--------------------------------|
|
||
|
| Branch pointer 00|
|
||
|
|--------------------------------|
|
||
|
| Branch pointer 00|
|
||
|
|--------------------------------|
|
||
|
|
||
|
We use a weak hash table to index all our objects, because we need to
|
||
|
hash-cons them.
|
||
|
|
||
|
Perhaps it could be a Robin-Hood hashtable with backward shift
|
||
|
deletion,
|
||
|
<http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/>.
|
||
|
|
||
|
Hmm. For critbit to work, we need to be able to examine the
|
||
|
bitpatterns of atoms. However, if those bitpatterns represent pointers
|
||
|
to Racket-level objects, and Racket uses a moving collector, then not
|
||
|
only will our atom pointers become out of date after a GC, even if we
|
||
|
could update them we'd have to reindex each critbit tree.
|
||
|
|
||
|
So we probably want `Atom` to instead be some index into a *different*
|
||
|
table. (Another level of indirection!) While any (referenced) `Trie`
|
||
|
holds an indirect reference to a given `Atom`, the underlying Racket
|
||
|
object should be preserved across collections. We will need to find
|
||
|
the `Atom` for a given Racket object (based on `equal?` rather than
|
||
|
`eq?`), and the Racket object for a given `Atom` (an easy table
|
||
|
lookup). We'll want to not hold an `Atom`'s Racket object longer than
|
||
|
necessary.
|
||
|
|
||
|
It might be better to have the tag bits in each pointer to an object,
|
||
|
rather than in the object header: the tag bits would then identify
|
||
|
which of four separate heaps (each with its own object size) is being
|
||
|
referred to.
|
||
|
|
||
|
Our data structures are never cyclic.
|
||
|
|
||
|
Supporting direct atoms is probably a sensible thing to do, so that
|
||
|
e.g. fixnums map to `Atom` without having to take up space in the
|
||
|
table. In fact, the host language should probably allocate and manage
|
||
|
the `Atom`-to-host-object table itself! That way, our code can be
|
||
|
completely ignorant of that kind of detail.
|