critbit.c | ||
critbit.h | ||
fasthash.c | ||
fasthash.h | ||
main.c | ||
Makefile | ||
README.md | ||
route.c | ||
route.h | ||
routingspeed.rkt | ||
TODO.md | ||
treetrie.c | ||
treetrie.h |
TreeTrie
An implementation of a trie map that can hold (patterns over) ordered trees in its keys.
Uses djb's "critbit" structure for each branch node.
There are two types to represent:
Trie = Ok
| Tl Trie
| Br Trie Branch
The first Trie
argument to Br
is the wildcard case.
Branch = Mt
| Lf Trie Atom
| Nd Int Branch Branch
The Branch
type is the "critbit" type, extended with a value slot to
make it into a map. We put the Atom
last in a Lf
, because we might
want to use 32-bit pointers internal to our structures, but the hosted
atoms might need 64-bit pointers.
The empty Trie
, which we'll write empty
is a cyclic structure (!):
empty = Br empty Mt
It's the only such structure, so we might want to actually represent it as a distinct constant instead.
Because Ok
and Mt
are nullary constants, we can represent them
with special bit-patterns. Let's choose NULL
for this. They don't
need to be distinct, since they inhabit different types.
This leaves us with four cases to represent, two in each type. While we could represent these using a single bit, we will instead use two bits, for debuggability.
33222222222211111111110000000000
10987654321098765432109876543210
|--------------------------------|
|--------------------------------|
| Trie pointer 00| Br case
|--------------------------------|
| Branch pointer 00|
|--------------------------------|
|--------------------------------|
| Trie pointer 01| Tl case
|--------------------------------|
|--------------------------------|
| Trie pointer 10| Lf case
|--------------------------------|
| Atom pointer |
| (may be 64 bits long) |
|--------------------------------|
|--------------------------------|
| Int 11| Nd case
|--------------------------------|
| Branch pointer 00|
|--------------------------------|
| Branch pointer 00|
|--------------------------------|
We use a weak hash table to index all our objects, because we need to hash-cons them.
Perhaps it could be a Robin-Hood hashtable with backward shift deletion, http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/.
Hmm. For critbit to work, we need to be able to examine the bitpatterns of atoms. However, if those bitpatterns represent pointers to Racket-level objects, and Racket uses a moving collector, then not only will our atom pointers become out of date after a GC, even if we could update them we'd have to reindex each critbit tree.
So we probably want Atom
to instead be some index into a different
table. (Another level of indirection!) While any (referenced) Trie
holds an indirect reference to a given Atom
, the underlying Racket
object should be preserved across collections. We will need to find
the Atom
for a given Racket object (based on equal?
rather than
eq?
), and the Racket object for a given Atom
(an easy table
lookup). We'll want to not hold an Atom
's Racket object longer than
necessary.
It might be better to have the tag bits in each pointer to an object, rather than in the object header: the tag bits would then identify which of four separate heaps (each with its own object size) is being referred to.
Our data structures are never cyclic.
Supporting direct atoms is probably a sensible thing to do, so that
e.g. fixnums map to Atom
without having to take up space in the
table. In fact, the host language should probably allocate and manage
the Atom
-to-host-object table itself! That way, our code can be
completely ignorant of that kind of detail.