A C implementation of a trie map that can hold (patterns over) ordered trees in its keys. Proof-of-concept for high-speed routing data structures for Syndicate.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Tony Garnock-Jones d0019134e5 Correct typo (?) in comment 4 years ago
Makefile Use $(CC) instead of gcc in makefile 6 years ago
README.md Initial commit 6 years ago
TODO.md Subtraction; relabelling; pretty output 6 years ago
critbit.c Remove uses of GCC nested function extension 6 years ago
critbit.h combine, union. WIP because reference-counting is broken 6 years ago
fasthash.c Initial commit 6 years ago
fasthash.h Initial commit 6 years ago
main.c RDTSC 6 years ago
route.c Remove uses of GCC nested function extension 6 years ago
route.h union_set + example; step, no example yet 6 years ago
routingspeed.rkt Adapt routingspeed.rkt to Syndicate 4 years ago
treetrie.c Even more pretty 6 years ago
treetrie.h Correct typo (?) in comment 4 years ago



An implementation of a trie map that can hold (patterns over) ordered trees in its keys.

Uses djb's "critbit" structure for each branch node.

There are two types to represent:

Trie = Ok
     | Tl Trie
     | Br Trie Branch

The first Trie argument to Br is the wildcard case.

Branch = Mt
       | Lf Trie Atom
       | Nd Int Branch Branch

The Branch type is the "critbit" type, extended with a value slot to make it into a map. We put the Atom last in a Lf, because we might want to use 32-bit pointers internal to our structures, but the hosted atoms might need 64-bit pointers.

The empty Trie, which we'll write empty is a cyclic structure (!):

empty = Br empty Mt

It's the only such structure, so we might want to actually represent it as a distinct constant instead.

Because Ok and Mt are nullary constants, we can represent them with special bit-patterns. Let's choose NULL for this. They don't need to be distinct, since they inhabit different types.

This leaves us with four cases to represent, two in each type. While we could represent these using a single bit, we will instead use two bits, for debuggability.


|         Trie pointer         00|   Br case
|        Branch pointer        00|

|         Trie pointer         01|   Tl case

|         Trie pointer         10|   Lf case
|         Atom pointer           |
|     (may be 64 bits long)      |

|              Int             11|   Nd case
|        Branch pointer        00|
|        Branch pointer        00|

We use a weak hash table to index all our objects, because we need to hash-cons them.

Perhaps it could be a Robin-Hood hashtable with backward shift deletion, http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/.

Hmm. For critbit to work, we need to be able to examine the bitpatterns of atoms. However, if those bitpatterns represent pointers to Racket-level objects, and Racket uses a moving collector, then not only will our atom pointers become out of date after a GC, even if we could update them we'd have to reindex each critbit tree.

So we probably want Atom to instead be some index into a different table. (Another level of indirection!) While any (referenced) Trie holds an indirect reference to a given Atom, the underlying Racket object should be preserved across collections. We will need to find the Atom for a given Racket object (based on equal? rather than eq?), and the Racket object for a given Atom (an easy table lookup). We'll want to not hold an Atom's Racket object longer than necessary.

It might be better to have the tag bits in each pointer to an object, rather than in the object header: the tag bits would then identify which of four separate heaps (each with its own object size) is being referred to.

Our data structures are never cyclic.

Supporting direct atoms is probably a sensible thing to do, so that e.g. fixnums map to Atom without having to take up space in the table. In fact, the host language should probably allocate and manage the Atom-to-host-object table itself! That way, our code can be completely ignorant of that kind of detail.