A C implementation of a trie map that can hold (patterns over) ordered trees in its keys. Proof-of-concept for high-speed routing data structures for Syndicate.

historical proof-of-concept

Go to file

Tony Garnock-Jones d0019134e5 Correct typo (?) in comment		2017-06-14 10:36:21 +01:00
Makefile	Toy routing speed measurement	2015-07-19 11:48:14 -04:00
README.md	Initial commit	2015-06-29 20:39:49 -04:00
TODO.md	Subtraction; relabelling; pretty output	2015-07-13 19:57:06 -04:00
critbit.c	Remove uses of GCC nested function extension	2015-07-24 13:20:26 -04:00
critbit.h	combine, union. WIP because reference-counting is broken	2015-07-12 17:46:08 -04:00
fasthash.c	Initial commit	2015-06-29 20:39:49 -04:00
fasthash.h	Initial commit	2015-06-29 20:39:49 -04:00
main.c	RDTSC	2015-08-03 11:14:35 -04:00
route.c	Remove uses of GCC nested function extension	2015-07-24 13:20:26 -04:00
route.h	union_set + example; step, no example yet	2015-07-15 18:02:55 -04:00
routingspeed.rkt	Adapt routingspeed.rkt to Syndicate	2017-05-14 16:04:59 -04:00
treetrie.c	Even more pretty	2015-07-13 20:04:42 -04:00
treetrie.h	Correct typo (?) in comment	2017-06-14 10:36:21 +01:00

README.md

TreeTrie

An implementation of a trie map that can hold (patterns over) ordered trees in its keys.

Uses djb's "critbit" structure for each branch node.

There are two types to represent:

Trie = Ok
     | Tl Trie
	 | Br Trie Branch

The first Trie argument to Br is the wildcard case.

Branch = Mt
       | Lf Trie Atom
	   | Nd Int Branch Branch

The Branch type is the "critbit" type, extended with a value slot to make it into a map. We put the Atom last in a Lf, because we might want to use 32-bit pointers internal to our structures, but the hosted atoms might need 64-bit pointers.

The empty Trie, which we'll write empty is a cyclic structure (!):

empty = Br empty Mt

It's the only such structure, so we might want to actually represent it as a distinct constant instead.

Because Ok and Mt are nullary constants, we can represent them with special bit-patterns. Let's choose NULL for this. They don't need to be distinct, since they inhabit different types.

This leaves us with four cases to represent, two in each type. While we could represent these using a single bit, we will instead use two bits, for debuggability.

 33222222222211111111110000000000
 10987654321098765432109876543210
|--------------------------------|

|--------------------------------|
|         Trie pointer         00|   Br case
|--------------------------------|
|        Branch pointer        00|
|--------------------------------|

|--------------------------------|
|         Trie pointer         01|   Tl case
|--------------------------------|

|--------------------------------|
|         Trie pointer         10|   Lf case
|--------------------------------|
|         Atom pointer           |
|     (may be 64 bits long)      |
|--------------------------------|

|--------------------------------|
|              Int             11|   Nd case
|--------------------------------|
|        Branch pointer        00|
|--------------------------------|
|        Branch pointer        00|
|--------------------------------|

We use a weak hash table to index all our objects, because we need to hash-cons them.

Perhaps it could be a Robin-Hood hashtable with backward shift deletion, http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/.

Hmm. For critbit to work, we need to be able to examine the bitpatterns of atoms. However, if those bitpatterns represent pointers to Racket-level objects, and Racket uses a moving collector, then not only will our atom pointers become out of date after a GC, even if we could update them we'd have to reindex each critbit tree.

So we probably want Atom to instead be some index into a different table. (Another level of indirection!) While any (referenced) Trie holds an indirect reference to a given Atom, the underlying Racket object should be preserved across collections. We will need to find the Atom for a given Racket object (based on equal? rather than eq?), and the Racket object for a given Atom (an easy table lookup). We'll want to not hold an Atom's Racket object longer than necessary.

It might be better to have the tag bits in each pointer to an object, rather than in the object header: the tag bits would then identify which of four separate heaps (each with its own object size) is being referred to.

Our data structures are never cyclic.

Supporting direct atoms is probably a sensible thing to do, so that e.g. fixnums map to Atom without having to take up space in the table. In fact, the host language should probably allocate and manage the Atom-to-host-object table itself! That way, our code can be completely ignorant of that kind of detail.