# TreeTrie An implementation of a trie map that can hold (patterns over) ordered trees in its keys. Uses djb's "critbit" structure for each branch node. There are two types to represent: Trie = Ok | Tl Trie | Br Trie Branch The first `Trie` argument to `Br` is the wildcard case. Branch = Mt | Lf Trie Atom | Nd Int Branch Branch The `Branch` type is the "critbit" type, extended with a value slot to make it into a map. We put the `Atom` last in a `Lf`, because we might want to use 32-bit pointers internal to our structures, but the hosted atoms might need 64-bit pointers. The empty `Trie`, which we'll write `empty` is a cyclic structure (!): empty = Br empty Mt It's the only such structure, so we might want to actually represent it as a distinct constant instead. Because `Ok` and `Mt` are nullary constants, we can represent them with special bit-patterns. Let's choose `NULL` for this. They don't need to be distinct, since they inhabit different types. This leaves us with four cases to represent, two in each type. While we could represent these using a single bit, we will instead use two bits, for debuggability. 33222222222211111111110000000000 10987654321098765432109876543210 |--------------------------------| |--------------------------------| | Trie pointer 00| Br case |--------------------------------| | Branch pointer 00| |--------------------------------| |--------------------------------| | Trie pointer 01| Tl case |--------------------------------| |--------------------------------| | Trie pointer 10| Lf case |--------------------------------| | Atom pointer | | (may be 64 bits long) | |--------------------------------| |--------------------------------| | Int 11| Nd case |--------------------------------| | Branch pointer 00| |--------------------------------| | Branch pointer 00| |--------------------------------| We use a weak hash table to index all our objects, because we need to hash-cons them. Perhaps it could be a Robin-Hood hashtable with backward shift deletion, . Hmm. For critbit to work, we need to be able to examine the bitpatterns of atoms. However, if those bitpatterns represent pointers to Racket-level objects, and Racket uses a moving collector, then not only will our atom pointers become out of date after a GC, even if we could update them we'd have to reindex each critbit tree. So we probably want `Atom` to instead be some index into a *different* table. (Another level of indirection!) While any (referenced) `Trie` holds an indirect reference to a given `Atom`, the underlying Racket object should be preserved across collections. We will need to find the `Atom` for a given Racket object (based on `equal?` rather than `eq?`), and the Racket object for a given `Atom` (an easy table lookup). We'll want to not hold an `Atom`'s Racket object longer than necessary. It might be better to have the tag bits in each pointer to an object, rather than in the object header: the tag bits would then identify which of four separate heaps (each with its own object size) is being referred to. Our data structures are never cyclic. Supporting direct atoms is probably a sensible thing to do, so that e.g. fixnums map to `Atom` without having to take up space in the table. In fact, the host language should probably allocate and manage the `Atom`-to-host-object table itself! That way, our code can be completely ignorant of that kind of detail.