Hacker News new | ask | show | jobs
by FreakLegion 4046 days ago
Indeed there are a lot of good optimization options for tries.

At a little bit of a speed penalty you can use a bitmap for navigation instead of pointers, which obliterates the memory footprint. In some use cases the result is actually smaller than the equivalent Bloom filter with a sane FP rate (though generally not smaller than a Golomb-compressed set or cuckoo filter).

A compromise between the speed and memory bloat of pointers is to use Radix/Patricia tries, which are a solid go-to option. In some cases they'll be slower than naive tries, but in others they'll be faster.

The fastest and most memory-efficient implementation I've seen, though, boiled each pattern in the trie down to its most unique dword or qword, and only attempted a full match after an initial hit. Locality of reference can work wonders here -- we got a 20x speedup over an already very fast naive trie this way.

1 comments

> boiled each pattern in the trie down to its most unique dword or qword, and only attempted a full match after an initial hit

Oh that's interesting - like a very fast cmp before bothering to check the rest of the branches.

When you say pattern are you talking about specific letters/morphemes or subsequence? And were you pulling the [dq]words from the initial pattern or generating them some otherway (through some kind of hash or encoding function?).

Sorry, should've given more context. In this case the trie was for Aho-Corasick, so we built it from the most unique subsequences. Since there was no need to add or remove patterns at runtime, the subsequences were precomputed and the trie built server-side, then sent serialized down to the application.

On a different occasion I tried a few kinds of entropy coding, but their value was pretty limited, and obviously not applicable if you need to operate on a stream with Aho. In that case I only needed memory-optimized set membership testing, though, so ended up going with a Golomb-compressed set from pgs. 7-8 of [0].

[0] http://algo2.iti.kit.edu/singler/publications/cacheefficient...