Hacker News new | ask | show | jobs
by bane 4051 days ago
> boiled each pattern in the trie down to its most unique dword or qword, and only attempted a full match after an initial hit

Oh that's interesting - like a very fast cmp before bothering to check the rest of the branches.

When you say pattern are you talking about specific letters/morphemes or subsequence? And were you pulling the [dq]words from the initial pattern or generating them some otherway (through some kind of hash or encoding function?).

1 comments

Sorry, should've given more context. In this case the trie was for Aho-Corasick, so we built it from the most unique subsequences. Since there was no need to add or remove patterns at runtime, the subsequences were precomputed and the trie built server-side, then sent serialized down to the application.

On a different occasion I tried a few kinds of entropy coding, but their value was pretty limited, and obviously not applicable if you need to operate on a stream with Aho. In that case I only needed memory-optimized set membership testing, though, so ended up going with a Golomb-compressed set from pgs. 7-8 of [0].

[0] http://algo2.iti.kit.edu/singler/publications/cacheefficient...