Hacker News new | ask | show | jobs
by ot 5221 days ago
> I'm primarily using as a compact order preserving structure to support fast lookups for both (token -> position) and (position -> token). I'm using an updatable variant to support O(1) appends, and then use it for realtime indexing/compression. I probably should do a blog post at one point...

That's very interesting, are you using raw or compressed bitmaps (such as RLE or RRR)? A limitation of (updatable) wavelet trees is that you need to know in advance the set of symbols that you will see (otherwise you would need to change the tree structure). Is it a problem for you? I wrote a paper with a solution to this problem, I can't share it until mid march, let me know if you are interested.

2 comments

My solution is to use a fixed tree with bloated leaf nodes (use more bits per element in the leaf node). A high number of nodes with a low number of elements causes a high overhead, so I simply rebuild the tree at a few given thresholds to keep the overhad and the cost of the bloated leaf nodes down. We can discuss further by email, if you want. I'd love to read your paper! :-)
I'm interested in the paper too of course :D