Hacker News new | ask | show | jobs
by gorset 5226 days ago
Cool, WT and succinct data structures are definitely an interesting field these days.

I'm primarily using as a compact order preserving structure to support fast lookups for both (token -> position) and (position -> token). I'm using an updatable variant to support O(1) appends, and then use it for realtime indexing/compression. I probably should do a blog post at one point...

Would be interesting to hear about faster ways to do rank/select without using support structures... I'll keep an eye on your blog :-)

2 comments

> I'm primarily using as a compact order preserving structure to support fast lookups for both (token -> position) and (position -> token). I'm using an updatable variant to support O(1) appends, and then use it for realtime indexing/compression. I probably should do a blog post at one point...

That's very interesting, are you using raw or compressed bitmaps (such as RLE or RRR)? A limitation of (updatable) wavelet trees is that you need to know in advance the set of symbols that you will see (otherwise you would need to change the tree structure). Is it a problem for you? I wrote a paper with a solution to this problem, I can't share it until mid march, let me know if you are interested.

My solution is to use a fixed tree with bloated leaf nodes (use more bits per element in the leaf node). A high number of nodes with a low number of elements causes a high overhead, so I simply rebuild the tree at a few given thresholds to keep the overhad and the cost of the bloated leaf nodes down. We can discuss further by email, if you want. I'd love to read your paper! :-)
I'm interested in the paper too of course :D
Yes you should do a blog post :) What is your blog address?
http://erik.gorset.no/

not much content, though..

Thanks! I'll keep visiting ^^ looks interesting as it is