Hacker News new | ask | show | jobs
by zzzcpan 2614 days ago
You don't actually need to do asynchronous background compaction at all. You can do compaction whenever in small incremental steps not causing any spikes in read or write latencies. Just spreading it across all writes gets you slightly slower, but latency capped writes. It's unfortunate that LevelDB popularized this compaction in a thread idea. It's pretty bad one.
2 comments

Good catch :-) right, but still merging/compaction work has to be done. Maybe too much, if you just need to fetch a value by its key and thus just an equality scan is needed (no range scans or other comparisons). For the latter case I've implemented an AVL-tree, which is also versioned and stored in our record pages and best read fully in-memory (but doesn't have to). For sure there are plenty of optimizations and for instance also spatio-temporal indexes or full-text indexes possible, but I guess first looking into cost-based rewrite rules for the query compiler and replication/partitioning for horizontal scaling. Too many ideas I guess ;-) but the best would be to have a great open source community :-)
would you know why LevelDB choose compaction in thread vs the method you are describing.