HN Mirror



	by lichtenberger 2614 days ago
	If you just need to fetch values by key, you can avoid the overhead of asynchronous background compaction for the main storage (the keys might even be as simple as those generated by a sequence generator), and with it unpredictable read or write peaks and so on. To do so, hash the keys if they aren't already integer/long identifiers and store a persistent hash-array-based trie: persistent both in the sense of on-disk persistence and in the functional sense of being immutable. This can easily be extended to store a new revision through copy-on-write semantics. Instead of storing whole page snapshots, however, advances in storage now permit fine-grained access to your data. Thus you can apply lessons learned from backup systems to version the data pages themselves, and even improve on that. Disclaimer: I'm the author of a recent article about a free, open-source storage system I maintain, which versions data at its very core: "Why Copy-on-Write Semantics and Node-Level-Versioning are key to Efficient Snapshots": https://hackernoon.com/sirix-io-why-copy-on-write-semantics-...
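
A minimal sketch of the trie idea above, assuming a fixed-depth trie over 32-bit key hashes with fan-out 16 (all names and parameters here are hypothetical and not Sirix's actual page layout): every `put` copies only the nodes on the path to the key and returns a new root, so each older root stays readable as its own revision and no compaction is ever needed.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

HASH_BITS = 32             # hypothetical: keys are reduced to 32-bit hashes
BITS = 4                   # hypothetical: 4 hash bits per level -> fan-out 16
FANOUT = 1 << BITS
DEPTH = HASH_BITS // BITS  # number of inner levels (8)


def key_hash(key) -> int:
    """Use integer identifiers directly; hash anything else to 32 bits."""
    if isinstance(key, int):
        return key & ((1 << HASH_BITS) - 1)
    digest = hashlib.blake2b(str(key).encode(), digest_size=HASH_BITS // 8).digest()
    return int.from_bytes(digest, "big")


@dataclass(frozen=True)
class Node:
    # Inner node: tuple of FANOUT children (None = empty slot).
    # Leaf bucket: tuple of (key, value) pairs that share the same hash.
    slots: tuple


EMPTY_INNER = Node((None,) * FANOUT)


def _put(node: Optional[Node], h: int, depth: int, key, value) -> Node:
    if depth == DEPTH:
        # Leaf bucket: copy it, dropping any old entry for this key.
        old = node.slots if node is not None else ()
        return Node(tuple(e for e in old if e[0] != key) + ((key, value),))
    node = node if node is not None else EMPTY_INNER
    idx = (h >> (depth * BITS)) & (FANOUT - 1)
    slots = list(node.slots)
    # Copy-on-write: only nodes on the root-to-leaf path are copied;
    # every other child is shared with the previous revision.
    slots[idx] = _put(node.slots[idx], h, depth + 1, key, value)
    return Node(tuple(slots))


def put(root: Optional[Node], key, value) -> Node:
    """Return the root of a new revision; `root` itself is never modified."""
    return _put(root, key_hash(key), 0, key, value)


def get(root: Optional[Node], key, default=None):
    h, node = key_hash(key), root
    for depth in range(DEPTH):
        if node is None:
            return default
        node = node.slots[(h >> (depth * BITS)) & (FANOUT - 1)]
    if node is not None:
        for k, v in node.slots:
            if k == key:
                return v
    return default
```

Keeping a list of roots, one per commit, is enough to read any earlier revision in this sketch; an on-disk version would additionally write each copied node to a new location rather than overwriting pages in place.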

3 comments

zzzcpan 2614 days ago

You don't actually need to do asynchronous background compaction at all. You can do compaction at any time in small incremental steps that don't cause spikes in read or write latencies. Just spreading it across all writes gets you slightly slower but latency-capped writes. It's unfortunate that LevelDB popularized this compaction-in-a-thread idea. It's a pretty bad one.
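
A toy illustration of the incremental approach, assuming a simple log-structured layout of a memtable plus immutable sorted runs (the class and parameter names are hypothetical, and this is not LevelDB's actual design): instead of a compaction thread, each `put` advances an in-progress merge of the two oldest runs by at most `steps_per_write` entries, so the extra work per write is bounded.

```python
import bisect


class IncrementalStore:
    """Toy log-structured store (hypothetical design, not LevelDB's):
    writes land in a memtable, full memtables become immutable sorted runs,
    and every put() also does a bounded slice of merge work."""

    def __init__(self, memtable_limit=4, steps_per_write=2):
        self.memtable = {}
        self.runs = []  # newest run first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit
        self.steps_per_write = steps_per_write
        self._merge = None  # [newer, older, i, j, out] while a merge is in progress

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}
        self._compact_step(self.steps_per_write)

    def get(self, key, default=None):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newer runs shadow older ones
            idx = bisect.bisect_left(run, (key,))
            if idx < len(run) and run[idx][0] == key:
                return run[idx][1]
        return default

    def _compact_step(self, budget):
        if self._merge is None:
            if len(self.runs) < 2:
                return
            # Start merging the two oldest runs; they stay readable meanwhile.
            self._merge = [self.runs[-2], self.runs[-1], 0, 0, []]
        newer, older, i, j, out = self._merge
        for _ in range(budget):  # at most `budget` entries emitted per write
            if i < len(newer) and (j >= len(older) or newer[i][0] <= older[j][0]):
                if j < len(older) and newer[i][0] == older[j][0]:
                    j += 1  # the newer version shadows the older one
                out.append(newer[i])
                i += 1
            elif j < len(older):
                out.append(older[j])
                j += 1
            else:
                break
        self._merge[2], self._merge[3] = i, j
        if i == len(newer) and j == len(older):
            # Swap both inputs for the merged run in one step.
            self.runs[-2:] = [out]
            self._merge = None
```

Each write pays a little extra but never more than `steps_per_write` merge steps. The budget still has to be large enough for merging to keep up with the rate at which new runs are flushed.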

lichtenberger 2614 days ago

Good catch :-) Right, but merging/compaction work still has to be done. Maybe too much, if you just need to fetch a value by its key, where only an equality lookup is needed (no range scans or other comparisons). For the latter case I've implemented an AVL tree, which is also versioned, is stored in our record pages and is best read fully into memory (but doesn't have to be). For sure there are plenty of possible optimizations, for instance spatio-temporal indexes or full-text indexes, but I guess I'll first look into cost-based rewrite rules for the query compiler and replication/partitioning for horizontal scaling. Too many ideas, I guess ;-) but the best would be to have a great open source community :-)
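
For the range-scan case, here is a minimal sketch of a versioned (persistent) AVL tree, assuming path copying as the versioning mechanism. It is an in-memory illustration with hypothetical names (`Node`, `insert`, `range_scan`), not the AVL tree stored in Sirix's record pages.

```python
from typing import Iterator, NamedTuple, Optional


class Node(NamedTuple):
    key: object
    value: object
    left: Optional["Node"]
    right: Optional["Node"]
    height: int


def height(n: Optional[Node]) -> int:
    return n.height if n is not None else 0


def make(key, value, left, right) -> Node:
    # Every structural change allocates new nodes; old ones are never mutated.
    return Node(key, value, left, right, 1 + max(height(left), height(right)))


def rotate_right(n: Node) -> Node:
    left = n.left
    return make(left.key, left.value, left.left, make(n.key, n.value, left.right, n.right))


def rotate_left(n: Node) -> Node:
    right = n.right
    return make(right.key, right.value, make(n.key, n.value, n.left, right.left), right.right)


def rebalance(n: Node) -> Node:
    balance = height(n.left) - height(n.right)
    if balance > 1:  # left-heavy
        if height(n.left.left) < height(n.left.right):
            n = make(n.key, n.value, rotate_left(n.left), n.right)
        return rotate_right(n)
    if balance < -1:  # right-heavy
        if height(n.right.right) < height(n.right.left):
            n = make(n.key, n.value, n.left, rotate_right(n.right))
        return rotate_left(n)
    return n


def insert(n: Optional[Node], key, value) -> Node:
    """Return the root of a new version; `n` stays valid as the old version."""
    if n is None:
        return make(key, value, None, None)
    if key < n.key:
        return rebalance(make(n.key, n.value, insert(n.left, key, value), n.right))
    if key > n.key:
        return rebalance(make(n.key, n.value, n.left, insert(n.right, key, value)))
    return make(key, value, n.left, n.right)  # overwrite in the new version only


def range_scan(n: Optional[Node], lo, hi) -> Iterator[tuple]:
    """Yield (key, value) pairs with lo <= key <= hi in ascending key order."""
    if n is None:
        return
    if lo < n.key:
        yield from range_scan(n.left, lo, hi)
    if lo <= n.key <= hi:
        yield (n.key, n.value)
    if n.key < hi:
        yield from range_scan(n.right, lo, hi)
```

Unlike the hashed trie, the ordered tree supports `range_scan`, and each `insert` copies only O(log n) nodes, so keeping one root per revision stays cheap.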

dominotw 2614 days ago

Would you know why LevelDB chose compaction in a thread over the method you are describing?

eeZah7Ux 2614 days ago

Any thoughts on NILFS? https://en.wikipedia.org/wiki/NILFS

lichtenberger 2614 days ago

Haven't heard of it, but the storage system is heavily inspired by ZFS, bringing some of its ideas (plus our own, obviously ;)) down to the sub-file level: https://kops.uni-konstanz.de/bitstream/handle/123456789/2769...

eeZah7Ux 2614 days ago

I'd like to see a comparison with using mmap and letting the kernel do the paging.

lichtenberger 2614 days ago

Basically, I'd like to provide the I/O layer with memory-mapped file regions, such that it's simply a configuration option whether you use the RandomAccessFile implementation or one based on memory-mapped files. I think for the JVM there's Chronicle. Thanks so much for asking :-) and by the way, any contribution would be the best I can hope for.
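
A minimal sketch of making the I/O backend a configuration option, written in Python for illustration rather than for the JVM (the `PageReader`, `FileReader`, `MmapReader` and `open_reader` names are hypothetical and not part of Sirix or Chronicle): one backend issues explicit seek-and-read calls, roughly like RandomAccessFile, and the other maps the file and lets the kernel do the paging, which is the comparison asked about above.

```python
import mmap
from typing import Protocol


class PageReader(Protocol):
    def read(self, offset: int, length: int) -> bytes: ...
    def close(self) -> None: ...


class FileReader:
    """Explicit positioned reads, roughly analogous to RandomAccessFile."""

    def __init__(self, path: str):
        self._f = open(path, "rb")

    def read(self, offset: int, length: int) -> bytes:
        self._f.seek(offset)
        return self._f.read(length)

    def close(self) -> None:
        self._f.close()


class MmapReader:
    """Maps the whole file read-only; the kernel pages data in on access.
    Note: mmap with length 0 raises ValueError for an empty file."""

    def __init__(self, path: str):
        self._f = open(path, "rb")
        self._m = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)

    def read(self, offset: int, length: int) -> bytes:
        return self._m[offset:offset + length]  # slicing an mmap returns bytes

    def close(self) -> None:
        self._m.close()
        self._f.close()


BACKENDS = {"file": FileReader, "mmap": MmapReader}


def open_reader(path: str, backend: str = "file") -> PageReader:
    """Pick the I/O backend by name, e.g. from a configuration file."""
    return BACKENDS[backend](path)
```

Because both backends expose the same `read(offset, length)` call, the rest of the storage layer does not need to know which one is configured, and the two can be benchmarked against each other on the same workload.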