Does anyone know of an embedded key-value store that does versioning/snapshots, but doesn’t bother with cryptographic integrity (and so gets better OLAP performance than a Merkle-tree-based implementation)?

My use-case is a system that serves as an OLAP data warehouse of representations of how another system’s state looked at various points in history. You’d open a handle against the store, passing in a snapshot version, and then do OLAP queries against that snapshot.

Things that make this a hard problem: the dataset is too large to just store the versions as independent copies, so it really needs some level of data-sharing between the snapshots. But it also needs to be fast for reads, especially whole-bucket reads; it’s an OLAP data warehouse, after all. Merkle-tree-based designs really suck for doing indexed table scans.

But, things that can be traded off: there’d only need to be one (trusted) writer, who would just be batch-inserting new snapshots generated by reducing over a CQRS/ES event stream. That (out-of-band) event stream would be the canonical, integrity-verified, etc. representation of all this data. These CQRS state-aggregate snapshots would just be a cache. If the whole thing got corrupted, I could just throw it all away and regenerate it from the CQRS/ES event stream; or, hopefully, “rewind” the database back to the last-known-good commit (i.e. purge all snapshots after that one) and then regenerate only the rest from the event stream.

I’m not personally aware of anything that targets exactly this use case. I’m working on something for it myself right now. Two avenues I’m looking into:

• something that acts like a hybrid between LMDB and btrfs (i.e. a B-tree with copy-on-write ref-counted pages shared between snapshots, where those snapshots appear as B-tree nodes themselves)

• “keyframe” snapshots as regular independent B-trees, maybe relying on ZFS-style block-level dedup between them; “interstitial” snapshots as on-disk HAMT “overlays” of the last keyframe B-tree, that share nodes with other on-disk HAMTs, but only within their “generation” (i.e. up to the next keyframe), such that they can all be rewritten/compacted/finalized once the next keyframe arrives, or maybe even converted into “B-frames” that have forward-references to data embedded in the next keyframe.
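To make the second avenue a bit more concrete, here’s a toy in-memory sketch of the keyframe/overlay idea, including the “rewind” step. It is only an illustration under heavy assumptions: `SnapshotStore`, `keyframe_every`, `commit`, `scan` and `rewind` are names I made up, plain dicts stand in for the on-disk B-trees and HAMTs, and the interstitial snapshots are modelled as a simple chain of per-commit deltas rather than node-sharing overlays.

```python
from typing import Dict, Iterable, List, Tuple

_TOMBSTONE = object()  # marks a key deleted by an interstitial snapshot


class SnapshotStore:
    """Toy model: every `keyframe_every`-th commit is a full copy ("keyframe");
    the commits in between store only their changes (hypothetical design)."""

    def __init__(self, keyframe_every: int = 4) -> None:
        self.keyframe_every = keyframe_every
        # Each entry is ("key", full_state) or ("delta", changes_only).
        self._versions: List[Tuple[str, Dict[str, object]]] = []

    def commit(self, puts: Dict[str, object], deletes: Iterable[str] = ()) -> int:
        """Single-writer batch insert; returns the new snapshot version."""
        version = len(self._versions)
        if version % self.keyframe_every == 0:
            # Keyframe: materialize the previous state and apply the batch.
            state = self._materialize(version - 1) if version else {}
            state.update(puts)
            for key in deletes:
                state.pop(key, None)
            self._versions.append(("key", state))
        else:
            # Interstitial snapshot: record only what changed.
            delta: Dict[str, object] = dict(puts)
            for key in deletes:
                delta[key] = _TOMBSTONE
            self._versions.append(("delta", delta))
        return version

    def _materialize(self, version: int) -> Dict[str, object]:
        """Rebuild full state: copy the last keyframe, then replay deltas."""
        base = version
        while self._versions[base][0] != "key":
            base -= 1
        state = dict(self._versions[base][1])
        for v in range(base + 1, version + 1):
            for key, value in self._versions[v][1].items():
                if value is _TOMBSTONE:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def scan(self, version: int, prefix: str = "") -> List[Tuple[str, object]]:
        """Ordered scan of one snapshot, optionally limited to a key prefix."""
        if not 0 <= version < len(self._versions):
            raise KeyError(f"no such snapshot: {version}")
        state = self._materialize(version)
        return sorted((k, v) for k, v in state.items() if k.startswith(prefix))

    def rewind(self, last_good: int) -> None:
        """Purge every snapshot after `last_good`."""
        del self._versions[last_good + 1:]


store = SnapshotStore(keyframe_every=2)
v0 = store.commit({"a/1": 1, "a/2": 2})                    # keyframe
v1 = store.commit({"a/2": 20, "b/1": 3}, deletes=["a/1"])  # delta
v2 = store.commit({"a/3": 4})                              # keyframe
print(store.scan(v1, prefix="a/"))
```

A real version would need the scan to merge the overlay with the keyframe lazily instead of materializing the whole state, which is exactly where the choice of on-disk structure starts to matter for OLAP reads.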
[0]: https://makedist.com/projects/cruzdb/