by wenc 2250 days ago

Ah now I understand!

So for most analytic workloads, a columnstore DB is typically used because complex analytic queries need both performance and advanced SQL features (window functions), and I don't expect Dolt to replace that. That means if we wanted to use Dolt's features, we would have to continuously ETL the data into Dolt, which would entail mirroring the entire database (or at least the parts we want to version control).

Dolt essentially becomes a derived database specifically used for versioning. I see how this might work for some use cases.

seddonm1 2250 days ago

If you are working within the Apache Spark ecosystem, you can use Delta Lake (https://delta.io/) to create 'merge' datasets that are transactional and versioned, and that allow time travel by both version number and timestamp.
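
For anyone who wants to see what the time travel part looks like, here is a minimal PySpark sketch. It assumes `spark` is an existing SparkSession with the Delta Lake package already configured. The helper names and the table path are made up for illustration; only the `versionAsOf` and `timestampAsOf` reader options come from Delta Lake itself.

```python
def read_delta_version(spark, path, version):
    """Load the Delta table at `path` as of a specific commit version.

    `spark` is assumed to be a SparkSession with Delta Lake configured.
    """
    return spark.read.format("delta").option("versionAsOf", version).load(path)


def read_delta_timestamp(spark, path, timestamp):
    """Load the Delta table at `path` as it existed at `timestamp`.

    `timestamp` is a date or timestamp string, e.g. "2020-03-01".
    """
    return spark.read.format("delta").option("timestampAsOf", timestamp).load(path)
```

Usage would be something like `read_delta_version(spark, "/data/events", 0)` to get the first committed version of a (hypothetical) table at `/data/events`.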

jamesblonde 2250 days ago

Another alternative to Delta Lake is Apache Hudi, which also includes bloom filters for indexing time-travel queries (so files can be efficiently excluded given the supplied time constraint). Z-order indexing is not yet available in open-source Delta Lake, only in the Databricks version.
