Hacker News new | ask | show | jobs
by seddonm1 2250 days ago
If you are working within the Apache Spark ecosystem you can us DeltaLake https://delta.io/ to create 'merge' datasets which are transactional, versioned and allow time travel by both version number and timestamp.
1 comments

Another alternative to Deltalake is Apache Hudi, which also includes bloom filters for indexing time-travel queries (efficiently exclude any files given the supplied time constraint). Z-ordered indexing in Deltalake is not available yet in open-source deltalake, only in Databricks version.