| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by seddonm1 2250 days ago
	If you are working within the Apache Spark ecosystem you can us DeltaLake https://delta.io/ to create 'merge' datasets which are transactional, versioned and allow time travel by both version number and timestamp.

1 comments

jamesblonde 2250 days ago

Another alternative to Deltalake is Apache Hudi, which also includes bloom filters for indexing time-travel queries (efficiently exclude any files given the supplied time constraint). Z-ordered indexing in Deltalake is not available yet in open-source deltalake, only in Databricks version.

link