by MrPowers 879 days ago
Delta Lake solves a lot of the Parquet limitations mentioned in this post. Disclosure: I work on the Delta Lake project.

Parquet files store metadata about row groups in the file footer. Delta Lake adds file-level metadata in the transaction log. So Delta Lake can perform file-level skipping before even opening any of the Parquet files to get the row-group metadata.
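
To make the file-skipping idea concrete, here is a minimal Python sketch. It is not Delta Lake's actual code: `FileStats`, `files_to_read`, and the sample file names are hypothetical, and it assumes the log records a min and max timestamp for each file. The point is only that the reader can decide which files to open from the log alone.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file entry, standing in for stats kept in a transaction log."""
    path: str
    min_ts: int
    max_ts: int


def files_to_read(log_entries, lo, hi):
    """Return paths whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f.path for f in log_entries if f.max_ts >= lo and f.min_ts <= hi]


# Illustrative log with three files covering disjoint timestamp ranges.
log = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Only part-1 can contain rows with 120 <= ts <= 150, so the other
# two files are skipped without ever being opened.
selected = files_to_read(log, 120, 150)
print(selected)
```

Row-group skipping inside each selected Parquet file can still happen afterwards, so the two levels of metadata stack.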

Delta Lake allows you to rearrange your data to improve file-skipping. You can Z Order by timestamp for time-series analyses.
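
For anyone unfamiliar with Z-ordering, the underlying trick is bit interleaving. The sketch below is a generic illustration in plain Python, not Delta Lake's implementation; `z_value` and the 16-bit default are assumptions made for the example. Sorting rows by the interleaved value keeps rows that are close in both columns near each other, which tends to narrow the min/max ranges of each file.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one sort key (Morton code)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


# Order a small 4x4 grid of (x, y) points along the Z-order curve.
points = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(points, key=lambda p: z_value(*p))

# The first four points form the lower-left 2x2 block of the grid.
print(ordered[:4])
```

With a single column such as a timestamp, this reduces to an ordinary sort; the interleaving matters once you cluster on two or more columns.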

Delta Lake also allows for schema evolution, so you can evolve the schema of your table over time.
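
As a rough picture of what additive schema evolution means, here is a small Python sketch. It is a simplification and not Delta Lake's logic: `merge_schema` is a hypothetical helper, schemas are modeled as plain dicts of column name to type name, and the only rule enforced is that new columns may be added while existing columns may not change type.

```python
def merge_schema(table_schema: dict, incoming: dict) -> dict:
    """Return the table schema extended with any new columns from `incoming`.

    Raises TypeError if an existing column arrives with a different type.
    """
    merged = dict(table_schema)
    for name, dtype in incoming.items():
        if name in merged and merged[name] != dtype:
            raise TypeError(f"column {name!r}: {merged[name]} vs {dtype}")
        merged.setdefault(name, dtype)
    return merged


# Illustrative schemas: the new batch carries an extra `country` column.
table = {"id": "long", "ts": "timestamp"}
batch = {"id": "long", "ts": "timestamp", "country": "string"}

evolved = merge_schema(table, batch)
print(evolved)
```

Rows written before the change would simply read as null for the new column.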

This company may have a cool file format, but is it closed source? It seems like enterprises don't want to be locked into closed formats anymore.

2 comments

Wow! I've been reading about Delta Lake for a while and I'm interested in the company. Is there a chance to send in a CV for remote work? (I'm from Spain.)

Schema evolution is something that came up in a water-cooler conversation on my team the other day.

Can you Z Order in Delta Lake? I thought that was one of the features Databricks had kept to themselves.