by MrPowers
879 days ago
Delta Lake solves a lot of the Parquet limitations mentioned in this post. Disclosure: I work on the Delta Lake project.

Parquet files store metadata about row groups in the file footer. Delta Lake adds file-level metadata (such as per-column min/max statistics) to the transaction log, so it can skip entire files before opening any Parquet file to read its row-group metadata.

Delta Lake also lets you rearrange your data to improve file skipping. For example, you can Z Order by timestamp for time-series analyses. It supports schema evolution too, so you can change the schema of your table over time.

This company may have a cool file format, but is it closed source? It seems like enterprises don't want to be locked into closed formats anymore.
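
To make the file-skipping idea concrete, here is a toy sketch in plain Python. It is not Delta Lake's API: `FileStats`, `LOG`, `files_to_scan`, and the file names are all hypothetical, and the list only stands in for the per-file min/max statistics a transaction log could hold. The point is that the decision about which files to read is made from the log alone, without opening any file.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file entry: path plus min/max of a timestamp column."""
    path: str
    min_ts: int
    max_ts: int


# Hypothetical stand-in for the statistics recorded in a transaction log.
LOG = [
    FileStats("part-000.parquet", 0, 99),
    FileStats("part-001.parquet", 100, 199),
    FileStats("part-002.parquet", 200, 299),
]


def files_to_scan(log, lo, hi):
    """Return paths whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f.path for f in log if f.max_ts >= lo and f.min_ts <= hi]


# A query for timestamps between 120 and 180 only needs one of the three files.
print(files_to_scan(LOG, 120, 180))
```

In this toy model the files have non-overlapping timestamp ranges, which is roughly what clustering the data by timestamp aims for: the tighter each file's min/max range, the more files a range query can skip.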
Schema evolution is something that came up in a water cooler conversation on my team the other day.