by glogla 1681 days ago
The open source Delta is not a replacement for the real thing - they did not include features like optimizing small files (the small file problem is well known in big data, and gets much worse once streaming is involved), among others. It is more of a demo of the real thing. Which does not stop them from repeating everywhere how open they are, of course.

EDIT: Delta also still keeps partitioning information in the Hive metastore, while Iceberg keeps it in storage, which makes Iceberg the far superior design. Adopting Iceberg is harder because third-party tools like AWS Redshift do not support it - you have to go 100% of the way.

1 comment

>the delta also still keeps partitioning information in the hive metastore, while iceberg keeps it in storage, making it a far superior design.

Check out https://github.com/delta-io/delta/blob/3ffb30d86c6acda9b59b9... when you get a chance. You don't need the Hive metastore to query Delta tables, since all metadata for a Delta table is stored alongside the data.
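
To make the "metadata is stored alongside the data" point concrete, here is a rough sketch that pulls the partition columns straight out of a table's `_delta_log` directory using only the Python standard library. It assumes the log consists of newline-delimited JSON commit files, each line holding one action, and that a `metaData` action carries a `partitionColumns` list. The function name `read_partition_columns` is made up for illustration, and the sketch deliberately ignores Parquet checkpoints, so treat it as a demonstration rather than a real reader.

```python
import json
from pathlib import Path


def read_partition_columns(table_path):
    """Return partitionColumns from the latest metaData action, or None if absent.

    Simplified on purpose: only JSON commit files are read, checkpoints are skipped.
    """
    log_dir = Path(table_path) / "_delta_log"
    partition_columns = None
    # Commit file names are zero-padded version numbers, so sorting by name
    # walks the commits in order and later metaData actions win.
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "metaData" in action:
                    partition_columns = action["metaData"].get("partitionColumns", [])
    return partition_columns
```

No metastore lookup is involved: pointing the function at the table directory is enough, which is the property the reply is describing.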

>they did not include features like optimizing small files

For optimizing small files, you could run the compaction described at https://docs.delta.io/latest/best-practices.html#compact-fil...
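
For reference, the manual compaction pattern looks roughly like the sketch below: read the table, repartition it into fewer files, and overwrite it in place with `dataChange` set to `false` so the rewrite is not treated as new data. The helper names `target_file_count` and `compact_delta_table` and the 128 MiB target size are assumptions for illustration, not anything Delta prescribes, and `spark` is expected to be an existing SparkSession with Delta Lake configured.

```python
DEFAULT_TARGET_FILE_BYTES = 128 * 1024 * 1024  # assumed target size, tune for your workload


def target_file_count(total_bytes, target_file_bytes=DEFAULT_TARGET_FILE_BYTES):
    """Number of output files needed so each is at most target_file_bytes (minimum 1)."""
    # Ceiling division without floating point.
    return max(1, -(-total_bytes // target_file_bytes))


def compact_delta_table(spark, path, num_files):
    """Rewrite the Delta table at `path` into `num_files` files.

    `spark` is an existing SparkSession (assumed to have Delta Lake available).
    """
    (
        spark.read.format("delta")
        .load(path)
        .repartition(num_files)
        .write.option("dataChange", "false")  # mark the commit as a layout-only rewrite
        .format("delta")
        .mode("overwrite")
        .save(path)
    )
```

A call would look like `compact_delta_table(spark, "/data/events", target_file_count(total_bytes))`, where the path and `total_bytes` are placeholders. This is a job you schedule yourself rather than an automatic optimizer, which is roughly the gap the parent comment is complaining about.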