

by alentred 906 days ago

I am very excited about Iceberg specifically (because it is open source), but the last time I looked into it, the only implementation was a Spark library, and the Iceberg connector for Trino (formerly Presto, an SQL engine) had a hard dependency on Hive! It is as if the entire industry has had a hard time divorcing itself from its MapReduce, Hive, and, dare I say, Spark legacy.

I haven't looked into Iceberg since, but I plan to, and I am really looking forward to seeing this develop. We have the tools and the compute power today to deal with data without legacy tech, and not all data is big data either. Consequently, "data engineering" thankfully resembles regular back-end development more and more, with its regular development practices being put in place.

So, here's hoping for a pure Python Iceberg lib someday very soon!

2 comments

electrum 905 days ago

Trino no longer depends on Hadoop/Hive for any of its data lake connectors. Removing that dependency was a huge effort.


392 906 days ago

Same. I wasted a month or so of off time trying to get that old stack to work well enough to let me just insert data, and left unhappy. I had Databend up and running in an hour, and figured it will get easier to do it right in the future once there's a Rust impl (for portability vs. Java/Hive).
