Hacker News
by slap_shot 763 days ago
There isn't a winner, and there likely won't be one (at least not for a long time). Tabular will likely be acquired by Snowflake, and the two industry behemoths now back their own formats; each will treat its own as a first-class citizen.

Agreed, this is why we want to support both, and maybe even Apache Hudi down the line. But I hope the industry converges on a main standard rather than Snowflake and Databricks each fighting for their own format. They can differentiate on much more meaningful features.
There’s a lot of interesting work happening in this area (see: XTable).

We are building a Python distributed query engine, and share a lot of the same frustrations… in fact, until quite recently most of the table formats only had JVM client libraries, so integrating them natively with Daft was really difficult.

We finally managed to get read integrations across Iceberg/DeltaLake/Hudi recently, as all 3 now have Python/Rust-facing APIs. Funnily enough, the only non-JVM implementation of Hudi was contributed by the Hudi team and currently still lives in our repo :D (https://github.com/Eventual-Inc/Daft/tree/main/daft/hudi/pyh...)

These libraries still lag behind their JVM counterparts though, so it’s going to be a while before we see full support across the full feature set of each table format. But we’re definitely seeing a large appetite for working with table formats outside of the JVM ecosystem (e.g. in Python and Rust).

Are you using the iceberg-rust crate for Rust? It's a rather young project; if you are using it, have you found it sufficient for your needs?
We're actually using pyiceberg to retrieve metadata! All our IO and decoding happens on the Rust side once the data has been passed through.

We expose something called a ScanOperator which allows integration into various catalogs through a thin layer that exposes ScanTasks.

Iceberg's impl: https://github.com/Eventual-Inc/Daft/blob/416009138359a9d410...
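
For anyone unfamiliar with the pattern, here is a rough sketch of what a "thin layer that exposes ScanTasks" could look like. This is not Daft's actual code: only the names ScanOperator and ScanTask come from the comment above, while the fields, the `to_scan_tasks` method, and the `StaticFileListScanOperator` class are hypothetical and exist purely for illustration. The idea is that the catalog-specific side only resolves metadata into a list of files to read, and the engine does the IO and decoding later.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ScanTask:
    """One unit of read work: a data file plus what to read from it.

    Hypothetical fields; a real engine would carry more (schema, stats, ...).
    """
    path: str
    columns: Optional[Tuple[str, ...]] = None  # None means "all columns"
    limit: Optional[int] = None                # None means "no row limit"


class ScanOperator(ABC):
    """Catalog-facing layer: turns table metadata into ScanTasks.

    It performs no data IO itself; it only describes the work to be done.
    """

    @abstractmethod
    def to_scan_tasks(
        self,
        columns: Optional[Sequence[str]] = None,
        limit: Optional[int] = None,
    ) -> Iterator[ScanTask]:
        ...


class StaticFileListScanOperator(ScanOperator):
    """Toy operator backed by a hard-coded file list.

    In a real integration the file list would come from the table format's
    metadata (for Iceberg, for example, via pyiceberg).
    """

    def __init__(self, data_files: List[str]) -> None:
        self._data_files = list(data_files)

    def to_scan_tasks(self, columns=None, limit=None):
        cols = tuple(columns) if columns is not None else None
        # One task per data file; pushdowns are simply attached to each task.
        for path in self._data_files:
            yield ScanTask(path=path, columns=cols, limit=limit)


op = StaticFileListScanOperator(
    ["s3://bucket/tbl/data/a.parquet", "s3://bucket/tbl/data/b.parquet"]
)
tasks = list(op.to_scan_tasks(columns=["id", "ts"]))
for task in tasks:
    print(task)
```

The appeal of a split like this is that supporting a new table format mostly means writing another operator that knows how to read that format's metadata, while the file reading path stays shared.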