by exergy 646 days ago
There is Ibis[0], a fairly mature package. They recently adopted DuckDB as the default execution engine, and it gives you a nice Python dataframe API on top of DuckDB, with the option to hot-swap to heavier engines.

With tools like this providing a comprehensive Python API and the ability to always fall back to raw SQL, I am not sure the DuckDB devs should focus on the Python API at all beyond basic (to_table, from_table) features.

Impressive progress and a real chance to shake up the data tool market, but still a way to go: there is still much to do, especially on large table formats (Iceberg/Delta) and on memory management when running on bigger boxes in the cloud. E.g. the elusive "Failed to allocate ..." bug[1] is an inhibitor to the claim that big data is dead[2]. As it is, we tried and abandoned DuckDB as a cheaper replacement for some Databricks batch jobs.

[0] https://github.com/ibis-project/ibis

[1] https://github.com/duckdb/duckdb/issues/12667, https://github.com/duckdb/duckdb/issues/9880, https://github.com/duckdb/duckdb/issues/12528

[2] https://motherduck.com/blog/big-data-is-dead/