by simicd 545 days ago
As I understand it, the article's point is that DuckDB doesn't provide its own dataframe API, i.e. a way to express queries through Python classes and functions instead of SQL strings.

The link you shared shows how DuckDB can run SQL queries on a pandas dataframe (e.g. `duckdb.query("<SQL query>")`). The SQL query in this case is a string. A dataframe API would let you write the query entirely in Python. Polars dataframes are an example of this (`df.select(pl.col("...").alias("...")).filter(pl.col("...") > x)`).

Dataframe APIs benefit from autocompletion, error checking, syntax highlighting, etc., which SQL strings embedded in Python code don't get. Please let me know if I missed something from the blog post you linked!


Author here: that’s exactly what I was trying to communicate but you said it better :)

There is a Spark API[1] being built using their Relational API[2].

Progress is being tracked on GitHub Discussions[3].

[1]: https://duckdb.org/docs/api/python/spark_api.html

[2]: https://duckdb.org/docs/api/python/relational_api.html

[3]: https://github.com/duckdb/duckdb/discussions/14525

Very cool! This seems like fantastic functionality and would make it super easy to migrate small Spark workloads to DuckDB :)

For non-trivial queries, I write them in a separate SQL file, where I get the benefit of syntax highlighting, auto-formatting and error checks.

There may be another benefit: a lot of LLMs are getting good at answering "how do I do X in DuckDB" questions.

Your point about SQL strings vs more strongly typed DF APIs stands.

However, it's somewhat weakened by the possibility that some parts of the SQL string are resolved by the surrounding Python context.