by lopatin 595 days ago
I think the competition for the future is between DuckDB and Polars. Will we stick with the DataFrame model, made feasible by Polars's lazy execution, or will we go with in-process SQL à la DuckDB? Personally I've been using DuckDB because I already know SQL (and DuckDB provides persistence if I need it) and don't want to learn a new DataFrame DSL, but I'd love to hear about other people's experiences.
4 comments

I’d recommend using the Polars SQL context manager if you want to defer learning how to do everything through their API. The API is a big enough shift from pandas that it took me a minute to figure out, but I really enjoy having the choice to stay in DataFrame methods or switch to SQL-only transformations. It has global state too if that’s needed. I like that it isn’t an RDBMS but provides all of the SQL I use.

https://docs.pola.rs/api/python/stable/reference/sql/python_...

https://docs.pola.rs/api/python/stable/reference/sql/python_...

I really like the dataframe approach. I think it’s because I like REPL-driven-development where I can drop into the REPL and work through how to transform the data interactively.

To be fair, it can nearly always be done in SQL also (unless it’s ML or some Python-specific thing like that), but the SQL with nested queries and numerous CTEs is harder for me to wrap my brain around.

If I were betting, I’d pick DuckDB, because DuckDB seems more able to implement something Polars-like than Polars is to implement something DuckDB-like.

I'm with you. I also like the IDE niceties like autocomplete and docs on hover that don't really work on SQL.
I'm hoping someone writes a Python LSP that understands DuckDB SQL.

I use DuckDB and I typically write correct SQL, but having LSP assistance would greatly enhance my quality of life.

I've written a fair bit of PySpark code and Polars's syntax feels fairly similar, but it also offers a limited SQL dialect.
Although only experimental and probably off topic to the discussion, it's worth mentioning DuckDB also provides a Spark API implementation.

https://duckdb.org/docs/api/python/spark_api

And while on the subject of syntax, DuckDB also has function chaining:

https://duckdb.org/docs/sql/functions/overview.html#function...

I'm very split. There's a lot of interactive exploration and data transformation that SQL lends itself to poorly (try transposing in SQL - not fun!), but I really like the idea of a data system that is language-agnostic like DuckDB.
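
To make the transposing point concrete, here is a rough sketch that uses only the Python standard library. It uses the built-in `sqlite3` module as a stand-in for a generic SQL engine (not DuckDB, whose dialect may offer shortcuts), and the table and column names are invented for the example.

```python
import sqlite3

# Hypothetical sample data: one row per metric, one column per quarter.
rows = [("revenue", 10, 12, 15), ("costs", 7, 8, 9)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE report (metric TEXT, q1 INT, q2 INT, q3 INT)")
con.executemany("INSERT INTO report VALUES (?, ?, ?, ?)", rows)

# In plain SQL every source column and every output column is spelled out:
# unpivot with UNION ALL, then pivot back with conditional aggregation.
transposed_sql = con.execute(
    """
    SELECT quarter,
           MAX(CASE WHEN metric = 'revenue' THEN value END) AS revenue,
           MAX(CASE WHEN metric = 'costs' THEN value END) AS costs
    FROM (
        SELECT 'q1' AS quarter, metric, q1 AS value FROM report
        UNION ALL SELECT 'q2', metric, q2 FROM report
        UNION ALL SELECT 'q3', metric, q3 FROM report
    )
    GROUP BY quarter
    ORDER BY quarter
    """
).fetchall()
con.close()

# In Python the same reshaping is a single zip over the value columns.
quarters = ["q1", "q2", "q3"]
transposed_py = [
    (q, *vals) for q, vals in zip(quarters, zip(*(r[1:] for r in rows)))
]
```

The SQL version has to be rewritten whenever a metric or quarter is added, while the Python version only needs the list of quarter labels updated.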