by mytherin 1615 days ago
You can use DuckDB as a processing engine on top of Pandas [1], while continuing to use Pandas as a data storage/data interchange format.

[1] https://duckdb.org/2021/05/14/sql-on-pandas.html

1 comments

That's what I do at $dayjob whenever I have to do windowing &c. Figuring out this stuff in Pandas is a waste of time. Before I discovered DuckDB, I would re-learn the API every damn time. I came up with a little utility function, which you can implement yourself :)

```
def sqldf(df: DataFrame, query: str) -> DataFrame:
    ...
```

Years of unpicking others' use of R's sqldf (which by default used to copy the entire data frame to a SQLite db, run the query, then copy the result set back) when they complained their R code was too slow has taught me a visceral, negative reaction to the name and pattern.

Glad to see DuckDB delivering, finally, on the promise of running SQL against in-memory dataframes.

TIL there's an actual 'botched' library with the same name; I actually came up with it independently on a lazy office afternoon :^)