by phoobahr
1165 days ago
In addition to this, here's one really specific case: ever had a pandas groupby().apply() that took forever, often spending most of that time re-aggregating after the apply? With columnar data, DuckDB is so much faster at this. For one of my projects I have what sounds like a dumb workflow:
- JSON api fetches get cached in sqlite3
- Parsing the JSON gets done with sqlite3 JSON operators (Fast! Fault tolerant! Handles NULLs nicely! Fast!!).
- Later, the data gets collated with duckdb queries: everything gets munged and aggregated into the shape I want and is persisted in parquet files
- When it's time to consume it, duckdb queries my various sources, does my (used to be expensive) groupbys on the fly and spits out pandas data frames
- Lastly, those data frames are small-ish, tidy and flexible

So yeah, on paper it sounds like these 3 libraries overlap too much to be used at the same time, but in practice they can each have their place and interact well.
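
For anyone curious, here is a rough sketch of the first two steps only (cache the raw JSON in sqlite3, then parse it with SQLite's JSON functions). The table name, URLs and fields are made up for illustration and are not from my actual project; it also assumes your Python's SQLite build includes the JSON functions, which most recent ones do.

```python
import json
import sqlite3

# Hypothetical cache table: one row per API response, raw JSON stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_cache (url TEXT PRIMARY KEY, body TEXT)")

# Made-up responses standing in for real API fetches.
cached = [
    ("https://example.invalid/items/1", json.dumps({"id": 1, "name": "widget", "price": 2.5})),
    ("https://example.invalid/items/2", json.dumps({"id": 2, "name": "gadget"})),  # no price
    ("https://example.invalid/items/3", '{"id": 3, "name": '),  # truncated response
]
conn.executemany("INSERT INTO api_cache VALUES (?, ?)", cached)

# json_valid() skips malformed bodies; json_extract() yields NULL for a
# missing key instead of raising, so gaps come through as None in Python.
rows = conn.execute(
    """
    SELECT json_extract(body, '$.id')    AS id,
           json_extract(body, '$.name')  AS name,
           json_extract(body, '$.price') AS price
    FROM api_cache
    WHERE json_valid(body)
    ORDER BY id
    """
).fetchall()

print(rows)
```

The later steps (duckdb reading these tables, writing parquet and handing back data frames) follow the same pattern of plain SQL in, tidy rows out, but I've left them out of the sketch since they depend on the duckdb and pandas versions you have installed.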