by meitham 1124 days ago
I have a large number of small, frequent batches (think of it as discrete ETL), where each process operates on a pandas DataFrame. Each frame is written to disk as Parquet, and a DuckDB database file is then created immediately by importing that Parquet file. From then on the DuckDB file is only ever opened for reading; there are no further writes.

I use a Python OData library to convert users' REST queries into Postgres-like SQL, and run that SQL against these DuckDB files to apply any filters where needed.