| HN Mirror



	by meitham 1124 days ago
	I have a large number of small, frequent batches (think of it as discrete ETL), where each process operates on a pandas DataFrame. Each frame is written to disk as Parquet, and immediately afterwards a DuckDB database file is created that imports that Parquet file. From then on the DuckDB file is only ever opened for reading; there are no further writes. I use a Python OData library to convert users' REST queries into Postgres-like SQL, which I run against these DuckDB files to apply any filters where needed.