Hacker News

by Fiahil 1738 days ago

Thanks for the clarification! :)

Not advice, but you should probably consider spinning off a secondary product from DuckDB focused solely on "reading data from Parquet files and running aggregations as efficiently as possible". You could probably skip INSERT, UPDATE, and DELETE entirely.

There is currently a gap in practical solutions for this pain point. You can use Spark or Airflow, but neither comes without a big infrastructure price tag (you can do it with pandas, but then you need a large instance to load the entire dataset into memory). I think the right product could even outpace what you currently have with DuckDB.
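To illustrate the memory point: a grouped aggregation doesn't need the whole dataset in memory at once, only running totals per key. Below is a minimal Python sketch, assuming the data arrives in batches. The `fake_row_groups` generator is a hypothetical stand-in for a Parquet reader that yields batches lazily (for example pyarrow's `ParquetFile.iter_batches`), and all names here are illustrative, not DuckDB's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical batch type: a list of (group_key, value) rows, standing in
# for one Parquet row group or record batch.
Batch = List[Tuple[str, float]]


def streaming_group_stats(batches: Iterable[Batch]) -> Dict[str, Tuple[float, int]]:
    """Compute per-key (sum, count) one batch at a time.

    Only the running totals are kept in memory, never the full dataset.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for batch in batches:
        for key, value in batch:
            sums[key] += value
            counts[key] += 1
    return {key: (sums[key], counts[key]) for key in sums}


def fake_row_groups() -> Iterable[Batch]:
    # Hypothetical generator: yields batches lazily, as a Parquet reader would.
    yield [("a", 1.0), ("b", 2.0)]
    yield [("a", 3.0)]
    yield [("b", 4.0), ("c", 5.0)]


stats = streaming_group_stats(fake_row_groups())
# Means are derived at the end from (sum, count), since per-batch means
# cannot simply be averaged together.
means = {key: total / n for key, (total, n) in stats.items()}
print(means)  # {'a': 2.0, 'b': 3.0, 'c': 5.0}
```

With this approach, memory grows with the number of distinct keys rather than the number of rows, which is why a dedicated read-and-aggregate engine can get away with a much smaller instance than loading everything into a pandas DataFrame.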