by ingenieroariel
798 days ago
I went through a similar phase with a pipeline that started from global OSM and Whosonfirst data. Google costs kept going up ($7k a month with Airflow + BigQuery), and I was able to replace it with a one-time $7k hardware purchase. We could do that because the process used H3 indices early on and the resulting intermediate datasets all fit in RAM. The system is a Mac Studio with 128GB running Asahi Linux, mmapped Parquet files, and DuckDB. It also runs Airflow for us, and with Nix it can be used to accelerate developer builds and run the Airflow tasks for the data team. GCP is nice when it is free/cheap, but they keep tabs on what you are doing and may surprise you at any point with ever-higher bills without higher usage.

I would love it if Postgres somehow got DuckDB-powered columnstore tables.
I know hydra.so is doing columnstores.
DuckDB being able to query Parquet files directly is a big win IMO.
I wish we could bulk insert parquet files into stock PG.