| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by PolarizedPoutin 837 days ago

Hey Ryan and thank you for the feedback!

I agree that storing the data is appropriately chunked Zarr files is almost surely going to be faster, simpler to set up, and take up less space. Could even put up an API in front of it to get "queries".

I also agree that I haven't motivated the RDBMS approach much. This is mainly because I took this approach with Postgres + Timescale since I wanted to learn to work with them, and playing around with ERA5 data seemed like the most fun way. Maybe it's the allure of weather data being big enough to pose a challenge here.

I don't have anything to back this up but I wonder if the RDBMS approach, with properly tuned and indexed TimescaleDB + PostGIS (non-trivial to set up), can speed up complex spatio-temporal queries, e.g. computing the 99th percentile of summer temperatures in Chile from 1940-1980, in case many different Zarr chunks have to be read to find this data. I like the idea of setting up different tables to cache these kinds of statistics, but it's not that hard to do with Zarr either.

I'm benchmarking queries and indexes next so I might know more then!