Hacker News new | ask | show | jobs
by roter 795 days ago
I too use the ERA5 reanalysis data and I too need quick time series. As the data comes in [lat, lon] grids, stacked by whatever period you've chosen, e.g. [month of hourly data, lat, lon], it becomes a massive matrix transpose problem if you want 20+ years.

What I do is download each netCDF file, transpose, and insert into a massive 3D HDF file organized as [lat, lon, hour]. On my workstation it takes about 30 minutes to create one year for one variable (no parallel I/O or processes) but then takes milliseconds to pull a single (lat, lon) location. Initial pain for long-term gain. Simplistic, but I'm just a climatologist not a database guru.

1 comments

Haha simplistic but probably faster and more space-efficient than a relational database. Sounds like rabernat and open-meteo who commented here do something similar to you and find it fast as well!