Hacker News
by jakebol 3157 days ago
Jake from TileDB, Inc. here. I think this would be an ideal workload for an array data store (NetCDF, a standard format in this area, uses HDF5 under the hood in its NetCDF-4 variant). You have N attributes that you want to store per grid point over time, and you want to append along the time dimension. If you are ingesting GRIB2 files, you can take advantage of compression as well. An array data store like TileDB should give you fast access because you can get a pointer directly to the stored array instead of reading serialized data over a socket. This matters most when you only need a subarray of the dataset.
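A minimal pure-Python sketch of the idea, assuming a dense (time, y, x) grid split into fixed-size tiles with several named attributes per cell. The class `TiledGrid`, its methods, and the tile shape are hypothetical illustrations, not TileDB's (or NetCDF's) API. The sketch only shows why appending along time and reading a subarray, such as one grid point's time series, touches a few tiles rather than the whole dataset.

```python
from itertools import product


class TiledGrid:
    """Hypothetical in-memory sketch (NOT the TileDB API): a dense 3-D array
    (time, y, x) stored in fixed-size tiles, with several attributes per cell."""

    def __init__(self, ny, nx, tile, attrs):
        self.ny, self.nx = ny, nx
        self.tile = tile            # tile extent along (time, y, x)
        self.attrs = list(attrs)
        self.nt = 0                 # current length of the time dimension
        self.tiles = {}             # (it, iy, ix) -> {attr: {(t, y, x): value}}
        self.tiles_read = 0         # how many tiles reads have touched

    def append_timestep(self, values):
        """Append one time step; `values` maps attr -> 2-D list indexed [y][x]."""
        t = self.nt
        tt, ty, tx = self.tile
        for y, x in product(range(self.ny), range(self.nx)):
            key = (t // tt, y // ty, x // tx)
            cell = self.tiles.setdefault(key, {a: {} for a in self.attrs})
            for a in self.attrs:
                cell[a][(t, y, x)] = values[a][y][x]
        self.nt += 1

    def read(self, attr, t_range, y_range, x_range):
        """Return {(t, y, x): value} for a half-open subarray, visiting only
        the tiles that overlap the requested ranges."""
        tt, ty, tx = self.tile
        (t0, t1), (y0, y1), (x0, x1) = t_range, y_range, x_range
        out = {}
        for key in product(range(t0 // tt, (t1 - 1) // tt + 1),
                           range(y0 // ty, (y1 - 1) // ty + 1),
                           range(x0 // tx, (x1 - 1) // tx + 1)):
            tile = self.tiles.get(key)
            if tile is None:        # tile not written yet (e.g. future time steps)
                continue
            self.tiles_read += 1
            for (t, y, x), v in tile[attr].items():
                if t0 <= t < t1 and y0 <= y < y1 and x0 <= x < x1:
                    out[(t, y, x)] = v
        return out


# Usage: a 4x4 grid with two attributes, appended one time step at a time.
grid = TiledGrid(ny=4, nx=4, tile=(2, 2, 2), attrs=["temp", "wind"])
for t in range(4):
    grid.append_timestep({
        "temp": [[t * 100 + y * 10 + x for x in range(4)] for y in range(4)],
        "wind": [[0.5 * t for x in range(4)] for y in range(4)],
    })

series = grid.read("temp", (0, 4), (1, 2), (3, 4))  # grid point (y=1, x=3), all times
print(sorted(series.values()))  # [13, 113, 213, 313]
print(grid.tiles_read)          # 2 of the 8 tiles
```

Because each tile here is two time steps deep, a single-point time series touches one tile per two steps. A real array store would typically choose tile extents to match the dominant query pattern and compress each tile independently.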

Hi both, this is exactly what I’m looking at doing. We’ve got about 10 TB of NetCDF data coming in every day, and we’re looking for a cost-efficient data store that provides fast access to individual grid points. S3 has proven too slow.

Any chance I could pick your brains about using either Postgres or TileDB?

Thanks!

Absolutely! Drop us a line at hello@tiledb.io and tell us a little more about the problem you are trying to solve and we can go from there.