Hacker News
by rabernat 1286 days ago
Agree 100%. This is a big part of the motivation behind our new startup Earthmover: https://earthmover.io/

Our mission is to make it easier to work with scientific data at scale in the cloud, focusing mainly on the climate, weather, and geospatial vertical.

My cofounder Joe Hamman and I are climate scientists who helped create the Pangeo project. We are also core devs on the Python packages Xarray and Zarr. We think that a layer of managed services (think a "modern data stack" oriented around the multidimensional array data model) is exactly what this ecosystem needs to make it easier for teams to build data-intensive products in the climate-tech space.

And we're hiring! https://earthmover.io/posts/earthmover-is-hiring/

2 comments

I'm curious, have you considered using PyTorch or JAX for tensor processing? ML libraries seem to be much further along when it comes to performing compute-intensive, hardware-accelerated operations on tensors. And you get gradients basically for free (in terms of developer time). Also, the kernel compiler being added in PyTorch 2 looks very promising.
The primary issue in this domain is not compute - it's I/O, especially when you need to perform complex computations with intermediate data that doesn't fit into memory.
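To make the I/O point concrete, here is a minimal sketch of an out-of-core reduction: the data stays on disk and only one chunk is held in memory at a time. The file layout (raw float64 values) and the helper names `chunked_mean` and `write_demo_file` are assumptions made up for illustration. Real workloads in this space would more likely use chunked formats such as Zarr, but the pattern is the same.

```python
import os
import tempfile
from array import array


def chunked_mean(path, chunk_items=1024):
    """Mean of float64 values in a raw binary file, read one chunk at a time."""
    total = 0.0
    count = 0
    itemsize = array("d").itemsize  # bytes per float64 value
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_items * itemsize)
            if not buf:
                break
            chunk = array("d")
            chunk.frombytes(buf)  # only this chunk is resident in memory
            total += sum(chunk)
            count += len(chunk)
    return total / count if count else float("nan")


def write_demo_file(n):
    """Hypothetical helper: write 0..n-1 as float64 to a temp file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        array("d", (float(i) for i in range(n))).tofile(f)
    return path
```

For example, `chunked_mean(write_demo_file(10_000), chunk_items=256)` never holds more than 256 values at once. In a loop like this the arithmetic is trivial, so the cost is dominated by how fast the chunks can be read.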
PyTorch and JAX are used heavily in climate science on the ML side. For more general analytics, not so much. Many of our users like to use Xarray as a high-level API. There has been some work to integrate Xarray with PyTorch (https://github.com/pydata/xarray/issues/3232) but we're not there yet.

The Python Array API standard should help align these different back-ends: https://data-apis.org/array-api/latest/
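As a rough illustration of what the standard enables, the sketch below writes a function against the array's own namespace instead of a specific library. The helper names `get_namespace` and `standardize` are made up for this example, and the fallback to NumPy for objects without `__array_namespace__` is an assumption of the sketch, not part of the standard.

```python
import numpy as np


def get_namespace(x):
    """Return the Array API namespace for x (assumption: fall back to NumPy)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def standardize(x):
    """Subtract the mean and divide by the standard deviation, back-end agnostic."""
    xp = get_namespace(x)
    # mean and std are both functions defined by the Array API standard
    return (x - xp.mean(x)) / xp.std(x)
```

Calling `standardize(np.asarray([1.0, 2.0, 3.0, 4.0]))` returns an array with zero mean and unit standard deviation. In principle the same function works unchanged on any array type that implements the standard. Libraries that do not expose `__array_namespace__` directly may need a shim such as the array-api-compat package.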

Elixir?