by hrzn 1825 days ago
For those interested in time series libraries, we are developing Darts [1], which focuses on making it easy and straightforward to build and use forecasting models. Out of the box it contains traditional models (such as ARIMA) as well as recent deep learning ones (like N-BEATS). It also makes it easy to train models on multiple time series (potentially scaling to large datasets), as well as on multivariate series (i.e., series made of multiple dimensions). It will soon support probabilistic forecasts as well.

[1] https://github.com/unit8co/darts/

2 comments

How well does it deal with time series sets that don't fit fully in memory?

Or, put another way, how well does it scale horizontally to multiple machines?

We find most time series libraries to be about the same in terms of features and speed, but very few can handle large datasets well, if at all.

And, of course, thanks for sharing your library, I'll definitely try it out!!

This is supported, but only by the neural network models, which are fit using SGD and so naturally don't require the whole dataset in memory. Other models like ARIMA do need the full series loaded in memory.

The models that work on multiple time series in Darts accept a Sequence[TimeSeries] in their fit() method. This sequence can either be a List (fully in memory, the simplest option) or, when needed, a custom Sequence whose __getitem__() method does, for example, lazy loading from disk (somewhat similar to what PyTorch Datasets do).
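To make the lazy-loading idea concrete, here is a minimal sketch of such a custom Sequence. It deliberately avoids importing Darts and returns plain lists of floats; the class name LazyCsvSeries, the one-CSV-file-per-series layout, and the helper write_demo_files are all assumptions made up for illustration. In real use, __getitem__() would build and return a TimeSeries instead.

```python
import csv
import os
import tempfile
from collections.abc import Sequence


class LazyCsvSeries(Sequence):
    """Hypothetical lazy sequence: one CSV file per series, read on access."""

    def __init__(self, paths):
        self._paths = list(paths)  # only the file paths stay in memory

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, index):
        # Integer indices only; a single series is loaded when requested.
        with open(self._paths[index], newline="") as f:
            values = [float(row[0]) for row in csv.reader(f)]
        # Assumption: with Darts, the values would be wrapped in a TimeSeries here.
        return values


def write_demo_files(directory, n_series=3, length=5):
    """Write small demo CSV files (illustration only) and return their paths."""
    paths = []
    for i in range(n_series):
        path = os.path.join(directory, f"series_{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for t in range(length):
                writer.writerow([i * 10 + t])
        paths.append(path)
    return paths


demo_dir = tempfile.mkdtemp()
series = LazyCsvSeries(write_demo_files(demo_dir))
print(len(series), series[1])
```

Because only the paths are held in memory, the memory footprint depends on how many series are accessed at once rather than on the size of the whole dataset.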

If you need even more control, for instance because you have only one very long series that doesn't fit in memory, you can implement your own Darts "TrainingDataset". In that case you control exactly how the series is sliced.
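As a rough illustration of the slicing idea only, the sketch below reads (input, target) windows from a single long series stored on disk, touching just the bytes of the requested window. It is not the Darts "TrainingDataset" interface, whose required methods are not described here; the class name WindowedFileDataset, the raw little-endian float64 file layout, and the input_length, output_length and stride parameters are assumptions chosen for the example.

```python
import os
import struct
import tempfile


class WindowedFileDataset:
    """Hypothetical dataset: (input, target) windows read from a float64 file."""

    ITEM_SIZE = 8  # bytes per float64 value

    def __init__(self, path, input_length, output_length, stride=1):
        self.path = path
        self.input_length = input_length
        self.output_length = output_length
        self.stride = stride
        n_values = os.path.getsize(path) // self.ITEM_SIZE
        window = input_length + output_length
        # Number of full windows that fit in the series.
        self._n = max(0, (n_values - window) // stride + 1)

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        window = self.input_length + self.output_length
        with open(self.path, "rb") as f:
            # Seek to the start of this window; only the window is read.
            f.seek(index * self.stride * self.ITEM_SIZE)
            raw = f.read(window * self.ITEM_SIZE)
        values = list(struct.unpack(f"<{window}d", raw))
        return values[: self.input_length], values[self.input_length:]


# Demo: a stand-in "long" series 0.0 .. 9.0 written as little-endian float64.
demo_path = os.path.join(tempfile.mkdtemp(), "long_series.bin")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<10d", *[float(v) for v in range(10)]))

dataset = WindowedFileDataset(demo_path, input_length=3, output_length=2, stride=2)
print(len(dataset), dataset[1])
```

The stride controls how much consecutive windows overlap, which is the kind of slicing decision a custom dataset lets you make yourself.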

Edit: I realised this only answers the first sentence of your comment ;) For now there's no mechanism for scaling to multiple machines beyond what PyTorch already offers. AFAIK it's reasonably easy to scale to multiple GPUs on one machine, but I'm not sure how it would scale across several machines. We haven't had to try this yet! (Note that a single CPU can actually handle training deep network models on tens of thousands of time series, similar to the M4 competition, in a fairly reasonable time.)

Is the focus on wrapping existing algorithms (like statsmodels), or are you developing at that level as well?

Both - some models are wrapped (like ARIMA & ETS around statsmodels, Prophet around fbprophet) and we write others ourselves (RNNs, TCNs, N-BEATS, ...). Basically we take a pragmatic approach here: we do whatever works best for using a given model in Darts.