by refactor_master 1828 days ago
What are some ways to deal with large volumes of variable-length time series for real-time predictions? The best solutions I've tried myself all hinge on windowed feature extraction or LSTMs. They generally work, but start to fall apart when you're squeezed for data.

It seems that almost everywhere you look, every example has just one time series to deal with. Then again, since those methods are much more "statistical" in nature, they can actually make meaningful predictions from a single sample.

3 comments

I would say manual feature extraction? Your custom extraction could reduce the variable lengths to a uniform dimension (same number of features for every input), which can then be used by almost any algorithm.
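As a rough sketch of what I mean (the function names and the particular features are just my own picks for illustration, and real features should come from your domain knowledge), something like this turns a series of any length into the same seven numbers:

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their index (0.0 if fewer than 2 points)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def extract_features(series: Sequence[float]) -> List[float]:
    """Map a variable-length series to a fixed-length feature vector.

    Hypothetical feature set: length, mean, population std, min, max,
    last value, and linear trend.
    """
    if not series:
        raise ValueError("series must contain at least one value")
    return [
        float(len(series)),
        fmean(series),
        pstdev(series),
        float(min(series)),
        float(max(series)),
        float(series[-1]),
        slope(series),
    ]


# Series of different lengths end up with the same number of features.
short = extract_features([5.0])
longer = extract_features([1.0, 2.0, 3.0])
```

Each row then has the same width, so it can go straight into whatever tabular model you like.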

These automatic extractions are very statistical in nature indeed, but for some datasets domain insights are more valuable and give more usable features (in my opinion). I have found quite a few datasets where manual features + gradient boosted trees give better results than automated statistical methods. Often combinations give better results :)

For training forecasting models on multiple time series (and potentially large datasets), you can take a look at Darts [1] and the blog post [2].

[1]: https://github.com/unit8co/darts/

[2]: https://medium.com/unit8-machine-learning-publication/traini...

Maybe look up panel data and repeated experiments. Those techniques are applied when the data is "tabular": there are often relatively few observations on any individual time axis, but there are many instances of these experiments. It's a branch of linear forecasting (least squares), tailored, for example, to biological experiments where you have several sets of results (related, but maybe not performed in the same lab) that you want to amalgamate.
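To give a flavour, here is a minimal sketch of one common panel technique, the fixed-effects ("within") least-squares estimator with a single regressor. This is my choice of example, not necessarily the only thing meant by panel data, and the function name and the lab data are made up. Every lab gets its own intercept, while the slope is estimated from all labs pooled together:

```python
from statistics import fmean
from typing import Dict, List, Tuple

Panel = Dict[str, List[Tuple[float, float]]]  # entity -> list of (x, y) observations


def fit_fixed_effects(panel: Panel) -> Tuple[float, Dict[str, float]]:
    """Fit y = intercept[entity] + beta * x by least squares.

    The shared slope comes from demeaning x and y within each entity
    (the "within" transformation) and pooling the demeaned observations.
    """
    num = 0.0
    den = 0.0
    means = {}
    for entity, obs in panel.items():
        x_mean = fmean([x for x, _ in obs])
        y_mean = fmean([y for _, y in obs])
        means[entity] = (x_mean, y_mean)
        for x, y in obs:
            num += (x - x_mean) * (y - y_mean)
            den += (x - x_mean) ** 2
    if den == 0.0:
        raise ValueError("x does not vary within any entity; slope is not identified")
    beta = num / den
    intercepts = {e: y_m - beta * x_m for e, (x_m, y_m) in means.items()}
    return beta, intercepts


# Made-up data: two labs with few observations each, same trend, different baselines.
panel = {
    "lab_a": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "lab_b": [(0.0, 10.0), (1.0, 12.0)],
}
beta, intercepts = fit_fixed_effects(panel)
```

The point is that lab_b alone has only two observations, but it still contributes to (and benefits from) the shared slope estimate.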