by longdog 812 days ago
Interesting, but I'm very skeptical. Over a dozen transformer-based foundation time series models have been released in the past year and, without fail, every one of them claims to be at or near SOTA. For example:

- Time-LLM (https://arxiv.org/abs/2310.01728)

- Lag-Llama (https://arxiv.org/abs/2310.08278)

- UniTime (https://arxiv.org/abs/2310.09751)

- TEMPO (https://arxiv.org/abs/2310.04948)

- TimeGPT (https://arxiv.org/abs/2310.03589)

- TimesFM (https://arxiv.org/html/2310.10688v2)

- GPT4TS (https://arxiv.org/pdf/2308.08469.pdf)

Yet not a SINGLE transformer-based model I've managed to successfully run has beaten gradient boosted tree models on my use case (economic forecasting). To be honest, I believe these foundation models are all vastly overfit. There are basically only 2 benchmark collections that are ever used in time series (the Monash set and the M-competition set), so it'd be easy to overtune a model just to perform well on these.

I would love to see someone make a broader set of varied benchmarks and have an independent third party do these evaluations like with LLM leaderboards. Otherwise I assume all published benchmarks are 100% meaningless and gamed.
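Until something like that exists, the practical check is to backtest any candidate on your own series against a dumb baseline. Here is a minimal sketch of a rolling-origin evaluation scored with MASE (error relative to an in-sample seasonal-naive forecast). The function names, the `(history, horizon)` forecaster signature, and the default season of 12 are my own assumptions for illustration, not anything taken from the papers above.

```python
from statistics import mean
from typing import Callable, List, Sequence

# Assumed interface: a forecaster takes the history and a horizon
# and returns `horizon` predicted values.
Forecaster = Callable[[Sequence[float], int], List[float]]


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 12) -> List[float]:
    """Baseline: repeat the last observed season."""
    if len(history) < season:
        return [history[-1]] * horizon
    last = history[-season:]
    return [last[i % season] for i in range(horizon)]


def mase(train: Sequence[float], actual: Sequence[float],
         predicted: Sequence[float], season: int = 12) -> float:
    """Mean absolute error scaled by the in-sample seasonal-naive error."""
    scale = mean(abs(train[i] - train[i - season]) for i in range(season, len(train)))
    if scale == 0:
        raise ValueError("training data is perfectly seasonal; MASE is undefined")
    return mean(abs(a - p) for a, p in zip(actual, predicted)) / scale


def rolling_origin(series: Sequence[float], forecaster: Forecaster, horizon: int,
                   min_train: int, step: int = 1, season: int = 12) -> float:
    """Average MASE over successive forecast origins (walk-forward backtest)."""
    scores = []
    for origin in range(min_train, len(series) - horizon + 1, step):
        train = series[:origin]                    # only data before the origin
        actual = series[origin:origin + horizon]   # held-out future values
        scores.append(mase(train, actual, forecaster(train, horizon), season))
    return mean(scores)
```

A score below 1 means the model beats seasonal naive on that series; if a foundation model can't clear that bar on your own data, its leaderboard numbers don't matter much for you.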

5 comments

Why would you expect anything to work well for economic forecasting :p
Jamie pull up the article that proves none of the published models work well with economic forecasting
There is always Gary Stevenson's Economics model. Works without fail.
I'm so sad. This hilarious comment is languishing in the doldrums.
Not reddit.
Pretty much any real-world time series prediction task is going to involve more data than just the time series itself, and some of this data will probably be tabular, so it's no surprise gradient boosted trees perform better.
Neural nets are known to struggle with tabular data. Have you tried fine tuning or attaching a decoder somewhere that you train on your task? Zero-shot inference might be asking for too much.
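To make the "time series plus tabular data" point concrete, this is roughly how a forecasting problem usually gets flattened into a table before it goes to a tree model. It's only a sketch under my own assumptions: `make_supervised`, the lag choices, and the covariate layout are hypothetical names for illustration, and the resulting rows could be fed to whatever gradient boosting library you prefer.

```python
from typing import List, Sequence, Tuple


def make_supervised(target: Sequence[float],
                    exog: Sequence[Sequence[float]],
                    lags: Sequence[int]) -> Tuple[List[List[float]], List[float]]:
    """Turn a series plus per-step covariates into (X, y) rows.

    Each row holds the lagged target values followed by the covariates
    for the step being predicted. The covariates in exog[t] must be
    known at forecast time (calendar flags, scheduled events, etc.),
    otherwise the table leaks future information.
    """
    if len(target) != len(exog):
        raise ValueError("target and exog must have the same length")
    max_lag = max(lags)
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(max_lag, len(target)):
        lag_features = [target[t - k] for k in lags]  # past values only
        X.append(lag_features + list(exog[t]))
        y.append(target[t])
    return X, y
```

Once the problem looks like this, it's an ordinary tabular regression, which is the setting the comment above is describing.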
>> Neural nets are known to struggle with tabular data.

Not disagreeing with you, and I'm not a specialist, but it's funny that a lot of papers seem to claim exactly the opposite.

What paper says the opposite? This is what I can find:

https://arxiv.org/abs/2207.08815

https://arxiv.org/abs/2305.02997

Honestly the best part of this paper is they've put together a large new set of time series for benchmarking.
https://facebook.github.io/prophet/

"Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."

?