by d4rti 1320 days ago
Stuff I've used:

  - Prophet - seems to be the current 'standard' choice
  - ARIMA - Classical choice
  - Exponential Moving Average - dead simple to implement, works well for stuff that's a time series but not very seasonal
  - Kalman/Statespace model - used by Splunk's predict[1] command (pretty sure I always used LLP5)
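To illustrate the "dead simple" point about the exponential moving average, here is a minimal sketch of one way to flag anomalies with it. The function name, `alpha` and `threshold` are my own illustrative choices, not anything from a particular library, and tracking an exponentially weighted variance alongside the mean is one common variant rather than the only way to do it.

```python
def ema_anomalies(values, alpha=0.3, threshold=3.0):
    """Flag points that sit far from an exponential moving average.

    alpha (smoothing factor) and threshold (in standard deviations)
    are hypothetical defaults chosen for illustration only.
    """
    if not values:
        return []
    mean = values[0]  # running exponentially weighted mean
    var = 0.0         # running exponentially weighted variance
    flags = [False]   # the first point has no history to compare against
    for x in values[1:]:
        deviation = x - mean
        std = var ** 0.5
        # Score the point against history before it updates the estimates.
        flags.append(std > 0 and abs(deviation) > threshold * std)
        mean += alpha * deviation
        var = (1 - alpha) * (var + alpha * deviation ** 2)
    return flags


series = [10, 11, 10, 11, 10, 11, 10, 50, 10, 11]
flags = ema_anomalies(series)
```

On this toy series only the spike at index 7 gets flagged. Note that the spike also inflates the variance estimate for a while afterwards, which is one reason a plain EMA struggles with seasonal or bursty data.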
I did some anomaly detection work on business transactions and found the best approach was a sort of ensemble model: we applied all the models, kept every anomaly any of them flagged, then used simple rules to alert only on the 'interesting' anomalies, like:

  - 2-3 anomalies in a row
  - high deviation from expected
  - multiple models all detected anomaly
This improves the signal-to-noise ratio of the alerts.
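A rough sketch of how those alerting rules might be combined, assuming each model has already produced a per-time-step boolean flag and that a deviation score (say, in standard deviations) is available. The function name, parameter names and default thresholds are hypothetical, not what was actually used in that work.

```python
def interesting_alerts(flags_by_model, deviations, run_length=2,
                       min_models=2, high_deviation=5.0):
    """Filter raw anomalies down to the ones worth alerting on.

    flags_by_model: dict of model name -> list of bools, one per time step
    deviations:     list of absolute deviations from expected, one per time step
    All thresholds are illustrative assumptions.
    """
    alerts = []
    run = 0  # length of the current streak of flagged time steps
    for t in range(len(deviations)):
        votes = sum(flags[t] for flags in flags_by_model.values())
        flagged = votes > 0  # the ensemble keeps an anomaly from any model
        run = run + 1 if flagged else 0

        reasons = []
        if run >= run_length:
            reasons.append("consecutive")
        if flagged and deviations[t] >= high_deviation:
            reasons.append("high_deviation")
        if votes >= min_models:
            reasons.append("model_agreement")
        if reasons:
            alerts.append((t, reasons))
    return alerts


flags_by_model = {
    "ema":   [False, True, False, True, True, False],
    "arima": [False, False, False, True, False, False],
}
deviations = [0.1, 3.2, 0.2, 6.0, 3.5, 0.3]
alerts = interesting_alerts(flags_by_model, deviations)
```

Here the lone anomaly at step 1 is dropped as noise, while steps 3 and 4 produce alerts with the reasons attached.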

[1] https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchRef...


> - 2-3 anomalies in a row

> - high deviation from expected

> - multiple models all detected anomaly

This is basically what statistical process control charts do for you. If you haven't come across them already, I recommend looking them up!
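For a flavour of what that looks like, here is a minimal sketch of an individuals (XmR) control chart. The 2.66 multiplier on the average moving range is the standard constant for XmR limits. The function names and the run-of-8 rule are my own choice of one common detection rule, and real SPC rule sets vary.

```python
def xmr_limits(baseline):
    """Compute XmR chart limits from a baseline period (at least two points)."""
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 scales the average moving range to roughly three-sigma limits.
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr


def spc_signals(values, lower, center, upper, run=8):
    """Return (index, rule) pairs for points that break a detection rule."""
    signals = []
    side_run, last_side = 0, 0
    for i, x in enumerate(values):
        if x < lower or x > upper:
            signals.append((i, "outside_limits"))
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            side_run += 1
        else:
            side_run = 1 if side != 0 else 0
        last_side = side
        if side_run >= run:
            signals.append((i, "run_one_side"))
    return signals


baseline = [10, 12, 11, 13, 10, 12, 11, 13]
lower, center, upper = xmr_limits(baseline)
found = spc_signals([11, 20, 12], lower, center, upper)
```

The single large jump trips the limit rule, and a long run of points on one side of the center line would trip the run rule, which maps onto the "high deviation" and "several in a row" heuristics quoted above.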

Statistical process control always seemed like something that would benefit me in my work, but I don't know anything about it. I have looked up random Wikipedia articles, but that's all I know. Do you know of any more "serious" learning resources in that area?
I think the most succinct intro I have found is Donald Wheeler's Understanding Variation.

I've long wanted to write an open article series for someone like you but never gotten around to it. There's so much information out there, but you sort of have to piece it together on your own, which is suboptimal.

You know what? I finally bit the bullet thanks to your comment: https://news.ycombinator.com/item?id=33507217
I don't know what you've been using Prophet for but I found it to be very brittle.
Yup, as per my other comment (https://news.ycombinator.com/context?id=33448802), fbprophet is largely tuned for a few years of somewhat regular business data sampled daily (e.g. sales per day). Outside its comfort zone (e.g. if you have monthly, or hourly/minutely data, or step changes / level shifts) it can fall apart pretty quickly. But its comfort zone happens to be very popular in business settings.

To be fair, it's pretty hard to create generic models that can robustly handle any random time series.

Fair enough. When I was working with monthly sales, it was pretty bad.