| HN Mirror


by d4rti 1320 days ago

Stuff I've used:

  - Prophet - seems to be the current 'standard' choice
  - ARIMA - Classical choice
  - Exponential Moving Average - dead simple to implement, works well for stuff that's a time series but not very seasonal
  - Kalman/state-space model - used by Splunk's predict[1] command (pretty sure I always used LLP5)
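To show how little code the exponential moving average option needs, here is a minimal sketch of an EMA-based detector. The function name, the `alpha`, `threshold` and `warmup` defaults, and the choice to skip updating the average on flagged points are all assumptions made for illustration, not tuned or recommended values.

```python
from math import sqrt


def ema_anomalies(values, alpha=0.3, threshold=3.0, warmup=5):
    """Flag points that sit far from an exponentially weighted mean.

    alpha, threshold and warmup are illustrative defaults (assumptions).
    Returns a list of booleans, one per input value.
    """
    mean = values[0]   # running exponentially weighted mean
    var = 0.0          # running exponentially weighted variance
    flags = [False]    # the first point has nothing to be compared against
    for i, x in enumerate(values[1:], start=1):
        resid = x - mean
        std = sqrt(var)
        # Only start flagging once a few points have been seen.
        is_anomaly = i >= warmup and std > 0 and abs(resid) > threshold * std
        flags.append(is_anomaly)
        if not is_anomaly:
            # Assumption: flagged points do not update the baseline,
            # so a single spike does not drag the average with it.
            mean += alpha * resid
            var = (1 - alpha) * (var + alpha * resid * resid)
    return flags


series = [10, 11, 10, 9, 10, 11, 10, 50, 10, 11]
print(ema_anomalies(series))
```

With this toy series only the spike to 50 gets flagged. Strongly seasonal data would need something more than this, which matches the caveat in the list above.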

I did some anomaly detection work on business transactions and found the best approach was a sort of ensemble: we applied all the models, kept every anomaly any of them flagged, then used simple rules to alert only on 'interesting' anomalies, like:

  - 2-3 anomalies in a row
  - high deviation from expected
  - multiple models all detected anomaly

This improved the signal-to-noise ratio.
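The rules above can be sketched roughly like this. It is not the code we actually ran: the function name, the input shapes (one list of booleans per model, plus one deviation score per time step) and the thresholds `min_run`, `min_models` and `high_dev` are hypothetical choices for illustration.

```python
def interesting_alerts(model_flags, deviations,
                       min_run=2, min_models=2, high_dev=5.0):
    """Filter raw per-model anomaly flags down to 'interesting' alerts.

    model_flags: dict of model name -> list of bools (True = anomaly).
    deviations:  list of floats, e.g. absolute z-scores per time step.
    All thresholds are illustrative assumptions.
    Returns a list of (index, [reasons]) tuples.
    """
    alerts = []
    run = 0  # length of the current streak of flagged time steps
    for t in range(len(deviations)):
        # How many models flagged this time step?
        votes = sum(flags[t] for flags in model_flags.values())
        run = run + 1 if votes > 0 else 0

        reasons = []
        if run >= min_run:
            reasons.append("consecutive")      # several anomalies in a row
        if votes > 0 and deviations[t] >= high_dev:
            reasons.append("high_deviation")   # far from the expected value
        if votes >= min_models:
            reasons.append("multi_model")      # several models agree
        if reasons:
            alerts.append((t, reasons))
    return alerts


flags = {
    "model_a": [False, True, False, False, True, True, False],
    "model_b": [False, False, False, True, False, True, False],
}
devs = [0.1, 3.2, 0.2, 6.0, 3.1, 3.5, 0.3]
print(interesting_alerts(flags, devs))
```

In this example the lone, mild anomaly at index 1 is kept in the raw flags but never alerts, which is the point of the filtering step.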

[1]: https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchRef...

2 comments

kqr 1320 days ago

> - 2-3 anomalies in a row

> - high deviation from expected

> - multiple models all detected anomaly

This is basically what statistical process control charts do for you. If you haven't learned about it already, I can recommend looking it up!
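As a rough illustration, here is a sketch of one common kind of control chart, the XmR (individuals and moving range) chart. The limits use the standard 2.66 × average moving range scaling; the function names and the run length of 8 points on one side of the center line are assumptions for this example, since different rule sets use different run lengths.

```python
def xmr_limits(baseline):
    """Compute XmR natural process limits from a baseline period.

    Returns (lower, center, upper), using center +/- 2.66 * mean moving range.
    """
    center = sum(baseline) / len(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def xmr_signals(values, lower, center, upper, run_length=8):
    """Return (index, rule) pairs for points that signal a change.

    run_length is an assumption; rule sets differ on the exact number.
    """
    signals = []
    run, side = 0, 0  # current streak length and which side of center it is on
    for i, x in enumerate(values):
        if x > upper or x < lower:
            signals.append((i, "outside_limits"))  # high deviation
        s = (x > center) - (x < center)            # +1 above, -1 below, 0 on line
        if s != 0 and s == side:
            run += 1
        else:
            side = s
            run = 1 if s != 0 else 0
        if run >= run_length:
            signals.append((i, "long_run"))        # many points on one side
    return signals


baseline = [10, 12, 11, 9, 10, 12, 8, 10, 11, 9]
lower, center, upper = xmr_limits(baseline)
new_points = [11, 16, 10, 11, 11, 11, 11, 11, 11, 11, 11]
print(xmr_signals(new_points, lower, center, upper))
```

The "outside limits" rule corresponds to the high-deviation check and the run rule to the anomalies-in-a-row check; agreement between multiple models has no direct counterpart in this sketch.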


nerdponx 1320 days ago

Statistical process control always seemed like something that would benefit me in my work, but I don't know anything about it. I have looked up random Wikipedia articles, but that's all I know. Do you know of any more "serious" learning resources in that area?


kqr 1320 days ago

I think the most succinct intro I have found is Donald Wheeler's Understanding Variation.

I've long wanted to write an open article series for someone like you but never gotten around to it. There's so much information out there, but you sort of have to piece it together on your own, which is suboptimal.


kqr 1316 days ago

You know what? I finally bit the bullet thanks to your comment: https://news.ycombinator.com/item?id=33507217


NeutralForest 1320 days ago

I don't know what you've been using Prophet for, but I found it to be very brittle.


em500 1320 days ago

Yup, as per my other comment (https://news.ycombinator.com/context?id=33448802), fbprophet is largely tuned for a few years of somewhat regular business data sampled daily (e.g. sales per day). Outside its comfort zone (e.g. if you have monthly, or hourly/minutely data, or step changes / level shifts) it can fall apart pretty quickly. But its comfort zone happens to be very popular in business settings.

To be fair, it's pretty hard to create generic models that can robustly handle any random time series.


NeutralForest 1320 days ago

Fair enough. When I was working with monthly sales, it was pretty bad.
