RAD – Outlier Detection on Big Data (techblog.netflix.com)
44 points by trickz 4136 days ago
2 comments

Consider:

"A Real-Time System-Adapted Anomaly Detector", Information Sciences, volume 115, April 1999, pages 221-259.

It's a distribution-free statistical hypothesis test for multidimensional data. The false alarm rate can be adjusted in small steps over a wide range, and the rate you pick is then achieved exactly.

It has nothing to do with any Gaussian assumption (it is distribution-free), nor with principal components analysis, singular value decomposition, etc.
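To make "distribution-free with an exactly adjustable false alarm rate" concrete, here is a minimal sketch of a generic rank-based test. It is NOT the algorithm from the paper, just an illustration of the general idea. The strangeness score (mean distance to the `k` nearest neighbours) and the names `knn_scores`, `rank_p_value` and `is_anomaly` are my own assumptions. If the new point is exchangeable with the reference data and the scores have no ties, the false alarm rate is exactly floor(alpha * (n + 1)) / (n + 1), so it can be tuned in steps of 1 / (n + 1).

```python
import numpy as np

def knn_scores(points, k=5):
    """Strangeness score (an assumed choice): mean distance from each
    point to its k nearest neighbours among the other points."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbour
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def rank_p_value(reference, x, k=5):
    """Rank of x's score among all n + 1 pooled scores, as a p-value.
    No distributional assumption is made, only exchangeability."""
    pooled = np.vstack([reference, np.asarray(x, dtype=float)[None, :]])
    scores = knn_scores(pooled, k)
    # Fraction of pooled points at least as strange as x (x itself counts).
    return float((scores >= scores[-1]).sum()) / len(scores)

def is_anomaly(reference, x, alpha, k=5):
    """Raise an alarm when the rank-based p-value is at most alpha."""
    return rank_p_value(reference, x, k) <= alpha

rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 3))        # 200 healthy 3-dimensional samples
print(rank_p_value(reference, [10.0, 10.0, 10.0]))
print(is_anomaly(reference, [0.0, 0.0, 0.0], alpha=0.01))
```

Note that this brute-force version computes all pairwise distances, so it costs O(n^2) time and memory per decision.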

But it doesn't come with an open source implementation, and how to implement it efficiently is an issue!

If you can figure it out, publish it!

Otherwise, when I get time, I will!

There are also Symbolic Aggregate ApproXimation (SAX) techniques, which are very fast and support indexing algorithms.