by numlocked
3814 days ago
These are some cool SQL tricks! I like them. The big caveat with the standard deviation technique is that it assumes a normal distribution. Many datasets are not actually normally distributed (power-law, Poisson, beta, etc.), and so the technique won't work on them. It's a much harder problem to 'generically' detect outliers without knowledge of the underlying distribution. I don't have any idea how to do it (though a former colleague came up with a nice idea of building a histogram and searching for values that occurred after some number of empty bins, implying an outlier). Is there an accepted state of the art for general-purpose outlier detection? Or is that such a broad question as to be meaningless?
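For what it's worth, here is a rough sketch of how that histogram idea might look in Python with NumPy. This is my own guess at the approach, not the colleague's actual code: the function name `histogram_gap_outliers` and the parameters `bins` and `min_empty_bins` are made up for illustration, and it only looks for outliers on the high side.

```python
import numpy as np

def histogram_gap_outliers(values, bins=20, min_empty_bins=3):
    """Return values that sit above a run of at least `min_empty_bins` empty bins.

    Hypothetical helper: `bins` and `min_empty_bins` are tuning knobs chosen
    for illustration, and only high-side outliers are considered.
    """
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)

    empty_run = 0
    for i, count in enumerate(counts):
        if count == 0:
            empty_run += 1
        elif empty_run >= min_empty_bins:
            # First occupied bin after a long gap: flag everything
            # from its left edge upward.
            return values[values >= edges[i]]
        else:
            empty_run = 0
    return values[:0]  # no sufficiently long gap found

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]
print(histogram_gap_outliers(data))
```

The result depends heavily on the bin count and the gap threshold. With too many bins, sparse but perfectly ordinary data will show empty runs too, so both knobs need tuning per dataset.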
Here's an example: the expected variance for the number of SWIFT payments processed during non-banking hours is 0, so any transaction count greater than 0 is an outlier.
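A rule like that needs no distributional assumption at all. A minimal sketch in Python, assuming banking hours of 09:00 to 17:00 on weekdays and an input of `(timestamp, count)` pairs; both are assumptions on my part, as are the names `is_banking_hours` and `off_hours_outliers`:

```python
from datetime import datetime, time

# Assumed banking hours, for illustration only.
BANKING_OPEN = time(9, 0)
BANKING_CLOSE = time(17, 0)

def is_banking_hours(ts):
    """True on weekdays between the assumed opening and closing times."""
    return ts.weekday() < 5 and BANKING_OPEN <= ts.time() < BANKING_CLOSE

def off_hours_outliers(hourly_counts):
    """Return (timestamp, count) pairs with any activity outside banking hours.

    `hourly_counts` is a hypothetical iterable of (datetime, int) pairs.
    """
    return [
        (ts, n) for ts, n in hourly_counts
        if n > 0 and not is_banking_hours(ts)
    ]

sample = [
    (datetime(2024, 1, 8, 10), 120),  # Monday morning: normal
    (datetime(2024, 1, 8, 23), 0),    # Monday night, no payments: normal
    (datetime(2024, 1, 9, 2), 3),     # Tuesday 02:00 with payments: outlier
]
print(off_hours_outliers(sample))
```

Real schedules would also have to account for holidays and time zones, which this sketch ignores.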