by graycat 2902 days ago
So, for anomaly detection, before evaluating the model at x, might want to know if x would be an anomaly in the training data x_i, i = 1, 2, ..., n. Sure, x is likely a vector with several to many components.

An anomaly detector should be at least as good as a statistical hypothesis test.

So, for the null hypothesis, assume that x is distributed like the training data.

Okay, except we don't really know the distribution of the training data.

"Ma! Help! What am I supposed to do now???"

So, we need a statistical hypothesis test that is both multi-dimensional and distribution-free.

Let's see: In ergodic theory we consider transformations that are measure preserving .... Yup, can have a group (as in abstract algebra) of those, sum over the group, ..., and calculate the significance level of the test and, thus, get a real hypothesis test, multi-dimensional and distribution-free. For some of the details of the test, there are lots of variations, i.e., options, knobs to turn.
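
To make the idea of an exactly known significance level concrete, here is a minimal sketch of one standard construction. It is not the test described above. It assumes only that x and the training points are exchangeable under the null hypothesis, so the group is just the permutations of the pooled sample. The nearest-neighbor score and the names `nn_distance` and `rank_p_value` are illustrative choices, not from the original work.

```python
import math
import random


def nn_distance(point, others):
    """Euclidean distance from `point` to its nearest neighbor in `others`."""
    return min(math.dist(point, other) for other in others)


def rank_p_value(train, x):
    """Rank-based p-value for H0: x is exchangeable with the points in `train`.

    Every point in the pooled sample (train plus x) is scored by the distance
    to its nearest neighbor among the remaining pooled points. The p-value is
    the fraction of pooled points whose score is at least the score of x.
    Under H0 the rank of x's score is uniform up to ties, so
    P(p <= alpha) <= alpha whatever the distribution of the data.
    """
    pooled = list(train) + [x]
    scores = []
    for i, point in enumerate(pooled):
        rest = pooled[:i] + pooled[i + 1:]
        scores.append(nn_distance(point, rest))
    x_score = scores[-1]
    return sum(s >= x_score for s in scores) / len(pooled)


# Synthetic 2-D training data, used only for illustration.
random.seed(0)
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]

p_typical = rank_p_value(train, (0.1, -0.2))  # a point in the bulk of the data
p_far = rank_p_value(train, (8.0, 8.0))       # a point far from all the data
print(p_typical, p_far)
```

Rejecting when the p-value is at most alpha gives a false alarm rate of at most alpha with no assumption on the distribution. The choice of score is one of the knobs: any function that treats the pooled points symmetrically keeps the guarantee and only changes which anomalies get detected. The loop is O(n^2), which is fine for a sketch but not for a large n.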

Detection rate? Hmm. Depends ...! Don't have data enough to use the Neyman-Pearson approach, but in a curious but still relevant sense the detection rate is the highest possible.

I just call this work statistics, but maybe it would also qualify according to some definitions as machine learning. But my work is not merely heuristic and has nothing to do with regression analysis or neural networks. So, again, my work is an example that there can be more to machine learning than empirical curve fitting.

So, before applying an empirically fitted model at x, want x to be distributed like the training data and at least want a hypothesis test not to reject the null hypothesis that x is so distributed.
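
A rough sketch of that gate, under stated assumptions: the model is applied at x only when a distribution-free test does not reject at level alpha. The split-sample rank test here is a stand-in picked for brevity, not the test from this comment, and `guarded_predict`, `make_p_value`, and `fitted_model` are hypothetical names.

```python
import math
import random


def make_p_value(reference, calibration):
    """Build a split-sample rank p-value (hypothetical helper).

    A point's score is its distance to the nearest point of `reference`.
    The scores of the held-out `calibration` points give the null ranks.
    """
    def score(z):
        return min(math.dist(z, r) for r in reference)

    cal_scores = [score(c) for c in calibration]

    def p_value(x):
        s = score(x)
        return (1 + sum(c >= s for c in cal_scores)) / (len(cal_scores) + 1)

    return p_value


def guarded_predict(model, p_value, x, alpha=0.01):
    """Apply `model` at x only if H0 is not rejected at level alpha.

    Returns (prediction, p); prediction is None when x is flagged.
    """
    p = p_value(x)
    if p <= alpha:
        return None, p
    return model(x), p


def fitted_model(z):
    """Stand-in for an empirically fitted model (assumed, for illustration)."""
    return 2.0 * z[0] - z[1]


# Synthetic 2-D data: first half as reference, second half as calibration.
random.seed(1)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(400)]
p_value = make_p_value(data[:200], data[200:])

inside = guarded_predict(fitted_model, p_value, (0.0, 0.0))
outside = guarded_predict(fitted_model, p_value, (9.0, -9.0))
print(inside, outside)
```

Holding out a calibration half means the reference points are scored only once, so each query costs one pass over the reference set. With m calibration points the smallest possible p-value is 1/(m+1), so alpha cannot usefully be set below that.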

More generally, if we are looking for anomalies in the data, say, a rapid real time stream, then when we see an anomaly, investigate further. In this case, an anomaly detector is a first cut filter, an alarm, to justify further investigation.

Looking back on what I did, I suspect that more could be done and that some of what I did could be done better.

Of course, my interests now are my startup. Yes, there the crucial core is some applied math I derived.

Maybe I'll use my anomaly detection work for real-time monitoring for zero-day problems in security, performance, failures, etc. in my server farm.


As a very general but crude and blunt approach to show that the hypothesis tests were not trivial, used the result of S. Ulam that Le Cam called "tightness" as in P. Billingsley, Convergence of Probability Measures. When doing both multi-dimensional and distribution-free, are nearly way out in the ozone so get pushed into some abstract techniques! Meow!