|
|
|
|
|
by davidsrosenberg
2903 days ago
|
|
So the extrapolation-type problem you describe (an input not near any of your training examples) is an issue. Unless you have a world model you believe in (i.e. you've done some science -- not just statistics), hard to know if your prediction function works out there where you’ve never seen any examples. If you’ve seen some data out there, but relatively fewer than you see in deployment, then importance weighting or other approaches from covariate shift / domain adaptation could help. Anomaly detection is definitely another important area, but I struggle to pull together a coherent unit on the topic. One issue is that it’s difficult to define precisely, at least partly because everybody means something a little bit different by it. Also, based on classical hypothesis testing, I think that to some extent you have to know what you’re looking for to be able to detect it (ie to have power against the alternatives/anomalies you care about)... For that reason, I think it’s hard to separate anomaly detection from more general risk analysis/assessment, because you need to know the type of thing you care about finding. In any case, I made an attempt on anomaly detection: There's https://bloomberg.github.io/foml/#lecture-15-citysense-proba... which is simply about building a conditional probability model, and flagging behavior as anomalous if it has low probability or prob density under the model. I also used to have 1-class SVM’s in a homework (https://davidrosenberg.github.io/mlcourse/Archive/2017/Homew... Problem 11). |
|
An anomaly detector should be at least as good as a statistical hypothesis test.
So, for the null hypothesis, assume that x is distributed like the training data.
Okay, except we don't really know the distribution of the training data.
"Ma! Help! What am I supposed to do now???"
So, we need a statistical hypothesis test that is both multi-dimensional and distribution-free.
Let's, see: In ergodic theory we consider transformations that are measure preserving .... Yup, can have a group (as in abstract algebra) of those, sum over the group, ..., and calculate the significance level of the test and, thus, get a real hypothesis test, multi-dimensional and distribution free. For some of the details of the test, there are lots of variations, i.e., options, knobs to turn.
Detection rate? Hmm. Depends ...! Don't have data enough to use the Neyman-Person approach, but in a curious but still relevant sense the detection rate is the highest possible.
I just call this work statistics, but maybe it would also qualify according to some definitions as machine learning. But my work is not merely heuristic and has nothing to do with regression analysis or neural networks. So, again, my work is an example that there can be more to machine learning than empirical curve fitting.
So, before applying an empirically fitted model at x, want x to be distributed like the training data and at least want an hypothesis test not to reject the null hypothesis that x is so distributed.
More generally, if are looking for anomalies in the data, say, a rapid real time stream, when see an anomaly, investigate further. In this case, an anomaly detector is a first cut filter, an alarm, to justify further investigation.
Looking back on what I did, I suspect that more could be done and that some of what I did could be done better.
Of course, my interests now are my startup. Yes, there the crucial core is some applied math I derived.
Maybe I'll use my anomaly detection work for real-time monitoring for zero-day problems in security, performance, failures, etc. in my server farm.