|
Part II There is another cute point: The usual
way to use the historical data has its
order ignored. So, should be able to get
some cute theorems about this that permit
assuming much less than that the input
data is independent and identically
distributed (i.i.d.). That is, get to
assume that the historical data has been
permuted randomly, and, I'm guessing that,
from this could get a cute approximation
result. One start would be the work of
Michel Talagrand on "a new look at
independence". Why have I not pursued
this research? Did I mention that I have
20+ years of evidence that anomaly
detection and a dime won't cover a 10 cent
cup of coffee? Then, with the math clear, the data
collected, etc., need to write some code.
I put my feet up and thought of a way.
Soon I learned that part of my work
reinvented k-D trees, that is, for
positive integer k, search trees in
k-dimensions. IIRC, k-D trees are in Robert Sedgewick and Kevin Wayne,
Algorithms, FOURTH EDITION,
Addison-Wesley, New York, 2011. So, k-D trees are a lot like a k
dimensional generalization of the usual
one dimensional binary search. But, there, also need some cutting planes
and a little backtracking in the tree.
For this, for the computer hardware in
serious production, the new several TB
solid state disk (SSD) drives would be
just terrific: Load up one of those with
a k-D tree of historical data, and then
in production deployment many times a
second query that data. Since are using
the data -- write, say, once a week or
month and read hundreds or thousands of
times a second -- the SSD would give
fantastically fast data rates and not wear
out. Of course, more could be done: (1) How much historical data is really
needed? (2) Of the data on several variables,
which are needed? Or, maybe we should
prove some theorems that show when too
many variables without enough historical
data hurt the results? (3) Should we think about scaling the
data on some of the variables? If so,
then how to know what variables and how
much scaling? Could want some useful
theorems here. (4) For a system level view, could we be
hierarchical, that is, say that this one
server is sick, with another detector
drill-down and say that this one virtual
machine on that server is sick, drill
down and say that this one applications
program is sick, or some such? In all
cases? No. In some cases, maybe? (5) After detection with a low false alarm
rate, the next step is diagnosis. But
detectors that in some useful sense
localize the source of the sickness
should be able to help with diagnosis and,
indeed, the third step, correction. So, a
question is, at what scales and where to
deploy detectors? Could use some useful
theorems, analysis, etc. here. (6) Networks and server farms are changing
constantly, but need some good historical
data that still describes a healthy
system. Okay, could use some work to show
when the changes have been enough to need
new historical data. (7) Maybe of high interest to the suits in
the C-suite, do some on decision theory,
that is, essentially cost minimization.
So, have costs for false alarms and costs
for missed detections and try to set the
false alarm rate, and/or the number of
detectors, etc., to minimize the sum of
all these costs. Since a missed detection, i.e., a problem
detected too late instead of ASAP, can be
headlines for IIRC Sony, Target, the NYSE,
Yahoo, etc., the suits might be eager for
relatively a lot of such monitoring. (8) My work was for problems never seen
before. The 50,000 foot view of that is
that, once have seen a problem, detected
it, diagnosed it, and corrected it, then,
with the corrections, really shouldn't see
the problem again. So, really what we
should be looking for are problems never
seen before and should slap our own wrists
for any problem we keep seeing over and
over. Given some e-mail addresses, I'll send a
reference to my published paper. But I'm
trying to be anonymous here at HN. Again, I'm no longer motivated by money to
pursue anomaly detection: For 20+ years I
got overwhelming evidence that work in
anomaly detection and a dime won't cover a
10 cent cup of coffee. I tried to make a
startup out of this work, but gotta tell
you, after hundreds of e-mail messages to
venture capital firms, about 98% said
nothing back, one gave a weak reply with
no indication of any real understanding of
the market, technical challenges, my work,
what such a company would look like, etc.,
and the remaining 2%, or maybe 1%, gave
only some boiler plate about "not in our
focus area". So, I have another startup in progress,
easier to do, easier to sell, much more
promising and close to going live and
making money. So, here I'm willing to do an anomaly
detection technology give away! Get your
free anomaly detection tutorial and
technology! Uh, when I did the research at IBM's
Watson lab, the Watson lab patent office
was getting excited by my work -- "there
are several projects in the lab attacking
anomaly detection, and your work is the
only one taking your approach.". I had my research in a nicely written
paper, complete with theorems and proofs
and some good tests on some real data
(from a computer cluster at Allstate --
the cluster was part of the motivation to
be multi-dimensional, e.g., to detect some
of the failures due to chain reactions
in clustering) and also some hypothetical
data that would make a severe test (the
checkerboard example is a simplification),
and a guy at IBM claimed to have my work
"reviewed" and pronounced the work as "not
publishable," and I got fired, walked out
the door. Not good. He didn't give me a
chance to submit my work for publication. But, out of IBM, with a copy of my paper
and a letter from IBM permitting me to
publish, I submitted the paper, and it was
accepted without revision (except
something about indenting the first
paragraph after a subheading!) by the
first journal where I submitted,
Information Sciences, a good enough
journal. I've been burned enough pursuing anomaly
detection. I don't like being burned;
it's no fun. My conclusion is that any very good work
in the field is just way too darned
difficult for essentially everyone else
that would be involved -- management
chain, journal reviewers, computer science
departments, venture capital partners,
suits at target customers, etc. Right, as suggested elsewhere in this
thread, try to sell a solution or a
service, maybe cloud based. But setting
that up and getting it to good traction
would take some cash, more than I'm going
to put forward. |