|
Expert systems for monitoring and management
of large server farms and networks? Been there.
Done that. Got the T-shirt. Wrote papers.
Gave one at an AAAI IAAI conference at Stanford.
Shipped two commercial products.

First problem: You need some 'experts'. So, the system
only knows about the problems the experts do. Their
knowledge is supposed to be mostly just empirical,
from their 'expert experience'. Even if they
have 'deep knowledge', that is, how the systems
work internally (e.g., the engine connects to the
torque converter connects to the transmission
connects to the rear differential connects to
the rear wheels), the specific problems they
know about are usually just the ones they have
encountered. So, essentially an expert system can
only address problems that have been seen before
and are well understood.

But the problems of the OP were being seen
for the first time. Bummer. Actually, with
irony, if the problems have been seen before and
are well understood, then why the heck have they
not already been solved?

Second, with expert systems, we're working just
intuitively, from experience alone. So,
we have little idea if what we are doing is the
best possible in any reasonable sense or even
at all good. In particular, in monitoring, the
main, first operational problem on the 'bridge'
or in the 'NOC' (network operations center)
is a false alarm rate that is too high, with no good
way to lower it (except just to ignore some
alarms, but we should be able to do much better).

Net, for a serious attack on system monitoring
and management, I can't recommend taking
expert systems very seriously. |