by graycat 4788 days ago
Expert systems for monitoring and management of large server farms and networks? Been there. Done that. Got the T-shirt. Wrote papers. Gave one at an AAAI IAAI conference at Stanford. Shipped two commercial products.

First problem: Need some 'experts'. So, we only know about the problems the experts do. Their knowledge is supposed to be mostly just empirical, from their 'expert experience'. Even if they have 'deep knowledge', that is, of how the systems work internally (e.g., the engine connects to the torque converter, which connects to the transmission, which connects to the rear differential, which connects to the rear wheels), the specific problems they know about are usually just the ones they have encountered. So, essentially, an expert system can address only problems that have been seen before and are well understood.

But the OP's problems were being seen for the first time. Bummer. Actually, with irony, if the problems have been seen before and are well understood, then why the heck have they not already been solved?

Second, with expert systems, we're working intuitively, just from experience. So, we have little idea whether what we are doing is the best possible in any reasonable sense, or even any good at all. In particular, in monitoring, the main, first operational problem on the 'bridge' or in the 'NOC' (network operations center) is that the false alarm rate is too high, with no good way to lower it (except just to ignore some alarms, but we should be able to do much better).

Net, for a serious attack on system monitoring and management, I can't recommend taking expert systems very seriously.

1 comment

Would love to get an email (address in profile) from you to chat about some tangential stuff (re: intelligence in monitoring systems).