by generatorguy 2293 days ago
thanks for the links.

We only create an alert if there is a problem the operator can solve; otherwise there is no point in waking them up at 3 AM. So, if anything, our thresholds are set as loose as possible instead of as tight as possible.

However, there are many instances where the operator could be alerted earlier that the machine operation is abnormal. For example, the stator windings are rated for operation up to 155 degrees C, but the machine has been lightly loaded for a long time, the ambient temperature is normal, and the windings are at 140 degrees C. No alert would be generated from the stator winding temperature, but something is amiss.

I think this is a case where some ML/AI/hypeword techniques might be applicable: based on past operation, the controller would know what value to expect for some variables given half a dozen others, and could flag readings that stray from that expectation.
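
To make that concrete, here is a rough sketch of the simplest version I can think of: fit the winding temperature rise above ambient against load using past data, then flag readings that sit far from the fitted value. Everything here is an assumption for illustration: the history values are made up, the straight-line model is a stand-in (a real machine would likely need more variables and a better model), and the names `expected_winding_temp` and `is_abnormal` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical history of normal operation (made-up numbers):
# (load as fraction of rating, ambient deg C, stator winding deg C)
HISTORY = [
    (0.2, 20, 70), (0.3, 22, 80), (0.4, 19, 88), (0.5, 25, 102),
    (0.6, 18, 106), (0.8, 24, 129), (0.9, 21, 137), (1.0, 23, 147),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Model the temperature rise above ambient as a function of load.
loads = [load for load, _, _ in HISTORY]
rises = [winding - ambient for _, ambient, winding in HISTORY]
INTERCEPT, SLOPE = fit_line(loads, rises)

# Spread of the historical readings around the fitted line.
SIGMA = stdev([r - (INTERCEPT + SLOPE * l) for l, r in zip(loads, rises)])


def expected_winding_temp(load, ambient):
    """Winding temperature the fitted model expects for this load and ambient."""
    return ambient + INTERCEPT + SLOPE * load


def is_abnormal(load, ambient, winding, k=3.0, min_band=5.0):
    """True if the reading is further from the expected value than the band.

    The band is k standard deviations of the historical residuals, but never
    narrower than min_band degrees, so a very clean history does not make the
    check hypersensitive.
    """
    band = max(k * SIGMA, min_band)
    return abs(winding - expected_winding_temp(load, ambient)) > band
```

With this made-up history, `is_abnormal(0.2, 20, 140)` flags the 140 degree reading at 20% load even though it is well under the 155 degree rating, while `is_abnormal(1.0, 23, 146)` at full load does not.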

2 comments

You should take a look at http://riemann.io
I agree with focusing on actionable alerts during on-call hours. You might be able to have some kind of scheduled change in sensitivity.
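
A scheduled change in sensitivity could be as simple as the sketch below. The severity names, the 22:00 to 06:00 night window, and the function `should_page` are all assumptions of mine, not anything from Riemann or your system.

```python
from datetime import time

# Hypothetical night window during which only critical alerts page someone.
NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)


def is_night(now):
    """True if `now` (a datetime.time) falls in the window spanning midnight."""
    return now >= NIGHT_START or now < NIGHT_END


def should_page(severity, now):
    """Critical alerts always page; warnings page only outside the night window."""
    if severity == "critical":
        return True
    return severity == "warning" and not is_night(now)
```

So an early "something is amiss" warning would wait until morning, while a problem the operator must fix now still wakes them up.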

One thing I've wondered in the past year is whether fuzzy logic would be useful. Your example is a really good case of linguistic variables -- "lightly loaded", "a long time", "normal temperature" and so on. These can be assembled into rules or tables that should fire more sensibly than exact threshold values.
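
Here is a minimal sketch of what I mean, using your example as a single rule. The breakpoints for each linguistic variable (what counts as "lightly loaded", "a long time", and so on) are guesses I made up for illustration, and the function names are hypothetical. The rule uses the common min operator for fuzzy AND.

```python
def ramp_up(x, zero, full):
    """Membership 0 at or below `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def ramp_down(x, full, zero):
    """Membership 1 at or below `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


# Linguistic variables; every breakpoint below is an illustrative assumption.
def lightly_loaded(load):        # load as a fraction of rating
    return ramp_down(load, 0.3, 0.6)


def long_time(hours):            # hours spent at the current load
    return ramp_up(hours, 1.0, 4.0)


def normal_ambient(ambient_c):   # fully "normal" between 10 and 30 deg C
    return min(ramp_up(ambient_c, 0.0, 10.0), ramp_down(ambient_c, 30.0, 40.0))


def hot_winding(winding_c):      # well below the 155 deg C rating
    return ramp_up(winding_c, 100.0, 130.0)


def abnormality(load, hours, ambient_c, winding_c):
    """Rule: IF lightly loaded AND long time AND normal ambient AND hot winding
    THEN abnormal. Returns a degree of truth between 0 and 1."""
    return min(
        lightly_loaded(load),
        long_time(hours),
        normal_ambient(ambient_c),
        hot_winding(winding_c),
    )
```

The output is a degree between 0 and 1 rather than a yes/no, so the rule fades in as conditions drift toward the suspicious case. A real system would combine several such rules and pick a firing level for notifying the operator.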