by generatorguy 2295 days ago
I work on power stations which normally have about 1000 monitored variables per turbine-generator and another 500 for the plant in general. So typically 2500 for a two unit plant.

Alarms are generated if a variable exceeds a threshold, or a binary variable is in the wrong state.
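
As a rough illustration of that alarm logic, here is a minimal Python sketch. The point names, limits, and readings are all made up for the example, and a real system would run this logic in the PLC or HMI rather than in a script.

```python
def active_alarms(analog, binary, high_limits, normal_states):
    """Return the sorted names of points currently in alarm.

    An analog point alarms when its value exceeds its high limit;
    a binary point alarms when it is not in its normal state.
    """
    alarms = [name for name, value in analog.items()
              if name in high_limits and value > high_limits[name]]
    alarms += [name for name, state in binary.items()
               if name in normal_states and state != normal_states[name]]
    return sorted(alarms)


# Hypothetical configuration and readings, for illustration only.
HIGH_LIMITS = {"stator_winding_temp_c": 155.0, "bearing_temp_c": 90.0}
NORMAL_STATES = {"lube_oil_pump_running": True}

example = active_alarms(
    analog={"stator_winding_temp_c": 140.0, "bearing_temp_c": 95.0},
    binary={"lube_oil_pump_running": False},
    high_limits=HIGH_LIMITS,
    normal_states=NORMAL_STATES,
)
print(example)
```

Note that a 140 degree winding raises nothing here because it is under its limit, whatever the load happens to be.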

Is Orbiter something that would benefit power plants?

5 comments

Hey generatorguy - this is a really interesting use case so thanks for sharing. I imagine our modeling / monitoring / alerting capabilities can extend to power plants but will need to understand the data better. The common types of business and product metrics that our customers look for include user growth, cancellation rates, call failure %s, all of the above by different geos, etc. Happy to chat more if you'd like to shoot me an email (I'm winston[at]getorbiter.com)
I think some sort of anomaly detection would be useful in your case. There are a bunch of libraries floating about, I remember at least Netflix[0], Yelp and Datadog talking about them. There appears to be a really good links page available too[1]. You can also learn a lot from Forecasting: Principles and Practice, which is free online[2]

I have previously pitched using a kind of SPC-for-metrics approach, with Nelson rules[3] to help surface metrics which are starting to move out of control. I think it would have the advantage over ML techniques that it's easy to understand.
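
To make the SPC idea concrete, here is a minimal Python sketch of the first two Nelson rules: rule 1 flags a point more than three standard deviations from the mean, and rule 2 flags nine or more consecutive points on the same side of the mean. The function names and sample values are made up, and the baseline mean and sigma are assumed to come from a period of known-good operation.

```python
def nelson_rule_1(samples, mu, sigma):
    """Indices of points more than 3 sigma away from the mean."""
    return [i for i, x in enumerate(samples) if abs(x - mu) > 3 * sigma]


def nelson_rule_2(samples, mu, run_length=9):
    """Indices where a run of `run_length` or more consecutive points
    on the same side of the mean is in progress."""
    hits = []
    run = 0
    prev_side = 0
    for i, x in enumerate(samples):
        side = (x > mu) - (x < mu)  # +1 above, -1 below, 0 exactly on the mean
        if side != 0 and side == prev_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        prev_side = side
        if run >= run_length:
            hits.append(i)
    return hits


# Hypothetical baseline: mean 100.0, sigma 2.0.
print(nelson_rule_1([100.5, 99.5, 101.0, 107.0, 100.5], 100.0, 2.0))
print(nelson_rule_2([99.0] + [100.5] * 9, 100.0))
```

Rule 2 is the interesting one for drift: every point in that second series is well inside any sensible threshold, yet the run still gets flagged.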

My experience is that alerting thresholds are a very poor mechanism for managing systems. They just ossify past disasters and typically become noise. Alert fatigue renders them meaningless. If they're set by the manufacturer then the incentives are broken: they will favour false alerts in order to push legal responsibility onto the operator.

[0] https://github.com/Netflix/Surus

[1] https://github.com/yzhao062/anomaly-detection-resources

[2] https://otexts.com/fpp2/

[3] https://en.wikipedia.org/wiki/Nelson_rules

thanks for the links.

We only create an alert if there is a problem the operator can solve, otherwise there is no point in waking them up at 3 AM, so if anything our thresholds are set as loose as possible instead of as tight as possible.

However, there are many instances where the operator could be alerted earlier that the machine operation is abnormal. For example, say the stator windings are rated for operation up to 155 degrees C, the machine has been lightly loaded for a long time, the ambient temperature is normal, and the windings are at 140 degrees. No alert would be generated from the stator winding temperature, but something is amiss.

I think this is the case where some ML/AI/hypeword techniques might be applicable: based on past operation, the controller could learn what values to expect for some variables given half a dozen others, and flag readings that depart from that expectation.
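
A minimal Python sketch of that idea, under heavy assumptions: the history is made up, and a single straight-line fit of winding temperature rise (winding minus ambient) against load stands in for a model with half a dozen inputs. A reading is flagged when it departs from the expected value by more than a few standard deviations of the historical residuals.

```python
from statistics import mean, stdev

# Made-up history of normal operation: (load % of rating, ambient C, winding C).
HISTORY = [
    (20, 25, 52), (40, 24, 70), (60, 26, 93), (80, 25, 111), (100, 25, 131),
    (30, 20, 55), (50, 22, 78), (70, 27, 104), (90, 23, 119),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


loads = [load for load, _, _ in HISTORY]
rises = [winding - ambient for _, ambient, winding in HISTORY]
SLOPE, INTERCEPT = fit_line(loads, rises)
# Spread of the historical residuals around the fitted line.
RESID_SD = stdev([r - (INTERCEPT + SLOPE * l) for l, r in zip(loads, rises)])


def expected_winding_temp(load_pct, ambient_c):
    """Winding temperature predicted from load and ambient temperature."""
    return ambient_c + INTERCEPT + SLOPE * load_pct


def is_abnormal(load_pct, ambient_c, winding_c, n_sigma=3.0):
    """True when the reading is more than n_sigma residual SDs from expected."""
    residual = winding_c - expected_winding_temp(load_pct, ambient_c)
    return abs(residual) > n_sigma * RESID_SD


print(is_abnormal(20, 25, 140))  # lightly loaded, normal ambient, hot windings
print(is_abnormal(60, 25, 92))   # reading in line with the history
```

The 140 degree case is still below the 155 degree rating, so a fixed threshold stays quiet, but it is far from what this history predicts for a lightly loaded machine.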

You should take a look at http://riemann.io
I agree with focusing on actionable alerts during on-call hours. You might be able to have some kind of scheduled change in sensitivity.

One thing I've wondered in the past year is whether fuzzy logic would be useful. Your example is a really good case of linguistic variables -- "lightly loaded", "a long time", "normal temperature" and so on. These can be assembled into rules or tables that should fire more sensibly than exact threshold values.
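
A minimal Python sketch of what such a rule could look like, assuming simple ramp-shaped membership functions and using `min` for the fuzzy AND. Every breakpoint below is invented for illustration and would have to come from someone who knows the machine.

```python
def ramp_up(x, lo, hi):
    """Membership that is 0 below lo, 1 above hi, and linear in between."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ramp_down(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, and linear in between."""
    return 1.0 - ramp_up(x, lo, hi)


def abnormality(load_pct, hours_at_load, ambient_c, winding_c):
    """Degree (0..1) to which the rule
    'lightly loaded AND a long time AND normal ambient AND hot windings'
    holds. All breakpoints are hypothetical."""
    lightly_loaded = ramp_down(load_pct, 30, 50)
    long_time = ramp_up(hours_at_load, 1, 4)
    normal_ambient = min(ramp_up(ambient_c, 5, 15), ramp_down(ambient_c, 30, 40))
    hot_windings = ramp_up(winding_c, 110, 140)
    return min(lightly_loaded, long_time, normal_ambient, hot_windings)


print(abnormality(20, 6, 25, 140))  # rule fully satisfied
print(abnormality(40, 6, 25, 140))  # only partly "lightly loaded"
print(abnormality(90, 6, 25, 140))  # heavily loaded, rule does not apply
```

The output is a degree rather than a yes/no, so the alert level could be picked per shift, which fits the idea of a scheduled change in sensitivity.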

Not OP, but I researched scalable anomaly detection systems for power-generating assets. We collaborated with a large industrial engine manufacturer on this work. https://arxiv.org/abs/1701.07500. The key challenge customers encountered was the prevalence of false alarms that led to unnecessary service.
Woah this is awesome. How did you guys resolve the false alarm issue wrt power plants?
There is a small company in Lund, Sweden that specializes in this. It's run by a former professor of mine from uni. The basic idea is to build a model of the system and connect the detectors' output to it; the model then uses that info to detect anomalies and filter errors to find the root cause. https://www.goalart.com/ Not affiliated in any way, except as already stated.
Out of curiosity, since I'm interested in industrial monitoring: would you mind telling a bit more about the monitoring infrastructure, esp. how often are those metrics collected and what data protocols are involved in the process?
I only know from my own experience and I’m essentially self-taught, so I don’t know what industry norms are, only what has worked for me and my customers.

The instruments and controlled devices are wired to a PLC such as an Allen-Bradley ControlLogix or a Schneider Electric M580. The PLC generally reads the inputs, executes the program, and updates the outputs every 10 ms. HMI software running on a computer, such as Inductive Automation Ignition, VTScada, Wonderware, Citect, etc., reads data from the PLC to display to the operator and to record for history. Protocols are often Modbus or Common Industrial Protocol (CIP), some flavor of which goes by the ridiculous name of EtherNet/IP, but that’s the kind of shit you get in industrial automation.

I generally set the HMI software to record my 2500 values once per second.

During testing it is common to use a data acquisition system that can sample much faster than the PLC runs, e.g. 1 kHz.
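
For a sense of scale, the rates mentioned in this comment work out as follows. This is plain arithmetic on the figures above (10 ms scan, 2500 values logged once per second, 1 kHz test acquisition); the variable names are just for the example.

```python
PLC_SCAN_MS = 10        # PLC scan period
HMI_LOG_MS = 1000       # HMI historian logging period
DAQ_RATE_HZ = 1000      # test data acquisition sample rate
POINTS = 2500           # values recorded by the HMI
SECONDS_PER_DAY = 86_400

# PLC scans that elapse between two logged HMI samples.
scans_per_log = HMI_LOG_MS // PLC_SCAN_MS

# Values the historian stores per day at one sample per point per second.
samples_per_day = POINTS * SECONDS_PER_DAY

# DAQ samples taken per channel during one PLC scan.
daq_samples_per_scan = DAQ_RATE_HZ * PLC_SCAN_MS // 1000

print(scans_per_log, samples_per_day, daq_samples_per_scan)
```

So the historian sees roughly one PLC scan in a hundred, and still stores 216 million values a day.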