by dozzie 3004 days ago
> [...] not all settings were applied until you opened the UI provided by the vendor. [...] the NICs would reboot, just long enough to kill TCP connections.

The UI part suggests that it was Windows, and if it was, then it's not quite a matter of "just long enough" to kill TCP connections, as you need quite a lot of downtime to terminate a typical TCP session.

In Windows, if a NIC goes down, all the TCP connections that use the NIC get closed immediately. (Or at least this was the case a few years ago. I had a similar system with similar drawbacks deployed back then, though it was an automated warehouse, not an assembly plant.)
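
For a sense of scale on the "quite a lot of downtime" point above, here is a rough sketch of how long a sender keeps retransmitting unacknowledged data before it gives up. The figures are assumptions taken from Linux defaults (15 retries, 200 ms minimum and 120 s maximum retransmission timeout), not from Windows, and the function name is made up for illustration.

```python
def retransmit_give_up_seconds(retries=15, rto_min=0.2, rto_max=120.0):
    """Approximate time until a TCP sender abandons unacknowledged data.

    Assumes the retransmission timeout (RTO) starts at rto_min and doubles
    after every timeout until it is capped at rto_max. The defaults are
    Linux-style values, used here only as an illustrative assumption.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):  # the first timeout plus one per retry
        total += rto
        rto = min(rto * 2, rto_max)
    return total


print(round(retransmit_give_up_seconds(), 1))  # 924.6
```

So under those assumptions a connection with data in flight survives an outage of roughly 15 minutes, and an idle one does not notice a short outage at all. A stack that closes connections as soon as the NIC goes down behaves very differently.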

> So who would you even blame there?

The idiots who designed the system to run on a non-industrial-grade operating system. Windows was never a good choice to control industrial installations.


Windows is often the only vendor-supported choice for interfacing your computer applications to PLCs and such things. Also, most of the proprietary protocols run over industrial Ethernet are some kind of legacy serial (RS-232, RS-485, ...) bytestream format wrapped in TCP, and the software usually does not handle loss of the TCP connection particularly gracefully. (On multiple occasions I've seen rules like "reboot the whole installation on every shift change" used to "handle" the obvious reliability issues of such systems.)

It is not about some small and well-defined set of "idiots"; it is essentially an industry-wide design mistake.
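
To make "handling loss of the TCP connection gracefully" concrete, here is a minimal sketch of a client that reads fixed-size frames from a serial-style bytestream over TCP and reconnects when the link drops instead of falling over. Everything in it is an assumption for illustration: the fixed frame size, the function names, and the `connect` callable are hypothetical and do not correspond to any particular vendor protocol.

```python
import time


def recv_exact(sock, n):
    """Read exactly n bytes; TCP is a byte stream, so recv() may return less."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # an empty read means the peer closed the connection
            raise ConnectionError("connection closed by peer")
        buf += chunk
    return buf


def poll_frames(connect, frame_size, max_frames, retry_delay=1.0, sleep=time.sleep):
    """Collect max_frames fixed-size frames, reconnecting whenever the link drops.

    `connect` is a hypothetical zero-argument callable that returns a
    connected socket-like object. A frame that was only partially received
    when the connection dropped is discarded.
    """
    frames = []
    sock = None
    while len(frames) < max_frames:
        try:
            if sock is None:
                sock = connect()
            frames.append(recv_exact(sock, frame_size))
        except OSError:  # covers ConnectionError and socket timeouts
            if sock is not None:
                sock.close()
            sock = None
            sleep(retry_delay)  # wait, then reconnect instead of crashing
    if sock is not None:
        sock.close()
    return frames
```

With a real device, `connect` could be something like `lambda: socket.create_connection((host, port), timeout=5)`, where `host` and `port` are placeholders. A real protocol would also need to resynchronise on frame boundaries and decide what to do about commands that were in flight when the link dropped.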

> Windows is often the only vendor-supported choice for interfacing your computer applications to PLCs and such things.

Which is not a problem by itself, since a PLC, being industrial equipment, should operate independently of non-industrial equipment. The problem is idiots who think a desktop PC can reliably control a PLC in real time.

The problem arises when you have some kind of process that is inherently controlled not by the logic in the PLC but by some external system (either because the required data will not fit into the PLC's data memory or because it constantly changes based on some external business processes).

A reasonable architecture for this kind of problem would be to attach a server to the PLC as a peripheral, but it tends to be done the other way around. As for the reasons, I speculate that it is simply inertia on the part of the typical PLC programmer, compounded by reasoning along the lines of "nobody does that, so it is not tested and we will hit unknown bugs in the PLC firmware itself".

Is that a reference to Beckhoff?

> In Windows, if a NIC goes down, all the TCP connections that use the NIC get closed immediately.

Yes, that seems more likely.

I think Windows can be a decent platform for light industrial applications - which this system in particular was. The problem is all of the partners and suppliers were either stuck in the past, or had weird ideas.

The parent system was *nix based, but there was a flaw in a communications protocol that led to the channel bouncing between two boxes, and eventually bringing down the parent system.

My lesson from that was that you can have flaws on any system, no matter how solid the OS.