Hacker News
by tzs 4006 days ago
That warship was NOT disabled due to NT issues. It would have been exactly as disabled if they had used Linux, or Solaris, or OS/2, or any other modern operating system.

They were using a client/server architecture, where the clients were essentially smart terminals for data entry and display. The failure happened when someone entered a 0 in a field that was never supposed to be 0. The terminals did not error-check that field and reject bad values, and the server did not error-check its input (it was probably written under the assumption that the terminals did the validation). The result was that their server application divided by 0.
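A minimal sketch of the missing server-side check, in Python. The field names `numerator` and `divisor` and the functions `compute_ratio` and `handle_request` are hypothetical; nothing is known here about the ship's actual software. The point is only that the server validates its own input instead of trusting the terminals, so a 0 produces an error response rather than a division.

```python
class ValidationError(ValueError):
    """Raised when a terminal submits a value the server cannot accept."""


def compute_ratio(numerator: float, divisor: float) -> float:
    # Validate on the server even if the terminals are supposed to,
    # so a bad entry is rejected before it ever reaches the division.
    if divisor == 0:
        raise ValidationError("divisor field must be non-zero")
    return numerator / divisor


def handle_request(fields: dict) -> dict:
    # Hypothetical request handler: bad input becomes an error response
    # instead of an exception that escapes and takes the process down.
    try:
        result = compute_ratio(fields["numerator"], fields["divisor"])
    except (KeyError, TypeError, ValidationError) as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": result}


print(handle_request({"numerator": 10, "divisor": 4}))
print(handle_request({"numerator": 10, "divisor": 0}))
```

Checking on both sides is cheap; the terminal check gives the operator fast feedback, while the server check is the one that actually protects the shared process.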

The application did not trap divide-by-zero exceptions, and so NT did exactly the same thing that nearly every other modern OS, including nearly all Unix and Unix-like operating systems, does when an application does not trap this kind of exception: it terminated that process.
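The trapped/untrapped distinction can be seen from outside the process with a small Python sketch. Python raises `ZeroDivisionError` itself rather than relying on a hardware trap (in C on x86 Unix-like systems, an integer divide by zero usually arrives as `SIGFPE`, whose default action terminates the process), but the effect of leaving it uncaught is analogous: the process ends with a non-zero exit status. The child programs below are illustrative only.

```python
import subprocess
import sys

# Child program: divides by a value that happens to be zero and does not
# catch the resulting exception, so the interpreter exits with an error.
UNTRAPPED = "d = 0\nprint(10 // d)\n"

# Same computation, but the exception is trapped and handled in-process.
TRAPPED = (
    "d = 0\n"
    "try:\n"
    "    print(10 // d)\n"
    "except ZeroDivisionError:\n"
    "    print('rejected: divisor was 0')\n"
)


def run(source: str) -> int:
    """Run source in a fresh interpreter and return its exit status."""
    proc = subprocess.run([sys.executable, "-c", source],
                          capture_output=True, text=True)
    return proc.returncode


print("untrapped exit status:", run(UNTRAPPED))  # non-zero: the process died
print("trapped exit status:", run(TRAPPED))      # 0: the process kept running
```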

The application developers had not made provisions to automatically restart the application if it failed, and the terminals couldn't do anything with the server application down, and so the ship was dead.
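The missing provision is a supervisor: something outside the server process that notices it has exited abnormally and starts it again. A minimal Python sketch follows; `supervise`, its parameters, and the always-crashing stand-in server are hypothetical. Production systems would more likely use the OS's own service manager, but the logic is the same.

```python
import subprocess
import sys
import time


def supervise(cmd: list, max_restarts: int = 3, delay: float = 0.1) -> int:
    """Run cmd, restarting it whenever it exits abnormally.

    Returns the number of restarts performed. Gives up after max_restarts
    so a server that crashes on every start does not loop forever.
    """
    restarts = 0
    while True:
        status = subprocess.run(cmd).returncode
        if status == 0:
            return restarts      # clean shutdown: stop supervising
        if restarts >= max_restarts:
            return restarts      # crash loop: escalate to a human instead
        restarts += 1
        print(f"server exited with status {status}; restart {restarts}",
              file=sys.stderr)
        time.sleep(delay)


# Hypothetical stand-in "server" that always crashes on a divide by zero.
crashing_server = [sys.executable, "-c", "print(10 // 0)"]
print("restarts:", supervise(crashing_server))
```

A restart only restores service if the bad value is not fed straight back in, which is why it complements server-side input validation rather than replacing it.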

This article is incredibly biased against Windows.

> when the software attempted to divide by zero, a buffer overrun occurred

While it's possible some poor exception handling led to a buffer overrun, it sounds dubious. Your explanation sounds more likely. Do you have any references?

The various random quotes regarding Windows NT's fitness for purpose are highly opinionated. The article doesn't mention that at the time Windows NT was certified at the NCSC's C2 rating level; while I'm just guessing, it seems entirely reasonable to select Windows NT because it was the only C2-certified OS with a GUI, which may have simplified development and systems integration given that some of the applications required user input.

The grandparent comment mentions it was client/server, which probably means a network. AFAIK, Windows NT was certified as C2 only without a network, see for instance http://csc.columbusstate.edu/summers/NOTES/CS459/NT-C2.htm ("Windows NT's C2 certification was conducted on a stand-alone computer. Hence the computer needs to be disconnected from the network by uninstalling all network hardware and software on the system.")

I don't have any specific reference that covers all of it. There were a lot of different stories reported, so it was more of a synthesis of all of them, with some filtering based on whether, judging from their titles and how they phrased things, the people quoted were technical staff or management. Many details also differ between the articles. For instance, someone said the ship was towed to port and took a couple of days to fix. Later reports said it was simply stopped for a couple of hours while they fixed it at sea. Then there was a dispute between the magazine that reported the towing claim and the person they quoted for it, with the latter saying he was misquoted and never claimed it was towed, and the magazine insisting they had quoted him accurately.

Here's a typical early article: http://gcn.com/Articles/1998/07/13/Software-glitches-leave-N...

It's possible that I've misdiagnosed the exception as uncaught. The reports are also consistent overall with it having been caught, in which case, rather than being terminated by the OS, the process ignored the divide by 0 and carried on with some invalid result, which led to the application failing.
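The alternative failure mode, an exception that is caught but swallowed, can be sketched in Python. The functions `average_speed` and `eta_hours` and their parameters are hypothetical illustrations, not the ship's code: the handler returns a placeholder instead of reporting the error, and the program then fails later, far from the real cause.

```python
from typing import Optional


def average_speed(distance: float, hours: float) -> Optional[float]:
    # Anti-pattern (hypothetical example): the exception is caught but
    # silently swallowed, so the caller gets an invalid placeholder
    # instead of an error it could act on.
    try:
        return distance / hours
    except ZeroDivisionError:
        return None


def eta_hours(remaining: float, distance: float, hours: float) -> float:
    speed = average_speed(distance, hours)
    # The failure surfaces here, one step removed from the real cause,
    # as a TypeError about None rather than a divide-by-zero.
    return remaining / speed


try:
    eta_hours(remaining=50.0, distance=100.0, hours=0.0)
except TypeError as exc:
    print("downstream failure:", exc)
```

Either way the root cause is the same unvalidated 0; swallowing the exception only moves the crash somewhere harder to diagnose.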

NCSC C2 is absolutely meaningless. You can be sure that NT was positively riddled with significant coding errors. I agree with 'tzs, of course.