by nickpsecurity 3921 days ago
You call up IBM. You ask for a mainframe solution for two sites. You get experts to set it up for you with your application and such. You don't worry about downtime again for at least 30 years.

You call up Bull, Fujitsu, or Unisys for the same thing.

You call up HP. You ask for a NonStop solution. You get the same thing for at least 20 years.

You call up VMS Software. You ask for an OpenVMS cluster. You get the same thing for at least 17 years.

Well-designed OSes, software, and hardware did cloud-style stuff without the downtime long before cloud existed. Cloud certainly brought price down and flexibility up. Yet these clouds still haven't matched 1970s-80s technology in uptime, despite all the brains and money thrown at them. That's a fact.

So, cloud shouldn't be used for anything mission-critical where downtime costs lots of money.

3 comments

This is absolute cobblers. I worked in an IT team that had a pair of IBM mainframes that were fed and watered at crushing expense and for which even the tiniest software change required a colossal waterfall project.

One day, they failed. One went offline - for reasons never revealed, at least to me - and the secondary didn't come up. Radio silence, kaput. And an airline that housed mainframes in the same DCs had its booking system fail at exactly the same time (with national headlines to match).

The myth of mainframe uptime is exactly that. La-la-land for hardware & services salesmen.

Appreciate the counterpoint. Could in fact be a myth or legend. Lots of money to justify spreading disinformation, too. Maybe an anonymous survey by a reputable organization is in order that tries to break down what issues people have and don't have along with specific metrics. Then compile that into a big picture.

Meanwhile, the companies I've worked at all had mainframes without the kind of trouble people describe. Problems were virtually always the app developers or the pain of doing 21st-century stuff with 1960s-80s architecture or legacy code.

I've seen a NonStop solution fail for the completely mundane reason of insufficient disk space after a burst of transactions. One condition for those 30-year uptimes is also a 100% predictable environment.

Never heard of that one. Funny stuff. Shows even the top tier can be improved.

Edit to add: mainframes also run user and server type workloads. Some of those are predictable, some aren't. Bull's machines virtualize whole desktops. The mainframe as a whole, especially the important services, is usually still available despite issues with those. For instance, my company splits stuff between critical on mainframe or AS/400's plus non-critical on whatever is useful ("best-of-breed" they say...). The critical stuff is either on the IBM stuff or leverages it in a client-server setup. Those apps either always work or (rarely) they fail-safe in an obvious way that does no damage. Nobody I work with can remember those systems going down in the 10 years they've worked there. The other stuff regularly has issues across the board. The key difference is effective architecture and how it's implemented.

Caveat here. Sure, a mainframe/high-end hardware & supervisor OS can run for 30 years... but the actual applications that users are facing... no, they cannot. You need to upgrade DB2 or IMS or whatever Java app you are using? There will be downtime.

Depends on the design. You have to plan for that stuff ahead of time. I'm not going to claim that's easy. It's just very helpful, and there are companies that specialize in helping with it. The most common method was decomposing the app while running it on a cluster so parts of the app or nodes can be taken down. There are strategies for mainframes, too, but my experience was with clusters.

Basic strategy was putting something in front of them that can redirect to the new system upon a trigger. Let's assume its functionality + tons of data. The new system first gets the data moved to it in batch form for efficiency reasons. Once it catches up enough, it starts syncing in a more online fashion until it gets to the point that it's syncing in real-time. All kinds of tests are performed on that system throughout this process. Eventually, a change-over happens that should be barely perceptible. The inability to do this is usually due to fragile architecture or tightly-coupled implementations, which are unfortunately all too common in enterprises.
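
To make that concrete, here's a rough Python sketch of the batch-copy, catch-up, then change-over flow. Everything in it (`Store`, `Router`, the method names) is made up for illustration; a real setup would be a load balancer or proxy in front of real databases, with testing between each phase.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for a system holding key/value data."""
    name: str
    data: dict = field(default_factory=dict)


class Router:
    """Hypothetical front end that sends requests to the active store."""

    def __init__(self, old: Store, new: Store):
        self.old, self.new = old, new
        self.active = old
        self.change_log = []    # writes the new system has not applied yet
        self.live_sync = False

    def write(self, key, value):
        self.active.data[key] = value
        if self.active is self.old:
            if self.live_sync:
                self.new.data[key] = value            # real-time sync
            else:
                self.change_log.append((key, value))  # replayed later

    def read(self, key):
        return self.active.data.get(key)

    def batch_copy(self):
        """Phase 1: bulk-copy a snapshot while the old system keeps serving."""
        self.new.data.update(dict(self.old.data))

    def catch_up(self):
        """Phase 2: replay logged writes in order, then sync in real time."""
        while self.change_log:
            key, value = self.change_log.pop(0)
            self.new.data[key] = value
        self.live_sync = True

    def cut_over(self):
        """Phase 3: redirect traffic once both sides hold the same data."""
        if not self.live_sync or self.new.data != self.old.data:
            raise RuntimeError("new system is not in sync yet")
        self.active = self.new


old, new = Store("old", {"a": 1}), Store("new")
router = Router(old, new)
router.batch_copy()
router.write("b", 2)    # arrives mid-migration, so it gets logged
router.catch_up()
router.write("c", 3)    # mirrored to the new system immediately
router.cut_over()
print(router.active.name, router.read("b"), router.read("c"))  # new 2 3
```

Clients only ever talk to `Router`, which is why the change-over can be barely perceptible: the only thing that changes is which store `active` points at.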

Note: It can also help if your app was written in something like Common Lisp or Erlang that supports live updates. That with the delta approach (version A->A/B->B) equals upgrades with no downtime. ;) Combining it with the clustering approach is quite powerful, but the clustering approach is more applicable to the tools the majority uses.
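
A toy sketch of the A->A/B->B idea, in Python rather than Erlang or Lisp, so it only imitates live updates by swapping a function reference. All names here are hypothetical, and it assumes the only thing changing between versions is the request format: A takes a bare number, B takes `{"amount": n}`, and the A/B step accepts both while clients migrate.

```python
class Server:
    """Hypothetical long-running server whose handler can be swapped live."""

    def __init__(self, handler):
        self.handler = handler
        self.state = {"count": 0}   # survives every handler swap

    def handle(self, request):
        return self.handler(self.state, request)

    def swap(self, handler):
        self.handler = handler      # no restart, state untouched


def handler_a(state, request):
    """Version A: only understands the old format (a bare number)."""
    state["count"] += request
    return state["count"]


def handler_ab(state, request):
    """Version A/B: accepts the old format and the new dict format."""
    amount = request["amount"] if isinstance(request, dict) else request
    state["count"] += amount
    return state["count"]


def handler_b(state, request):
    """Version B: only understands the new dict format."""
    state["count"] += request["amount"]
    return state["count"]


server = Server(handler_a)
results = [server.handle(1)]                 # old client on version A
server.swap(handler_ab)
results.append(server.handle(2))             # old client still works
results.append(server.handle({"amount": 3})) # new client works too
server.swap(handler_b)                       # once all clients are migrated
results.append(server.handle({"amount": 4}))
print(results)  # [1, 3, 6, 10]
```

The point of the middle step is that neither old nor new clients ever hit a version that rejects them, so there's no moment where the service has to be down.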