Hacker News
by linarism 3136 days ago
Serious question (not in IT), what else can they run?
10 comments

They could likely run any of the BSD variants. Some form of commercial RTOS might also be viable. Even Minix might be an option (hire away some of those Intel folks!) though I'm not sure of its hardware support.

That said, why bother? Linux is customizable enough to strip out almost everything extraneous for nodes and has the largest pool of experienced developers, and it's unlikely that any of the other options would have any performance advantage - and less likely that they could keep any performance advantage once it was identified.

Why on earth would you want an RTOS for HPC?

RT is all about privileging latency over throughput, whereas in HPC throughput is everything.

Perhaps not an RTOS per se, but latency is an issue for some HPC applications. Or not latency per se, but unpredictable latency among different nodes. The basic problem is that many large-scale simulations do frequent global barriers. So if one of the nodes is a bit slower because, say, the OS decided to schedule out the application and run ntpd for a fraction of a millisecond before switching back to the application, all the other nodes will wait. The more nodes, the higher the probability that something like this happens, and application scalability takes a hit.

The term you want to search for is "OS jitter", lots of papers on that topic.
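
A back-of-the-envelope sketch of why scale amplifies jitter: if each node independently has a small probability p of being interrupted during one compute phase, the chance that at least one of N nodes is interrupted (and so holds up the barrier) is 1 - (1 - p)^N. The probability, phase length, and jitter values below are illustrative assumptions, not measurements from any real machine.

```python
import random

def prob_barrier_delayed(p: float, nodes: int) -> float:
    """Chance that at least one of `nodes` independent nodes is interrupted
    during a compute phase, if each is interrupted with probability p."""
    return 1.0 - (1.0 - p) ** nodes

def simulate_phase_time(nodes: int, compute_ms: float, p: float,
                        jitter_ms: float, rng: random.Random) -> float:
    """Wall time of one bulk-synchronous phase. Every node computes for
    compute_ms; with probability p a node also loses jitter_ms to an OS
    interruption. The global barrier releases only when the slowest node
    arrives, so the phase takes the maximum over all nodes."""
    slowest = 0.0
    for _ in range(nodes):
        t = compute_ms + (jitter_ms if rng.random() < p else 0.0)
        slowest = max(slowest, t)
    return slowest

if __name__ == "__main__":
    rng = random.Random(42)
    p = 0.001        # assumed per-node, per-phase interruption probability
    trials = 100
    for n in (1, 100, 10_000):
        avg = sum(simulate_phase_time(n, 1.0, p, 0.5, rng)
                  for _ in range(trials)) / trials
        print(f"{n:>6} nodes: P(barrier delayed) = {prob_barrier_delayed(p, n):.3f}, "
              f"mean phase time ~ {avg:.3f} ms")
```

With these assumed numbers a single node almost never waits, while at 10,000 nodes nearly every phase pays the full jitter penalty, which is the scalability hit described above.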

Perhaps because a lot of tasks are not embarrassingly parallel, which means moving data around, which means network interconnects, which means latency matters. Every millisecond of delay in forwarding a packet is millions of instructions stalled.

I'm not saying this is actually a sensible tradeoff for any extant HPC architecture, but it could be. You certainly want the 'OS' that runs the pipeline on your CPU to be real time!
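
The size of such a stall can be estimated as wait time multiplied by clock rate. This minimal sketch counts clock cycles rather than instructions (a superscalar core may retire more than one instruction per cycle), and the 3 GHz clock is an assumption chosen for illustration.

```python
def stalled_cycles(wait_seconds: float, clock_hz: float) -> float:
    """Clock cycles that elapse while a core sits waiting for `wait_seconds`."""
    return wait_seconds * clock_hz

CLOCK_HZ = 3e9  # assumed 3 GHz core clock, for illustration only

# Compare a millisecond-scale wait with a microsecond-scale one.
for label, wait in (("1 ms", 1e-3), ("1 us", 1e-6)):
    print(f"{label} wait: ~{stalled_cycles(wait, CLOCK_HZ):,.0f} cycles")
```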

It is a misconception that RT means "lowest possible latency".

It actually means "latency with a known upper bound". The point of it is to be able to give engineering guarantees - the actuators on the rocket engine vanes will move within N milliseconds of the input voltage from the gyroscope module changing; the robot arm motor will be de-energised within N milliseconds of the laser perimeter sensor activating.

Bounding the worst case can often mean making trade-offs in the best case, and that doesn't make sense in HPC because the hard upper bound just isn't necessary in this environment.
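
A toy illustration of that trade-off, using made-up latency samples (hypothetical numbers, not measurements of any real scheduler): an RT-style system is judged by whether its worst case stays under a hard bound, while a throughput-oriented HPC job mostly cares about total time.

```python
from statistics import mean

# Hypothetical per-task latencies in microseconds for two scheduling styles.
throughput_tuned = [10, 10, 11, 10, 10, 95, 10, 11, 10, 10]  # fast on average, one long stall
rt_tuned = [20, 21, 20, 20, 21, 20, 21, 20, 20, 21]          # slower on average, tightly bounded

DEADLINE_US = 25  # assumed hard upper bound for the RT case

def meets_deadline(samples, deadline_us):
    """RT criterion: every observed latency stays within the bound."""
    return max(samples) <= deadline_us

def total_time(samples):
    """Throughput criterion: total time to finish all tasks (lower is better)."""
    return sum(samples)

for name, s in (("throughput-tuned", throughput_tuned), ("rt-tuned", rt_tuned)):
    print(f"{name}: mean={mean(s):.1f} us, worst={max(s)} us, "
          f"total={total_time(s)} us, within {DEADLINE_US} us bound: "
          f"{meets_deadline(s, DEADLINE_US)}")
```

Here the RT-tuned samples always meet the bound but take longer in total, which is exactly the cost that buys nothing in a job that only needs to finish as fast as possible.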

True, a lot of tasks are not embarrassingly parallel, as you point out, but most tasks that run on supercomputers are embarrassingly parallel. That's why those supercomputers have value in that problem domain in the first place.

It's unlikely that you would, but once you make it through Linux, the BSDs (including macOS), Unix systems (legacy?), Windows, and Minix, there's just not that much else left that's modern. At least with an RTOS or something designed for embedded systems you have some hope that it's designed to be lightweight.

IIRC, at one point Microsoft paid one of them to run Windows, and years ago there was one running OS X. Decades ago, during the Unix Wars, there were proprietary HPC OSes.

When Cray and SGI were still in the game there was a lot more operating system diversity.

Apple's Xserve cluster was an interesting experiment but it never really went anywhere.

> Apple's Xserve cluster was an interesting experiment but it never really went anywhere.

A pity Apple doesn't manufacture real servers any more. With the old cheesegrater Mac Pros, which were the last option after the Xserve died, you at least had the option to put in 4 disks and do a RAID setup, but there was no remote power/console management or serial port...

They've acknowledged the "trash can" design was a misstep and they're working to correct it, so there's that.

Competing in the server market was very difficult for Apple; server customers' needs are often radically different from those of consumers or even "pro" consumers. They realized they couldn't "win" at servers, so backing out was the best bet for them.

> They realized they couldn't "win" at servers, so backing out was the best bet for them.

The need for servers is there, otherwise "macOS Server" would not be in the App Store. It's actually really nice: it provides everything a business needs, including LDAP, file server, mail, calendar, web server, centralized Time Machine, print server, etc. But how am I supposed to recommend it to a customer when Apple offers no high-availability features (dual power supplies, remote management, auto power-on after AC loss, proper SNMP support, Wake-on-LAN), no way to attach more than 4 disks (or a real hardware RAID rather than a software one like CoreStorage, and they removed RAID management from diskutil anyway), and no hardware that I can mount in a rack without wasting immense amounts of (expensive) space?

For the small-business scale, Microsoft actually has Small Business Server (SBS) licenses, which are not tied to any specific vendor and use exactly the same tools as big companies, so a company can hire any competent consultant/admin for management.

Hell, they could simply ask Dell, HP, or whoever to sell a specific server as "macOS Server compatible", charge a bit of money for the OS, and that's it. People would rush to buy it, if only to run a CI/CD environment for mobile app builds without resorting to non-datacenter-grade hardware!

There's still a market, but Apple's not interested in being a bit player in it. HP, Dell, and Supermicro utterly own the server space. Taking those companies head-on with a single one-size-fits-all server is never going to work, and Apple wasn't prepared to build out a complete server line.

The "Server" software they have is for small businesses and is pretty good for 10-20 users, but beyond that you'd probably use Google Apps, Microsoft Exchange or something more serious. It's a nice thing to have, but it can't compete at an enterprise level.

Cray and SGI (now HPE) are both still very much in the game. Their systems all run Linux.

I meant in the game with their own respective UNIX-like operating systems.

There was a time when you'd find server rooms with a mix of AIX, HP-UX, IRIX, SunOS, VMS, A/UX, and NetWare, and that was considered normal. In an academic environment you'd have someone running Plan 9, BSD, or some other quirky thing as well, just because.

Then Windows and Linux came along and started to kill off this diversity one system at a time.

If you go here:

https://www.top500.org/statistics/overtime/

Then select "Operating System" and submit, and you can see which OSs the listed systems have run over the past 10 years.

Nice chart! The speed with which Linux supplanted proprietary Unix systems is remarkable.

It would be interesting to know which Linux distribution is used most often on current supercomputers. The supercomputer here at Goddard runs SLES (SUSE Linux Enterprise Server), and I believe other NASA supercomputers run it as well.

Depends on what you want to do with it.

Basically anything else that runs on their hardware is OK, though it may be very challenging to get a useful supercomputer out of a very old operating system.

I'm guessing one of the major factors in deciding on Linux is this:

"The licensing cost of a custom, self-supported Linux distribution is the same, whether you're using 20 nodes or 20-million nodes."
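
The quoted point is simple arithmetic: a per-node license grows linearly with node count, while a self-supported Linux build carries no per-node license fee. This sketch ignores support and staffing costs, and the per-node fee is a made-up placeholder, not any vendor's real price.

```python
def license_cost(nodes: int, per_node_fee: float) -> float:
    """Total licensing cost when every node needs its own license."""
    return nodes * per_node_fee

HYPOTHETICAL_FEE = 500.0   # placeholder per-node fee, not a real vendor price
SELF_SUPPORTED_FEE = 0.0   # no per-node license fee for a self-supported build

for n in (20, 20_000_000):
    print(f"{n:>10,} nodes: per-node licensing = {license_cost(n, HYPOTHETICAL_FEE):,.0f}, "
          f"self-supported Linux = {license_cost(n, SELF_SUPPORTED_FEE):,.0f}")
```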

NetBSD, because "Of course it runs NetBSD" :)

The Cray I used to program used an OS called UNICOS, which was a variant of Unix.

In recent history, a fair bit of AIX on IBM Power. Further back, lots of other Unix variants and other vendor-specific OSs.

FreeBSD? Netflix runs it on their Open Connect boxes, but that's about the extent of my knowledge.

Blue Gene runs Plan 9 and Inferno.

A few years ago I remember some of them were running BSD derivatives.