Hacker News
by jpgvm 3140 days ago
HPC may look like COTS gear but it's not.

BSD doesn't have drivers for InfiniBand and other HPC interconnects. Nor does it have client drivers (let alone a server implementation) for Lustre, which is the distributed filesystem used by most supercomputers.

I imagine MPI support on BSD is also likely non-existent. Then there is the matter of accelerator support, i.e. NVIDIA GPUs and Intel Xeon Phi.

It's not to say that some vendor couldn't reasonably build a BSD based supercomputer, it's just highly unlikely given how much stuff is missing.

FreeBSD does support InfiniBand: https://wiki.freebsd.org/InfiniBand

This page mentions that Mellanox has contributed work on this. Storage vendors have also used external InfiniBand stacks on FreeBSD for many years, and those stacks are stable and widely deployed. (I got my first Isilon cluster something like 8 years ago, and it used an IB backend with a forked FreeBSD 7 kernel, IIRC.)

No matter what your argument is, someone will always come up with the counter argument for a particular example, missing the general point.
I need to print this on a T-shirt
I didn't realise Isilon used BSD (I was aware it was IB-based), or that IB drivers work well on BSD. That is cool.

That said, the primary platform is Linux and HPC is a very demanding workload. Unless I had a lot of time to invest in BSD kernel development I would stick with Linux.

And that's before even considering Lustre, which is usually a non-negotiable requirement for HPC.

AFAIR InfiniBand support in FreeBSD has been there for a very long time.
Considering how much longer than 8 years we've had supercomputers, maybe Linux simply got there first and had inertia.
This is the correct answer (especially InfiniBand, and Aries on Crays).

Also NUMA is very important on supercomputers, and it works well on Linux.

The other thing worth noting is the much better support IBM has for Linux on PowerPC (2 in the top 10). I think Sunway (most powerful in the world) is a Linux shop too.

Is it a chicken-and-egg matter? Vendors don't write drivers for BSD, and BSD lacks users because it lacks drivers. Honestly, I hope I can someday run OpenBSD and install whatever driver I need for my plugged-in devices, on both my personal and production servers.
Vendors do write drivers for BSD; they just don't generally give them back to the project. Agree or disagree, they generally have a ton of time and money invested in their drivers and don't want to give them away to competitors.
Precisely why we have copyleft.
It's very much chicken and egg. Cray used Linux because all the customers were using Linux. There was never a technical meeting discussing their relative merits. The Tera MTA project was actually BSD-based, because it came from an age when BSD had clear technical superiority (and they were probably worried about complying with the GPL).

As others have mentioned, there was a Mellanox stack in FreeBSD circa 2005 that I worked with. It was used in production at Isilon (BSD-based).

There really isn't a technical discussion here at all. When an overwhelmingly large part of your userbase uses X, it would be pretty stupid to support only Y, and probably not defensible to support both X and Y.

FreeBSD still has an InfiniBand stack, and Isilon still uses it.
OpenBSD's performance is atrocious. Scrolling in the browser is laggy and unresponsive on my ThinkPad X220, and closing tabs sometimes results in multiple-second freezes.

That's maybe good for a router but simply not HPC material.

Nobody wants to use OpenBSD for HPC. Everybody would want to use DragonFly BSD for HPC, if so many drivers weren't missing. DragonFly MPI outperforms Linux; Linux just has more hardware support and a somewhat better TCP/IP stack.
Okay, if that is the case: illumos, and therefore SmartOS, has long had stable InfiniBand and MPI support, and, coming from Solaris, it is famous for its excellent scalability on very large numbers of processors, as well as a long tradition in HPC. Why isn't it used for HPC, then?
Sun mismanagement (you might add Oracle as well, but I think by that point whatever they could have done was too little, too late), and Linux was/is better in many respects? It wasn't called "Slowlaris" for nothing.

And it's not like Linux is somehow famous for poor scalability, unless you're talking about the 1990s. Yes, back in the 1990s it was certainly much worse than Solaris. But for the 2.6 and subsequent releases, SGI and others put a lot of work into improving it. SGI at some point sold 4096-way (might even have been 4096 cores and 8192 hardware threads?) single-image supercomputers running Linux, which AFAIK is bigger than anything Solaris has been deployed on.

That being said, most HPC systems consist of 1 or 2-socket nodes connected via a network, so the kernel scaling to such extreme systems isn't that relevant in the vast majority of deployments.

The "Slowlaris" days were 15 years ago, with Solaris 8. Meanwhile, Solaris and illumos (and therefore SmartOS) are the only operating systems I know of that provide CPU bursting. If you put Linux and SmartOS on the same Intel-based hardware, SmartOS is likely to beat it in performance. Whatever was true 15 years ago has not been the case since Solaris 10 in 2005.
Sparc machines were not as good at number crunching as Power so Sun wasn't as well-represented in the list as IBM, and Solaris wasn't as heavily used as AIX.
I'm specifically referring to running HPC on SmartOS, which runs only on Intel and AMD, with full support for Intel only. My question is why it isn't used for HPC now, given that it provides CPU bursting, not why it wasn't used in the past. Fair disclosure: I grew up on SGI and HPC, so I know what came before.