by SEJeff 3140 days ago
Because BSD's SMP support has traditionally been pretty terrible compared to Linux's. They still have a SLAB memory allocator (compared with Linux's default of SLUB which is much better for heavily SMP systems).

Many of the vendors for HPC (I'm looking at you Mellanox) primarily develop and certify their products on Linux. While they might work OK on BSD, you're not going to get the full performance and all of the features on a BSD system. If you paid for Mellanox EDR 100G Infiniband switches and all of the fancy VPI network cards, you want to run them at the full performance they are capable of. If the vendor tells you to use Linux for that, you use Linux.

TL;DR: Linux is what the hardware manufacturers overwhelmingly target and work with. HPC users use what vendors support best.

Your final line is 100% correct but all your supporting details are not.

HPC is generally a "softball" workload because the code is going to be more sympathetic to the hardware than many other computer usages. Processes will batch allocate a lot of RAM and peg runnable state for a long time.

SMP.. "it depends". Again, a parallel vector matrix multiply is just going to sit in the runnable state on all the cores, and the kernel is pretty irrelevant. There is a lot of junior job stuff left in FreeBSD to move locks around. The VFS is quite bad. In an HPC type workload these things probably won't matter that much unless you see a lot of "system %". They will show up in profiles and are generally also easy to fix. But it's not hard to construct a microbenchmark showing Linux > $else in those areas.
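
To make the "system %" check concrete, here is a minimal sketch that estimates the share of CPU time spent in the kernel between two samples. It assumes the Linux /proc/stat layout (aggregate "cpu" line with user, nice, system, idle, iowait, irq, softirq, steal as the first eight counters); FreeBSD would need a different data source. The function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the first eight counters from the aggregate 'cpu' line.

    Assumed order (Linux /proc/stat): user, nice, system, idle,
    iowait, irq, softirq, steal. Guest fields are skipped because
    they are already counted inside user/nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def system_share(before_text, after_text):
    """Fraction of elapsed CPU time spent in 'system' between two samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[2] / total  # index 2 is the 'system' counter


def sample_system_share(interval_seconds=1.0, path="/proc/stat"):
    """Take two snapshots of the counters and return the system share."""
    with open(path) as handle:
        before_text = handle.read()
    time.sleep(interval_seconds)
    with open(path) as handle:
        after_text = handle.read()
    return system_share(before_text, after_text)
```

If sample_system_share() stays near zero while the job is running, the kernel-side locking and VFS issues are unlikely to be what is holding the workload back.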

SLUB.. no. What kind of HPC workload is going to care much about this? The Linux allocators are pretty awful at contig kernel memory allocation (see ZFS on Linux). I don't see why UMA would architecturally flop here.

NUMA is a sore point on FreeBSD. It should be usable in 12.0. Isilon and Netflix are paying Jeff Roberson to work on it. Some folks on my team are also doing minor NUMA and locking work, but for commercial CDN workloads.

Mellanox does a pretty stellar job on FreeBSD Ethernet and Infiniband support. Unfair dig at them. I generally prefer Chelsio, but Mellanox has the lowest latency, which is relevant for HPC.

Awesome response, thanks for taking the time to write it.

SLUB was written by Christoph Lameter when he was at Silicon Graphics for their monster Altix machines. It took Linux hours to boot (with SLAB) on that machine. He wrote SLUB in a fit of brilliance to make Linux suck less on these machines, which can most certainly run HPC workloads. Just like some of the crazy Cray computers, SGI machines used to own HPC. Note that I work with Christoph in the same office and have discussed this with him in person. Regarding contiguous memory allocation, a lot of serious HPC workloads use huge pages set at boot to defeat this, so that part of Linux's fail is a non-issue (you're entirely right, btw). Really awesome to hear about NUMA bits in FreeBSD being improved, and I feel sufficiently hit with a cluebat on it.
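
For anyone who wants to check the huge pages bit on their own box, here is a rough sketch that reads the pool size from /proc/meminfo. It assumes the usual Linux field names (HugePages_Total, HugePages_Free, Hugepagesize in kB) and a pool reserved at boot, e.g. via the hugepages= kernel parameter; the function names are just for illustration.

```python
def parse_hugepage_info(meminfo_text):
    """Extract huge page counters from /proc/meminfo-style text.

    Returns a dict with total pages, free pages, and page size in kB.
    Fields missing from the input are reported as 0.
    """
    wanted = {
        "HugePages_Total": "total",
        "HugePages_Free": "free",
        "Hugepagesize": "size_kb",
    }
    info = {"total": 0, "free": 0, "size_kb": 0}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted and rest.split():
            # The value is the first token; a trailing 'kB' unit is ignored.
            info[wanted[key]] = int(rest.split()[0])
    return info


def reserved_bytes(info):
    """Total memory held in the huge page pool, in bytes."""
    return info["total"] * info["size_kb"] * 1024


def read_hugepage_info(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_hugepage_info(handle.read())
```

A total of 0 means no pool was reserved, in which case large contiguous allocations are at the mercy of whatever fragmentation the system has accumulated since boot.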

The bit from Mellanox was from their engineers (in their Haifa, Israel office before lunch) telling me they build their products for Linux first, and then port to everything else. They care deeply that it works on Linux, and it is nice if it works on other systems but not as important. It wasn't a dig at them, it was what the engineer said to me.