by kev009
3140 days ago
Your final line is 100% correct but all your supporting details are not.
HPC is generally a "softball" workload because the code is going to be more sympathetic to the hardware than many other computer usages. Processes will batch allocate a lot of RAM and peg runnable state for a long time.
SMP: "it depends". Again, a parallel vector matrix multiply is just going to sit in the runnable state on all the cores, and the kernel is pretty irrelevant. There is a lot of junior job stuff left in FreeBSD to move locks around. The VFS is quite bad. In an HPC type workload these things probably won't matter that much unless you see a lot of "system %". They will show up in profiles and are generally also easy to fix. But it's not hard to construct a microbenchmark showing Linux > $else in those areas.
SLUB: no. What kind of HPC workload is going to care much about this? The Linux allocators are pretty awful at contig kernel memory allocation (see ZFS on Linux). I don't see why UMA would architecturally flop here.
NUMA is a sore point on FreeBSD. It should be usable in 12.0. Isilon and Netflix are paying Jeff Roberson to work on it. Some folks on my team are also doing minor NUMA and locking work, but for commercial CDN workloads.
Mellanox does a pretty stellar job on FreeBSD Ethernet and InfiniBand support. Unfair dig at them. I generally prefer Chelsio, but Mellanox has the lowest latency, which is relevant for HPC.
|
SLUB was written by Christoph Lameter when he was at Silicon Graphics for their monster Altix machines. It took Linux hours to boot (with SLAB) on those machines. He wrote SLUB in a fit of brilliance to make Linux suck less on them, and HPC workloads can most certainly be run on them. Just like some of the crazy Cray computers, SGI machines used to own HPC. Note that I work with Christoph in the same office and have discussed this with him in person. Regarding contiguous memory allocation, a lot of serious HPC workloads use huge pages set at boot to defeat this, so that part of Linux's fail is a non-issue (you're entirely right, btw). Really awesome to hear about NUMA bits in FreeBSD being improved, and I sufficiently feel hit with a cluebat on it.
The bit about Mellanox came from their engineers (in their Haifa, Israel office, before lunch) telling me they build their products for Linux first, and then port to everything else. They care deeply that it works on Linux; it is nice if it works on other systems, but that is not as important. It wasn't a dig at them, it was what the engineer said to me.