| It's worth pointing out that this is what most people think of as the traditional "Linux Operating System" but sans the Linux kernel. Also, there are a few components on these supercomputers that you won't find on a typical workstation or cluster machine. First, the kernel itself is frequently very small and lightweight, much closer to what you would find on an embedded system than on a traditional desktop computer. That's because once the program on a supercomputer is loaded, the kernel's job is mostly to get out of the way. This isn't to say that you don't find the Linux kernel on these supercomputers; it just isn't as common as you would think from reading this piece. Fear not, many pieces of these operating systems are still open source. Here's IBM's fusedOS prototype: https://github.com/ibm-research/fusedos. Second, many of these computers mostly run code written in C, C++, Fortran, and Python. These tend to be the only major languages in play on HPC machines, with acceleration frameworks such as OpenMP, OpenCL, and CUDA playing major roles. Finally, everything is glued together with MPI, a high-level (at least it was in the 90s) abstraction for scientific programming that maps down to very high-performance networks designed to help scientific codes "scale", that is, run effectively when millions of cores are simultaneously engaged. These are beautiful machines producing important science, and the GNU/Linux operating system plays an incredibly important role in both their implementation and culture. |
Most are just slightly spruced-up commodity server hardware running Linux. I'm not sure if this is what you're suggesting, but they don't run C/Fortran/whatever on bare metal. Codes are run by the OS on the compute node just like a normal OS process, except that tasks are dispatched to compute nodes by a central cluster manager. Processes running in a gang communicate via MPI to share data, though coprocessors are pretty popular as well, so you see a lot of communication between the host processor and a coprocessor too. Titan and Tianhe-2 both actually have most of their compute power in the coprocessors (Nvidia Tesla and Xeon Phi, respectively), but they're still set up in a master-slave arrangement just like if you bought a Tesla or Phi and stuck it in a spare PCI-E slot. They use plain old PCI-E, too. The Cray XE/XK series (a popular line, of which Titan, an XK7, is an example) is basically just really nice blades with integrated cooling and a network backbone in a custom cabinet, possibly with coprocessors attached to each blade. You could just as easily run Windows XP and play Minesweeper on each blade if you really wanted to, except maybe for some driver issues. The most foreign thing is probably the network backbones, where fabric architectures like InfiniBand are popular.
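To make the "gang of processes sharing data" idea a bit more concrete, here's a toy sketch in plain Python. To be clear, this is not real MPI and not how any particular machine does it: everything runs in a single process, and the names (scatter_chunks, tree_reduce_sum, NUM_RANKS) are made up for illustration. It just shows the usual pattern, assuming a simple sum as the workload: split the data across ranks, let each rank work on its own piece, then combine the partial results pairwise so the number of communication rounds grows like log2 of the rank count.

```python
# Toy simulation of a "gang" of ranks, NOT real MPI: everything runs in one
# Python process. All names here are hypothetical and exist only for this sketch.

def scatter_chunks(data, num_ranks):
    """Split data into num_ranks nearly equal contiguous chunks, one per rank."""
    base, extra = divmod(len(data), num_ranks)
    chunks, start = [], 0
    for rank in range(num_ranks):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more
        chunks.append(data[start:start + size])
        start += size
    return chunks


def tree_reduce_sum(partials):
    """Combine per-rank partial sums pairwise, like a reduction tree.

    Returns (total, rounds), where rounds is the number of simulated
    communication steps: ceil(log2(number of ranks)).
    Assumes at least one rank.
    """
    values = list(partials)
    rounds = 0
    stride = 1
    while stride < len(values):
        # In each round, rank r "receives" from rank r + stride and adds it in.
        for r in range(0, len(values) - stride, 2 * stride):
            values[r] += values[r + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


NUM_RANKS = 8                                # assumed gang size
data = list(range(1, 101))                   # the numbers 1..100
chunks = scatter_chunks(data, NUM_RANKS)
partials = [sum(chunk) for chunk in chunks]  # each rank sums only its own chunk
total, rounds = tree_reduce_sum(partials)
print(total, rounds)                         # → 5050 3
```

On a real cluster each rank would be a separate OS process on some compute node, and the scatter and reduce steps would be collective calls in an MPI library going over the interconnect rather than list operations.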
They're also not limited to specific programming languages. In truth, you can run whatever you want if someone has paid the bill for your resource allocation. I watch people run MATLAB on large clusters all the time, which hurts me because it's so damned inefficient. That said, Fortran and C++ make up the overwhelming majority of large and computationally taxing codes. Just because all that power is there doesn't mean that all of the users take proper advantage of it. One of the larger calculations run on Titan that I know of (Denovo, a nuclear reactor simulation code) didn't even use the GPUs, only the CPUs. Making codes that can take advantage of GPU processing, and therefore of Titan and its predecessor Jaguar, has been a major project at the DOE, with libraries like Trilinos being developed to make it easier on scientists, many of whom are computer programmers only as a secondary concern.
The setup you're describing was the norm until maybe 10 years ago, and there are still systems in the TOP500 that work like that. There are probably some new ones being built, too. But what I've described seems to be what's mostly in fashion these days, and the machines I use are all like that. I've heard mumblings about FPGA coprocessors being the Next Big Thing, but we'll see.