| HN Mirror


	by sushshshsh 2186 days ago
	Thank you for this helpful info. For comparison's sake, say that you wanted to make babby's first supercomputer in your house with 2 laptops. That is to say, each laptop is a single-core x86 system with its own motherboard, RAM, and SSD, and they are connected to each other in some way (Ethernet? USB?). What software would one use to distribute some workload between these two nodes, what would the latency and bandwidth be bottlenecked by (the network connection?), and what other key statistics would be important in measuring exactly how this cheap $400 (used) setup compares on price/watt/flop performance with the TOP500 computers?

5 comments

dekhn 2186 days ago

You could use MPI and OpenMP. I got my start building a 10-megabit Ethernet cluster of 6 machines for $15K (this would have been back in ~2000). It only scaled about 4X using 6 machines, but that was still good enough/cheap enough to replace a bunch of more expensive SGI, Sun, and HP machines.
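
As a rough illustration of the "4X using 6 machines" figure, here is a minimal sketch that uses Amdahl's law to back out the serial fraction that number would imply. Amdahl's law is a simplified model (it ignores communication cost entirely), and the function names are made up for this example, so treat the result as a ballpark rather than a statement about that particular cluster.

```python
def amdahl_speedup(serial_fraction: float, n_workers: int) -> float:
    """Predicted speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)


def implied_serial_fraction(speedup: float, n_workers: int) -> float:
    """Invert Amdahl's law: solve speedup = 1 / (s + (1 - s) / n) for s."""
    inv_n = 1.0 / n_workers
    return (1.0 / speedup - inv_n) / (1.0 - inv_n)


def parallel_efficiency(speedup: float, n_workers: int) -> float:
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / n_workers


if __name__ == "__main__":
    s = implied_serial_fraction(4.0, 6)
    print(f"implied serial fraction: {s:.2f}")
    print(f"parallel efficiency:     {parallel_efficiency(4.0, 6):.2f}")
    # Same serial fraction applied to the two-laptop setup from the question.
    print(f"predicted 2-node speedup: {amdahl_speedup(s, 2):.2f}")
```

Under this model, 4X on 6 machines corresponds to a serial fraction of about 0.10 and an efficiency of about 0.67.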

Where the bottleneck is depends entirely on the details of the computations you want to run. In many cases, you get trivial, embarrassingly parallel scaling if you can break your problem into N parts and there doesn't need to be any real communication between the processors running the distinct parts. In that case, memory bandwidth and clock rate are the bottleneck. But if you're running something like ML training with tight coupling, then the throughput and latency of the network can definitely be a problem.
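
To make the latency/throughput point concrete, here is a minimal sketch of the common "latency plus size over bandwidth" cost model for sending one message. The 100-microsecond latency is a placeholder assumption, not a measurement; the bandwidth is simply 1 Gbit/s expressed in bytes per second. Real networks add protocol overhead and contention that this ignores.

```python
# Assumed link parameters for illustration only.
LATENCY_S = 100e-6              # placeholder per-message latency (assumption)
BANDWIDTH_BYTES_PER_S = 125e6   # 1 Gbit/s = 125 MB/s, ignoring protocol overhead


def transfer_time(message_bytes: float,
                  latency_s: float = LATENCY_S,
                  bandwidth_bytes_per_s: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Estimated time to send one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def communication_fraction(compute_s: float, message_bytes: float) -> float:
    """Share of one step spent communicating, if compute and transfer do not overlap."""
    comm_s = transfer_time(message_bytes)
    return comm_s / (compute_s + comm_s)


if __name__ == "__main__":
    for size in (8, 1_000, 1_000_000):
        print(f"{size:>9} bytes: {transfer_time(size) * 1e3:.4f} ms")
    # A step that computes for 10 ms and then exchanges 1 MB.
    print(f"communication share: {communication_fraction(0.010, 1_000_000):.2f}")
```

Small messages are dominated by latency and large ones by bandwidth, which is why tightly coupled jobs that exchange many small messages are hurt most by a slow interconnect.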

floren 2186 days ago

The thing to keep in mind about supercomputers is that they are designed for particular applications: nuclear weapons simulation, biological analysis (can we run simulations and get a vaccine?), cryptanalysis. These applications are usually written using MPI, which coordinates communication between nodes.

If you want to play with it at home, connect those laptops to an ethernet network and install MPI on them both--you should be able to find tutorials with a little web searching. Then you could probably run Linpack if you felt like it, but if you wanted to learn a little more about how HPC applications actually work, you could write your own MPI application. I wrote an MPI raytracer in college; it's a relatively quick project and, again, you can probably find a tutorial for it online.
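
For a feel of what a first MPI-style program looks like, here is a minimal sketch of the classic "estimate pi by numerical integration" exercise. It is plain Python, not MPI: the "ranks" run one after another in a loop, whereas in a real MPI program each rank would be a separate process (possibly on a different laptop) and the final sum would be a reduce operation such as `MPI_Reduce`. The function names are made up for this example.

```python
import math


def partial_sum(rank: int, size: int, n_steps: int) -> float:
    """Work done by one rank: midpoint-rule slices rank, rank + size, rank + 2*size, ..."""
    h = 1.0 / n_steps
    total = 0.0
    for i in range(rank, n_steps, size):
        x = h * (i + 0.5)              # midpoint of slice i
        total += 4.0 / (1.0 + x * x)   # integrand whose integral over [0, 1] is pi
    return total * h


def estimate_pi(size: int, n_steps: int) -> float:
    """Stand-in for the parallel run: compute every rank's share, then 'reduce' by summing."""
    partials = [partial_sum(rank, size, n_steps) for rank in range(size)]
    return sum(partials)


if __name__ == "__main__":
    estimate = estimate_pi(size=2, n_steps=100_000)
    print(f"estimate: {estimate:.10f}")
    print(f"error:    {abs(estimate - math.pi):.2e}")
```

The decomposition is the part that carries over: each rank needs only its own rank number and the total rank count to know which slices belong to it, so no communication happens until the final sum.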

Edit: Your cluster is going to suck terribly in comparison to "real" supercomputers, but scientists frequently do build their own small-scale clusters for application development. The actual big machines like Sequoia are all batch-processing and must be scheduled in advance, so it's a lot easier (and cheaper, supercomputer time costs money) to test your application locally in real-time.

gnufx 2186 days ago

Summit and Sierra, for instance, actually run a fair range of applications fast, though Sierra is probably targeted mainly at weapons-simulation-type tasks. A typical HPC system for research, e.g. at a university or regional centre, has to be pretty general purpose.

timthorn 2186 days ago

Step by step instructions for building an Arm-based MPI cluster using Raspberry Pis: https://epcced.github.io/wee_archlet/

noir_lord 2186 days ago

If you want to get experience working with higher node counts without breaking the bank, people sell case kits for Raspberry Pis so you can build your own cluster.

For actual computing, a modern higher-end processor/server will murder it, but it's closer to the real world of clusters than anything else (so much so that there is a company that builds 100+ node Pi clusters for supercomputing labs to test on; obviously you can't run scientific workloads on them, but it's cheaper than using the real machine).

https://www.zdnet.com/article/raspberry-pi-supercomputer-los...

gnufx 2186 days ago

If you want to understand distributed-memory parallel performance you're probably better off with a simulator, like SimGrid. I don't know what bog-standard hardware you'd need to get a typical correct balance between floating point performance, memory, filesystem i/o, and general interconnect performance otherwise. No toy system is going to teach you about running a real HPC system either -- you really don't want the fastest system if it's going to fall over every few hours or basically fall apart after a year.

corford 2186 days ago

For software, I know https://en.wikipedia.org/wiki/HTCondor is used quite frequently in academia for distributed workloads.
