by renonce 1114 days ago
What is the difference between a supercomputer and a million ordinary servers connected together?
3 comments

Typically:

  1. Low-latency network, 1-2 us. Most servers can't ping their local switch that quickly, let alone the most distant switch in a 1M-node system.
  2. High-bandwidth network, at least 200 Gbit/s.
  3. A parallel filesystem
  4. Very few node types.
  5. Network topology designed for low latency/high bandwidth, things like hypercube, dragonfly, or fat tree.
  6. Software stack that is aware of the topology and makes use of it for efficiency and collective operations.
  7. Tuned system images to minimize noise, maximize efficiency, and reduce context switches and interrupts. Reserving cores for handling interrupts is common at larger core counts.
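
To make points 5 and 6 a bit more concrete, here is a rough sketch of one classic collective, an allreduce done by recursive doubling. It is only a single-process simulation, not how any particular MPI library implements it, and the function name is made up for illustration. The point is that every rank ends up with the full sum after log2(N) exchange rounds, and on a hypercube each round's partner (rank XOR distance) is a direct neighbor, which is the kind of thing a topology-aware stack exploits.

```python
def recursive_doubling_allreduce(values):
    """Simulate a sum-allreduce across len(values) ranks.

    Hypothetical helper for illustration. Returns (per-rank results, rounds).
    Assumes the rank count is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("rank count must be a power of two")

    current = list(values)
    rounds = 0
    distance = 1
    while distance < n:
        # In each round, rank r exchanges its partial sum with rank r ^ distance
        # and both keep the combined value.
        current = [current[r] + current[r ^ distance] for r in range(n)]
        distance *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    results, rounds = recursive_doubling_allreduce(list(range(8)))
    print(results, rounds)
```

With 8 ranks that takes 3 rounds; with 2^20 (about 1M) ranks it would take 20, which is why per-hop latency matters so much at that scale.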
Simplicity of the programming model, basically, though in the end it all just comes down to bandwidth and latency.
Communication speed / latency is a big one. Sometimes it matters how quickly extremely large volumes of data can be sent between cores.
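
For a back-of-the-envelope feel for "it comes down to bandwidth and latency", here is a minimal sketch of the usual latency-plus-bandwidth cost model (time = latency + bits / bandwidth). The supercomputer numbers are the ones from the list above (1.5 us as the midpoint of 1-2 us, 200 Gbit/s). The "ordinary server" numbers (50 us, 10 Gbit/s) are purely an assumption for comparison, not measurements, and the function names are hypothetical.

```python
def transfer_time_s(message_bytes, latency_s, bandwidth_bits_per_s):
    """Estimated one-way time for a single message: latency + serialization."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


def half_bandwidth_message_bytes(latency_s, bandwidth_bits_per_s):
    """Message size at which the latency term equals the bandwidth term."""
    return latency_s * bandwidth_bits_per_s / 8


# Figures from the thread: 1-2 us latency, at least 200 Gbit/s.
HPC = {"latency_s": 1.5e-6, "bandwidth_bits_per_s": 200e9}
# Assumed, illustrative figures for a commodity datacenter network.
COMMODITY = {"latency_s": 50e-6, "bandwidth_bits_per_s": 10e9}

if __name__ == "__main__":
    for size in (8, 4096, 1 << 20):
        hpc = transfer_time_s(size, **HPC)
        commodity = transfer_time_s(size, **COMMODITY)
        print(f"{size:>8} B  hpc={hpc * 1e6:.2f} us  commodity={commodity * 1e6:.2f} us")
```

Under this model, small messages are dominated almost entirely by latency, so tightly coupled codes that exchange many tiny messages per step feel the latency gap far more than the bandwidth gap.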