by greatzebu 4431 days ago
I'd say that the two communities here are just people working at opposite ends of a continuum of applications that run on large clusters. The HPC community sits at one end, with tightly coupled applications, low data-to-computation ratios, and diverse communication patterns. The big data community is characterized by huge data-to-computation ratios, highly constrained and regular communication patterns, and loose coupling.

The fundamental problems are similar (fault tolerance, load balancing, scheduling) but the best approaches depend on where you are on that continuum.

I see that you talk about "large clusters". In the HPC community a distinction is often made between clusters and supercomputers, where the latter implies a fast interconnect between the nodes, so that data can be synchronized quickly between steps. Such a fast interconnect is often required for workloads like weather forecasting or the simulation of biomolecules. On clusters without one, such problems cannot be parallelized beyond a dozen or so processors. Real supercomputers can cost an order of magnitude or more per CPU, but they are required for these workloads. For such workloads, how data is moved around matters a great deal, and that is probably what they meant by data locality.
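A rough way to see why the interconnect caps scaling is a toy cost model for one timestep of a tightly coupled solver: the compute divides evenly across p processors, but every step also pays a synchronization cost of one network latency per tree level (ceil(log2 p) rounds) plus one fixed-size halo message. This is a simplified sketch, not a model of any real code; the names (`Network`, `step_time`, `best_processor_count`) and all numbers (1 ms of compute per step, an 8 KiB halo, ~50 µs latency for commodity Ethernet versus ~1 µs for an InfiniBand-class network) are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Network:
    latency_s: float      # assumed per-message latency, in seconds
    bandwidth_Bps: float  # assumed link bandwidth, in bytes per second


def step_time(p: int, compute_s: float, halo_bytes: int, net: Network) -> float:
    """Toy time per timestep on p processors (hypothetical model).

    Compute splits perfectly; synchronization costs one latency per
    tree level (ceil(log2 p) rounds) plus sending one halo message.
    """
    if p == 1:
        return compute_s  # serial run: no communication at all
    sync_rounds = math.ceil(math.log2(p))
    comm = sync_rounds * net.latency_s + halo_bytes / net.bandwidth_Bps
    return compute_s / p + comm


def best_processor_count(max_p: int, compute_s: float, halo_bytes: int, net: Network) -> int:
    """Processor count in 1..max_p that maximizes speedup over the serial run."""
    serial = step_time(1, compute_s, halo_bytes, net)
    return max(
        range(1, max_p + 1),
        key=lambda p: serial / step_time(p, compute_s, halo_bytes, net),
    )


if __name__ == "__main__":
    commodity = Network(latency_s=50e-6, bandwidth_Bps=1.25e8)  # ~1 Gbit/s Ethernet (assumed latency)
    fast = Network(latency_s=1e-6, bandwidth_Bps=5e9)           # InfiniBand-class (assumed numbers)
    for name, net in [("commodity", commodity), ("fast", fast)]:
        print(name, best_processor_count(4096, 1e-3, 8192, net))
```

Under these assumptions the latency term grows with every doubling of p while the compute term shrinks, so on the slow network the speedup peaks at a small processor count, and a low-latency interconnect pushes that peak out by more than an order of magnitude.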

On the other hand, many other HPC tasks can be spread across cluster nodes, and for those tasks clusters are sufficient. In fact, you will often be denied access to supercomputers for such workloads and told to use a cluster instead.

Many clusters have tight interconnect, yet I don't call them supercomputers.

Supercomputers are more defined by their capabilities and max capacities: they tend to be orders of magnitude larger in their max memory, and their ability to do X, Y, or Z. It's really just a term at this point, not something truly differentiating.

I agree. We have a 128-node cluster where I work (with InfiniBand interconnect, etc.), but I wouldn't call it a supercomputer. Some of my colleagues, however, have access to machines at various national labs (e.g., ORNL) that I would call supercomputers. I suppose it's all relative, though. For someone who's only ever developed on a dual-core 2 GHz machine, a cluster of 128 multi-core nodes might seem like the equivalent of the WOPR.

I agree that a 128-node computer would not be called a supercomputer regardless of its interconnect. My point was more the other way around: a computer without a fast interconnect would still be called a cluster, not a supercomputer, regardless of the number of nodes.