Hacker News
by bogomipz 3173 days ago
>"back in the early 2000's when "clusters" and "NUMA SMP machines" were competing with each other"

Did you mean NUMA vs. SMP? Or something else, maybe? How can a machine be a NUMA SMP? NUMA and SMP are fundamentally different architectures.


No, I was thinking of non-uniform memory access (NUMA), which is to say an SMP machine where the "speed" at which you can access RAM depends on the core or 'thread' from which you access it. Lack of memory uniformity was the compromise to achieve larger effective address spaces and "simple" programming.

Today, on a 24/48-core dual-socket server, you'll see the same sort of thing: a core using memory attached to the 'other' physical chip's memory bus will hurt overall performance significantly.
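A back-of-the-envelope way to see why remote-socket memory hurts: treat the average memory latency as a weighted mix of local and remote accesses. This is a toy model, and the latency figures below (80 ns local, 140 ns remote) are hypothetical placeholders, not measurements of any particular server.

```python
def avg_latency_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Expected latency per memory access under a simple two-level NUMA model.

    local_fraction is the share of accesses served by the core's own socket (0..1);
    the rest go over the inter-socket link to the other chip's memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 80.0
REMOTE_NS = 140.0

for frac in (1.0, 0.75, 0.5):
    t = avg_latency_ns(frac, LOCAL_NS, REMOTE_NS)
    print(f"{frac:.0%} local -> {t:.0f} ns average ({t / LOCAL_NS:.2f}x all-local)")
```

The point is only the shape: every access that lands on the 'other' socket drags the average toward the remote figure, which is why NUMA-aware placement of threads and memory matters.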

In the 2000's, before multi-core chips were a thing, there were two major camps, the 'super computer' camp, and the 'cluster' camp.

The 'super computer' camp insisted on cache-coherent memory across all of the cores or threads. You got these very expensive fabrics from people like Cray that would snoop access to memory from the cores and send coherency messages around to ensure that if someone wrote something into memory somewhere, everyone else's L1 or L2 cache got the message to invalidate what they were holding (shootdowns). These machines were very expensive and took months to build.
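Snooping write-invalidate protocols in the MESI family are one well-known way to get the invalidation behaviour described above. Below is a deliberately simplified sketch, assuming a single memory word, write-back caches, and one atomic shared bus that every cache observes; the class and method names are hypothetical. Larger machines often tracked sharers with directories rather than broadcasting on a bus, and real protocols handle far more states and races.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # only valid copy, dirty
    EXCLUSIVE = "E"  # only cached copy, clean
    SHARED = "S"     # possibly several clean copies
    INVALID = "I"    # no usable copy


class Bus:
    """An atomic snooping bus: every cache sees every transaction, one at a time."""

    def __init__(self, memory_value: int = 0) -> None:
        self.memory = memory_value
        self.caches: list["Cache"] = []

    def read(self, requester: "Cache") -> tuple[int, bool]:
        """BusRd: return (value, shared). A MODIFIED copy is written back first."""
        shared = False
        for c in self.caches:
            if c is requester or c.state is State.INVALID:
                continue
            if c.state is State.MODIFIED:
                self.memory = c.value      # write back dirty data
            c.state = State.SHARED         # other holders downgrade to SHARED
            shared = True
        return self.memory, shared

    def read_exclusive(self, requester: "Cache") -> None:
        """BusRdX / upgrade: invalidate every other copy (the "shoot down")."""
        for c in self.caches:
            if c is requester or c.state is State.INVALID:
                continue
            if c.state is State.MODIFIED:
                self.memory = c.value
            c.state = State.INVALID


class Cache:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.state = State.INVALID
        self.value = 0
        bus.caches.append(self)

    def load(self) -> int:
        if self.state is State.INVALID:    # miss: fetch over the bus
            self.value, shared = self.bus.read(self)
            self.state = State.SHARED if shared else State.EXCLUSIVE
        return self.value

    def store(self, value: int) -> None:
        if self.state in (State.SHARED, State.INVALID):
            self.bus.read_exclusive(self)  # invalidate all other copies
        # EXCLUSIVE or MODIFIED: no bus traffic needed to write
        self.value = value
        self.state = State.MODIFIED


bus = Bus(memory_value=1)
a, b = Cache(bus), Cache(bus)
a.load()                      # a: EXCLUSIVE
b.load()                      # a and b: SHARED
a.store(42)                   # b's copy is shot down
print(a.state, b.state)       # State.MODIFIED State.INVALID
print(b.load(), bus.memory)   # 42 42 (a's dirty line is written back on b's miss)
```

The EXCLUSIVE state is what saves bus traffic: a core that holds the only copy can start writing without broadcasting anything.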

The 'cluster' camp said, "We can use a network fabric and just parameterize shared memory usage." So they put together independent machines connected by a network fabric, with no cache coherency protocol. If you wanted shared memory, you could build something like memcached and wrap your accesses in network calls. With that architecture, even if it took twice as many cores to do what you wanted, the machine cost one tenth as much as the big SMP machine.
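To make "wrap your access with network calls" concrete, here is a toy sketch using only the Python standard library: shared state lives in one server process, and every read or write from a "node" is an explicit network round trip. The text protocol and the names `KVServer` and `RemoteDict` are hypothetical; this is not memcached's actual protocol or client API.

```python
import socket
import socketserver
import threading
from typing import Optional

# Hypothetical line protocol (NOT memcached's real one):
#   "SET <key> <value>\n" -> "OK\n"
#   "GET <key>\n"         -> "<value>\n" or "MISSING\n"
# Keys may not contain whitespace; values may.


class KVServer(socketserver.ThreadingTCPServer):
    daemon_threads = True        # handler threads won't keep the process alive
    allow_reuse_address = True

    def __init__(self, addr: tuple[str, int]) -> None:
        super().__init__(addr, KVHandler)
        self.store: dict[str, str] = {}
        self.lock = threading.Lock()   # one handler thread per client connection


class KVHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        for raw in self.rfile:                       # one request per line
            parts = raw.decode("utf-8").strip().split(maxsplit=2)
            with self.server.lock:
                if len(parts) == 3 and parts[0] == "SET":
                    self.server.store[parts[1]] = parts[2]
                    reply = "OK"
                elif len(parts) == 2 and parts[0] == "GET":
                    reply = self.server.store.get(parts[1], "MISSING")
                else:
                    reply = "ERROR"
            self.wfile.write((reply + "\n").encode("utf-8"))


class RemoteDict:
    """Client-side view: each 'shared memory' read or write is a network round trip."""

    def __init__(self, host: str, port: int) -> None:
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r", encoding="utf-8", newline="\n")

    def _call(self, line: str) -> str:
        self.sock.sendall((line + "\n").encode("utf-8"))
        return self.reader.readline().rstrip("\n")

    def set(self, key: str, value: str) -> None:
        if self._call(f"SET {key} {value}") != "OK":
            raise RuntimeError(f"SET {key!r} failed")

    def get(self, key: str) -> Optional[str]:
        reply = self._call(f"GET {key}")
        return None if reply == "MISSING" else reply

    def close(self) -> None:
        self.reader.close()
        self.sock.close()


server = KVServer(("127.0.0.1", 0))            # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address[:2]

node_a = RemoteDict(host, port)   # two "nodes" sharing state through the server
node_b = RemoteDict(host, port)
node_a.set("answer", "42")
print(node_b.get("answer"))       # 42
print(node_b.get("unset"))        # None
node_a.close()
node_b.close()
server.shutdown()
server.server_close()
```

Nothing here is cache-coherent in the hardware sense: consistency comes from every access going through the one server, and that round trip on each access is the price paid for dropping hardware coherency.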

For something trivially parallelizable, like internet search or serving websites, that was a much more cost-effective way to go. When people started doing the work they had previously used 'super computers' and big SMP machines for on these Linux clusters, it became a sort of race to pull those problems apart into "shared nothing" clusters.

>"Lack of memory uniformity was the compromise to achieve larger effective address spaces and "simple" programming."

Oh interesting. Might you have any links or suggested reading on this discussion and eventual compromise?

>"In the 2000's, before multi-core chips were a thing, there were two major camps, the 'super computer' camp, and the 'cluster' camp."

Is the cluster camp basically Beowulf, then?

>"You got these very expensive fabrics from people like Cray that would snoop access to memory from the cores and send coherency messages around to ensure that if someone wrote something into memory somewhere, everyone else's L1 or L2 cache got the message to invalidate what they were holding (shootdowns)."

Is this the MESI protocol then?

Thanks.

EDIT: I just saw the link below about CCNUMA.

A big category of cluster-friendly HPC codes works by running in lockstep on all the nodes. Apparently it works quite well for things like weather forecasting, where the problem is naturally divided into a grid but significant communication is still needed between grid tiles. So it's quite different from memcached-type things.
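A minimal sketch of the lockstep ("bulk synchronous") pattern, using a 1-D diffusion stencil as a stand-in for a real weather model: the grid is cut into tiles, and each step has a communication phase, where every tile receives one boundary ("halo") cell from each neighbour, followed by a compute phase. Here the "nodes" are just list slices in one process; on a real cluster the halo exchange would be message passing (for example MPI). The function names are hypothetical.

```python
def step_serial(u: list[float], alpha: float) -> list[float]:
    """One explicit diffusion step on the whole grid; the two end cells stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def split(u: list[float], n_tiles: int) -> list[list[float]]:
    """Cut the grid into equal tiles (assumes len(u) is divisible by n_tiles)."""
    assert len(u) % n_tiles == 0
    size = len(u) // n_tiles
    return [u[k * size:(k + 1) * size] for k in range(n_tiles)]


def step_tiled(tiles: list[list[float]], alpha: float) -> list[list[float]]:
    n = len(tiles)
    # Phase 1 (communication): each tile receives one halo cell from each neighbour.
    halos = []
    for k in range(n):
        left = tiles[k - 1][-1] if k > 0 else None
        right = tiles[k + 1][0] if k < n - 1 else None
        halos.append((left, right))
    # Implicit barrier: no tile computes until every halo has been exchanged.
    # Phase 2 (computation): each tile updates only its own cells.
    new_tiles = []
    for tile, (left, right) in zip(tiles, halos):
        padded = [left] + tile + [right]
        new = tile[:]
        for j in range(len(tile)):
            lo, mid, hi = padded[j], padded[j + 1], padded[j + 2]
            if lo is None or hi is None:
                continue  # global boundary cell: held fixed, as in step_serial
            new[j] = mid + alpha * (lo - 2 * mid + hi)
        new_tiles.append(new)
    return new_tiles


grid = [0.0] * 12
grid[6] = 1.0                      # a single hot spot
tiles = split(grid, n_tiles=3)
serial = grid
for _ in range(5):
    tiles = step_tiled(tiles, alpha=0.25)
    serial = step_serial(serial, alpha=0.25)
assert sum(tiles, []) == serial    # lockstep tiles reproduce the single-grid result
```

Only halo cells cross tile boundaries each step, so the communication is significant but bounded: it scales with a tile's surface rather than its volume, while every node still has to wait for its neighbours before the next step.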

https://www.cs.fsu.edu/~engelen/courses/HPC/Synchronous.pdf

A quick Google search after reading this comment of yours gave me this → http://www.google.com/patents/US5887146
Ah yes, the CCNUMA patent. This is the relevant part:

However, SMP systems suffer disadvantages in that system bandwidth and scalability are limited. Although multiprocessor systems may be capable of executing many millions of instructions per second, the shared memory resources and the system bus connecting the multiprocessors to the memory presents a bottleneck as complex processing loads are spread among more processors, each needing access to the global memory. As the complexity of software running on SMP's increases, resulting in a need for more processors in a system to perform complex tasks or portions thereof, the demand for memory access increases accordingly. Thus more processors does not necessarily translate into faster processing, i.e. typical SMP systems are not scalable. That is, processing performance actually decreases at some point as more processors are added to the system to process more complex tasks. The decrease in performance is due to the bottleneck created by the increased number of processors needing access to the memory and the transport mechanism, e.g. bus, to and from memory.

Alternative architectures are known which seek to relieve the bandwidth bottleneck. Computer architectures based on Cache Coherent Non-Uniform Memory Access (CCNUMA) are known in the art as an extension of SMP that supplants SMP's "shared memory architecture." CCNUMA architectures are typically characterized as having distributed global memory.
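The bandwidth argument in the first quoted paragraph can be put as a back-of-the-envelope saturation model. This is a toy model, not something from the patent: it assumes n identical processors that each do work at rate r when unconstrained and each need b bytes/s of memory traffic to do so, all sharing a bus of bandwidth B.

```latex
% Toy model (assumptions): n identical processors, unconstrained rate r each,
% memory traffic b bytes/s each, shared bus bandwidth B.
X(n) = r \cdot \min\!\left(n, \frac{B}{b}\right),
\qquad
n^{*} = \frac{B}{b}
```

Past n* extra processors add no throughput; each one just runs at a fraction B/(nb) of its unconstrained rate. The patent's stronger claim, that performance actually falls, would need an additional term for bus arbitration or contention overhead, which this model leaves out.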

Is this clear-cut in the terminology? A plausible definition would also be that it's still symmetric MP if remote memory access has non-uniform performance, since the nodes and their memories are symmetrical. After all, you get that just with caches and 2 sockets plugged into the same DRAM.

The historical opposite of SMP used to be asymmetric multiprocessing in the heterogeneous sense: different kinds of processors, for example scalar, vector, and I/O processors. Or, for a modern-day take, ARM SoCs with little low-power cores alongside faster, more power-hungry cores, and GPUs thrown in for good measure.

>"Is this clear-cut in the terminology?"

Given that OP was referring to the "early 2000s", which presumably means the introduction of Opteron and HyperTransport, then yes, I believe the NUMA vs. SMP distinction would be pretty clear.