by jamesaross 3544 days ago
Consider that many instruction and data caches are at the 16-32 KB scale. It's obviously a big criticism of the microarchitecture, but you have a linear tradeoff between the number of cores and the memory available per core. One core with 64 MB of memory seems less useful than 1024 cores with 64 KB of memory each (each of which can directly access all other cores' memory). But 65,536 cores with 1 KB of memory each doesn't sound very useful either.
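To make the arithmetic concrete, here is a minimal Python sketch of that linear tradeoff. It assumes a fixed 64 MB total split evenly across cores; the function name `memory_per_core` is made up for illustration.

```python
# Assumption: a fixed 64 MB memory budget, divided evenly among cores.
TOTAL_BYTES = 64 * 1024 * 1024


def memory_per_core(cores, total_bytes=TOTAL_BYTES):
    """Return the bytes each core gets when total_bytes is split evenly."""
    if cores <= 0 or total_bytes % cores != 0:
        raise ValueError("cores must be positive and divide the total evenly")
    return total_bytes // cores


if __name__ == "__main__":
    # The three design points mentioned in the comment.
    for cores in (1, 1024, 65536):
        print(f"{cores:>6} cores: {memory_per_core(cores) // 1024} KB each")
```

All three configurations hold the same 64 MB in total; only the split changes.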
3 comments

Thanks for articulating that. As you know, there is no right answer, as it depends on the workload. Now if we could only build a specific chip for every application domain...
In fact, you have two trade-offs. One is what you said - that for a fixed amount of memory, the more cores, the less memory you have per core. The second trade-off is the transistor budget - the more space you use for cores, the less space you have left for memory.
The third trade-off is cycle time: the larger the memory, the longer it takes to access. This is why L1 caches are typically 16-64 KiB, and even at that size an access typically takes 2-3 cycles. However, 3+ cycles is difficult to hide in an in-order processor like this.
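A rough way to see why multi-cycle access hurts an in-order core is a back-of-the-envelope CPI estimate. This is only a sketch under strong assumptions: every load is consumed by the very next instruction, nothing overlaps with the stall, and the 25% load fraction is an illustrative number, not a measurement. The name `in_order_cpi` is hypothetical.

```python
def in_order_cpi(load_fraction, load_latency_cycles, base_cpi=1.0):
    """Estimate cycles per instruction for a simple in-order pipeline.

    Assumes each load stalls the pipeline for (latency - 1) cycles
    because its result is needed by the next instruction.
    """
    stall_cycles = max(load_latency_cycles - 1, 0)
    return base_cpi + load_fraction * stall_cycles


if __name__ == "__main__":
    # Assumed workload: one instruction in four is a load.
    for latency in (1, 2, 3):
        print(f"{latency}-cycle access: CPI ~ {in_order_cpi(0.25, latency)}")
```

Under these assumptions, going from 1-cycle to 3-cycle access raises the estimated CPI from 1.0 to 1.5, which an out-of-order core could partly hide but an in-order one cannot.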
> But 65,536 cores with 1 KB of memory each doesn't sound very useful either.

You've just described the general architecture of the Connection Machine[0], a late-'80s, early-'90s supercomputer that was used for modeling weather, stocks, and other items. It was fairly useful in its time.

[0]https://en.wikipedia.org/wiki/Connection_Machine