
For power usage, a model like the one used by Parallax's Propeller microcontroller might be interesting: Propeller has 8 cores. The entire thing can be clocked up and down like on a modern x86 chip. But it's also possible to put cores to sleep individually, which reduces power consumption even further.

I'm not sure how good an example Raspberry Pi is, simply because it includes features like a GPU. A computer probably only needs one of most of those components, so it would make more sense to look at the power requirements for a bare ARM chip than for a complete computer built around it.

-- change of subject --

What I've been running into (including hard and repeatedly over the last few days) is that parallelizing workstation-end tasks without killing performance in the process is hard. Shared-memory parallelism in particular is painful, because with too many cooks in the kitchen they'll end up spending more time trying not to pour boiling water on each other than making food. For example, every time you hit a memory barrier, all the cores that are working against that memory need to stop and consult the L3 cache or, worse yet, main memory. That introduces an enormous stall (of course, if you're in a situation where non-trivial parallelizing is worth the effort, any stall feels enormous), so it needs to be avoided as much as possible... which tends not to be an easy thing to do if you're doing shared-memory parallelism. If it were trivial, you'd probably have been able to get away with shared-nothing in the first place.
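To make the cooks-in-the-kitchen point a bit more concrete, here's a rough sketch of the two shapes the same job can take. Everything in it (the function names, the `split` helper, counting "sync points") is made up for illustration. It's also Python, so the GIL hides the actual cache-level stalls; treat it as a picture of how often the workers have to coordinate, not as a benchmark.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Hypothetical helper: cut data into roughly equal chunks."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_counter_sum(chunks):
    """Shared-memory style: every worker updates one total behind a lock."""
    total = 0
    sync_points = 0  # how many times a worker had to coordinate with the others
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total, sync_points
        for value in chunk:
            with lock:  # one synchronization per element
                total += value
                sync_points += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, sync_points


def shared_nothing_sum(chunks):
    """Shared-nothing style: each worker sums privately, results merge once."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(sum, chunks))  # no shared state while working
    return sum(partials), len(partials)  # one merge step per worker
```

With 1000 numbers split four ways, both versions return the same total, but the first one coordinates once per element and the second once per worker. The catch is that this only works because a sum splits cleanly; the painful cases are the ones where the work doesn't.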

Now the "lots of tiny cores" approach gets more interesting when you can get away with a more shared-nothing approach like what the article suggests. But it comes at a big cost, which is that you're going to take a massive hit on the kind of performance you can get on tasks for which parallelism is infeasible, or for which you don't have any programmers who are good enough at parallelization to do it (effectively the same thing). In those situations, you're going to be stuck watching one lone core play "Little Engine that Could" while all the other cores are dozing off like the lazy bums they are.
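For a back-of-the-envelope feel for that hit, Amdahl's law is the usual way to put numbers on it. The sketch below is my own framing, not something from the article, and the figures (16 cores at a quarter of the speed of one beefy core) are invented purely for illustration.

```python
def amdahl_speedup(parallel_fraction, cores, core_speed=1.0):
    """Speedup relative to a single baseline core of speed 1.0.

    parallel_fraction: share of the work that can be spread across cores.
    core_speed: speed of each core relative to the baseline (assumed value).
    """
    serial_time = (1.0 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (core_speed * cores)
    return 1.0 / (serial_time + parallel_time)


# Hypothetical chip: 16 tiny cores, each a quarter as fast as one big core.
for fraction in (0.0, 0.5, 0.8, 1.0):
    print(fraction, round(amdahl_speedup(fraction, cores=16, core_speed=0.25), 3))
```

Under those assumed numbers, a task that can't be parallelized at all runs at a quarter of the speed, a task that is half parallel still runs at less than half speed, and you need about 80% of the work to be parallel just to break even with the single big core.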

Meanwhile it solves a problem that I'm not convinced really exists. Time-multiplexing relatively beefy CPUs is pretty much a solved problem. Less so if you need real-time, but for everyday use there's really not much need to segregate processes to different cores when pre-emptive multitasking has been around on consumer systems for decades.