> But the shiny object that has my attention at present consists of low-voltage ARM-type chips running on tiny inexpensive systems that can be stacked together to do all kinds of interesting things for a fraction of the power my Intel Xeon uses.
This sentiment is repeated very often, but has anyone actually done the math (for the case where you temporarily need a lot of processing power)? E.g., the following post estimates the power use of a Raspberry Pi at around 2 W:
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=2&t=60...
A recent Xeon or Core i7 is many times faster, and has the advantage of providing shared-memory parallelism (as opposed to a cluster of Pis, where you have to distribute work over a 100 Mbit/s network).
Also, if he wants to save power, he shouldn't use a Xeon. Intel Core mobile CPUs draw a relatively small amount of power as well. E.g., the last time I measured my Mac Mini, it used 12 W during normal use. And it's actually a usable desktop machine, in contrast to the Raspberry Pi.
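For what it's worth, the back-of-the-envelope math can be sketched in a few lines of Python. The 2 W and 12 W figures are the ones quoted above; the 600-second job and the 10x speedup are made-up placeholders, not measurements, and the 12 W was measured under normal use rather than full load, so treat the output as an illustration only.

```python
def energy_joules(power_watts, seconds):
    """Energy used by a machine drawing constant power for a given time."""
    return power_watts * seconds


def break_even_speedup(fast_watts, slow_watts):
    """How many times faster the hungrier machine must be to match the
    energy per job of the frugal one."""
    return fast_watts / slow_watts


PI_WATTS = 2.0          # estimate from the forum post linked above
MAC_MINI_WATTS = 12.0   # figure measured by the commenter (normal use)

# ASSUMPTIONS for illustration only: job length and relative speed.
PI_JOB_SECONDS = 600.0
ASSUMED_SPEEDUP = 10.0

ratio = break_even_speedup(MAC_MINI_WATTS, PI_WATTS)
pi_energy = energy_joules(PI_WATTS, PI_JOB_SECONDS)
mini_energy = energy_joules(MAC_MINI_WATTS, PI_JOB_SECONDS / ASSUMED_SPEEDUP)

print(f"Break-even speedup: {ratio:.0f}x")
print(f"Pi: {pi_energy:.0f} J, Mac Mini: {mini_energy:.0f} J")
```

So on those numbers the Mac Mini wins on energy per job as soon as it is more than 6x faster than a single Pi, before counting any network overhead in a cluster.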
For power usage, a model like the one used by Parallax's Propeller microcontroller might be interesting: the Propeller has 8 cores. The entire thing can be clocked up and down like a modern x86 chip. But it's also possible to put cores to sleep individually, which reduces power consumption even further.
I'm not sure how good an example the Raspberry Pi is, simply because it includes features like a GPU. A computer probably only needs one of most of those components, so it would make more sense to look at the power requirements of a bare ARM chip than of a complete computer built around it.
-- change of subject --
What I've been running into (hard and repeatedly over the last few days) is that parallelizing workstation-end tasks without killing performance in the process is hard. Shared-memory parallelism in particular is painful, because with too many cooks in the kitchen they end up spending more time trying not to pour boiling water on each other than making food. For example, every time you hit a memory barrier, all the cores that are working against that memory need to stop and consult the L3 cache or, worse yet, main memory. That introduces an enormous stall (of course, if you're in a situation where non-trivial parallelizing is worth the effort, any stall feels enormous), so it needs to be avoided as much as possible... which tends not to be an easy thing to do if you're doing shared-memory parallelism. If it were trivial, you'd probably have been able to get away with shared-nothing.
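To make the shared-memory vs. shared-nothing distinction concrete, here is a rough Python sketch. The function names are made up for illustration. Because of CPython's GIL this won't reproduce the cache stalls described above; it only shows the structural difference between synchronizing on every update and merging private results once at the end.

```python
import threading


def count_shared(n_threads, n_iter):
    """Every increment takes a lock on one shared counter (lots of coordination)."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_iter):
            with lock:  # synchronization point on every single update
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_partitioned(n_threads, n_iter):
    """Each worker accumulates privately; results are merged once at the end."""
    partials = [0] * n_threads

    def worker(i):
        local = 0
        for _ in range(n_iter):
            local += 1      # no coordination inside the hot loop
        partials[i] = local  # one write per worker, to its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


print(count_shared(4, 1000), count_partitioned(4, 1000))
```

Both give the same answer; the second one just keeps the cooks out of each other's way until the very end, which is only possible when the work splits that cleanly.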
Now the "lots of tiny cores" approach gets more interesting when you can get away with a more shared-nothing approach like what the article suggests. But it comes at a big cost, which is that you're going to take a massive hit on the kind of performance you can get on tasks for which parallelism is infeasible, or for which you don't have any programmers who are good enough at parallelization to do it (effectively the same thing). In those situations, you're going to be stuck watching one lone core play "Little Engine that Could" while all the other cores are dozing off like the lazy bums they are.
Meanwhile it solves a problem that I'm not convinced really exists. Time-multiplexing relatively beefy CPUs is pretty much a solved problem. Less so if you need real-time, but for everyday use there's really not much need to segregate processes to different cores when pre-emptive multitasking has been around on consumer systems for decades.
* One is a total revamp of Operating Systems, so that everything is virtualized.
The first is hardly a prophecy, because it's so near to being reality, and the article was written on the day of the Intel announcement.
The second is maybe, what, 20 years into the future? I'm not even sure there's a need. Security problems are not technical anymore. They are caused by a breach of trust. Integration between all those different VMs will still be needed. Badware will use this interface too. People like integration. Separating everything into its own VM will hinder that, and customers will not like it.
> Except for games and some build cycles, I'm almost never waiting because the CPU has maxed out.
That's just... weird. What would "maxing out" even mean for a CPU? Going "fast enough", somehow? Maybe because build cycles are exactly what I spend a lot of my time in front of a computer on, I really don't think CPUs are ever going to be "fast enough". Even when just doing "ordinary computing", I often find that e.g. browsers and office applications are rather slow.
What I would like to be able to do, instead of passing -O1, -O2, -O3 flags to a compiler, is for the "optimization level" to be measured in seconds. So -O1 would run the optimizer for 1 second and give me the best optimized code it could come up with in that time. -O3600 would let the machine think about it for an hour, using any and all heuristics and empirical tests, and then give me what it had at the end. Pre-release, I might want to run the optimizer for a week.
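The general shape of this is an "anytime" search: keep the best result found so far and stop when the clock runs out. A toy Python sketch follows, with the caveat that this is not how any real compiler works; `optimize_for`, the toy cost function and the neighbor move are all invented for illustration.

```python
import random
import time


def optimize_for(cost, start, neighbor, budget_seconds, seed=0):
    """Hill-climb until the time budget is spent; return the best candidate seen."""
    rng = random.Random(seed)
    best, best_cost = start, cost(start)
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        candidate = neighbor(best, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:  # only ever keep improvements
            best, best_cost = candidate, candidate_cost
    return best, best_cost


# Toy stand-ins for "a program" and "how slow it is" (hypothetical).
def toy_cost(x):
    return (x - 42) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


best, best_cost = optimize_for(toy_cost, 0, toy_neighbor, budget_seconds=0.05)
print(best, best_cost)
```

The point of the design is that the result is valid whenever you stop, so the budget becomes a knob you can turn from one second to one week without changing anything else.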
> Pre-release, I might want to run the optimizer for a week.
That's a dangerous model, as certain classes of bugs would only come out in the super-optimized version, but that version would presumably not get the same amount of testing as the regular builds.
It's amazing how much of that stuff isn't really the CPU's fault, though. For example, buying a solid-state drive and moving all my projects over to it has done things that are almost magical to my build times.
(Granted, my C++ days are behind me so particularly CPU-intensive builds just don't really happen to me anymore.)
There will always be some applications (games, compilers, Photoshop, 3D rendering) that will benefit from fine-grained parallelism. For the rest, being able to run your web browser, DVD ripper, music streamer and IDE on four separate cores is good enough.