by corsix
1416 days ago
The math is fiddly to get right, but (as the author) I'd suggest that the disadvantage is very tight coupling to the CPU implementation: the interleaving is based on the relative speeds of the two methodologies, so if those relative speeds change drastically on a future CPU implementation, this _particular_ interleaving could end up _slower_ than either methodology on its own.
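To make the coupling concrete, here is a rough sketch of the arithmetic under a deliberately simplified model: the two methods run fully in parallel, a fixed fraction of the data goes to each, and the total time is whichever side finishes last. The function names and all throughput numbers are made up for illustration, and the model ignores resource contention and the cost of merging results, both of which would only make a stale split look worse.

```python
def best_split(speed_a, speed_b):
    """Fraction of the data to give method A so both methods finish together."""
    return speed_a / (speed_a + speed_b)


def interleaved_time(split_a, speed_a, speed_b):
    """Time per unit of data when split_a goes to A and the rest to B.

    Idealized: both methods run concurrently, so the slower side dominates.
    """
    return max(split_a / speed_a, (1.0 - split_a) / speed_b)


# Hypothetical throughputs (data units per cycle) on the CPU the code was tuned for.
tuned_a, tuned_b = 8.0, 24.0
split = best_split(tuned_a, tuned_b)
time_tuned = interleaved_time(split, tuned_a, tuned_b)

# Hypothetical future CPU: method A gets much slower, method B is unchanged,
# but the split baked into the code stays the same.
future_a, future_b = 1.0, 24.0
time_stale = interleaved_time(split, future_a, future_b)
time_b_alone = 1.0 / future_b

print("tuned split for A:", split)
print("time on tuned CPU:", time_tuned, "vs best single method:", 1.0 / tuned_b)
print("time on future CPU:", time_stale, "vs method B alone:", time_b_alone)
```

With these invented numbers the tuned split beats the best single method on the original CPU, but on the hypothetical future CPU the same split is several times slower than simply running method B alone, because the stale share given to A is now the bottleneck.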
|
My personal thought is that we should design a CPU where these kinds of pipelines / executions are more explicit, and then write magic compilers that can pull parallelism out of our programs and into the more explicitly parallel form that this new CPU would prefer. You'd still be tied to an architecture, but moving to a new architecture (e.g. 2x SIMD pipelines in the future) would be as easy as recompiling, in theory.
Then I realized that I've reinvented VLIW / Intel Itanium. And that's a silly, silly place and we probably shouldn't go there again :-p
--------
The MIMD (multiple-instruction, multiple-data) abilities of modern CPUs are quite amazing in any case, and it's always fun to take advantage of them. Even with a single instruction stream like in this example, it is obvious that modern CPUs have gross parallelism at the instruction level.
It's a bit of a shame that these high-performance toys we write are kind of unsustainable... requiring in-depth assembly knowledge and microarchitecture-specific concepts to optimize (which often become obsolete as these designs inevitably change every 5 years or so). Then again, it's probably a good idea to practice writing code at this level to remind us that the modern CPU is in fact a machine with defined performance characteristics that we can take advantage of...