| There are a few issues with Itanium-like architectures. The first thing to point out is that the dynamic filling of the execution units in superscalar hardware will always do no worse than whatever a pure-compiler solution can do, and will very frequently do better. Hardware can take advantage of dynamic opportunities, such as the ability to fill execution slots from code both before and after a branch (or even across function boundaries!), or being more responsive to instructions with data-dependent execution times. Yes, this does take not-insignificant amounts of hardware. But given the limitations of what compilers can statically do, it's not clear that you can put the savings to better use. The second issue is that such an arrangement usually ends up encoding microarchitectural details into the ISA. And when you do that, and you want to change the microarchitecture, you either have to change the ISA and deal with the attendant issues, or add back the hardware that you were theoretically saving in the first place. On top of this, you're stuck with practical performance being driven by the availability and adoption of sufficiently smart compilers, which is largely out of your control. It's worth noting that you can ameliorate these issues to a large degree if you restrict your inputs to a more structured subset of possible programs, i.e., you try to build an accelerator instead of a general-purpose CPU. And that's why you see more interesting architectures come out in the accelerator space. But for most general-purpose programs, you're not really going to do better than modern superscalar architectures, even with all the space and power they consume. |
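
To make the point about data-dependent execution times concrete, here is a toy sketch (not a model of any real CPU). Everything in it is an assumption made up for illustration: the six-instruction program, the latencies (2 cycles for a cache hit, 10 for a miss), the single-issue width, and the unlimited out-of-order window. The instruction order stands in for a schedule a compiler picked assuming the load hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    deps: tuple   # indices of earlier instructions whose results are needed
    latency: int  # cycles from issue until the result is available


def make_program(load_latency):
    # Hypothetical program, laid out in the order a compiler might choose
    # if it assumed the load always hits (latency 2).
    return [
        Instr("load_a", (), load_latency),
        Instr("add_b", (), 1),
        Instr("use_a", (0,), 1),
        Instr("add_c", (1,), 1),
        Instr("add_d", (3,), 1),
        Instr("combine", (2, 4), 1),
    ]


def ready(instr, done, cycle):
    # An instruction can issue once all of its inputs have been produced.
    return all(d in done and done[d] <= cycle for d in instr.deps)


def run_in_order(program, width=1):
    """Static schedule: a stalled instruction blocks everything behind it."""
    done, cycle, i = {}, 0, 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width and ready(program[i], done, cycle):
            done[i] = cycle + program[i].latency
            i += 1
            issued += 1
        cycle += 1
    return max(done.values())


def run_out_of_order(program, width=1):
    """Dynamic schedule: each cycle, issue the oldest instructions that are ready."""
    done, cycle = {}, 0
    pending = list(range(len(program)))
    while pending:
        issued = 0
        for i in list(pending):
            if issued == width:
                break
            if ready(program[i], done, cycle):
                done[i] = cycle + program[i].latency
                pending.remove(i)
                issued += 1
        cycle += 1
    return max(done.values())


if __name__ == "__main__":
    for label, latency in (("hit", 2), ("miss", 10)):
        prog = make_program(latency)
        print(label, run_in_order(prog), run_out_of_order(prog))
```

When the load behaves the way the compiler assumed, both schedules finish at the same time. When it misses, the in-order schedule sits behind `use_a` while the dynamic one keeps working through the independent `add_*` chain, so it finishes earlier. The gap here is small because the program is tiny; the point is only that the static order cannot react to a latency it did not know at compile time.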