by trsohmers
3621 days ago
The vast majority of the dynamic parts of a program that matter for scheduling (both for ILP/avoiding hazards within a core and for handling memory management in our scratchpad-based memory system) come from indeterminate latencies for memory accesses and for executing instructions (due to variable-length pipelines). Throw in horrible (for determinism) things like out-of-order execution and branch prediction, and no wonder a compiler can't determine things statically! While we are not really targeting general purpose (though I would say we have the capability to evolve to it in the future), it seems painfully obvious to me where these issues have come from in past general-leaning VLIW attempts, and I can't understand the clinging to bad architectural decisions made 30 years ago by hardware folks who could not imagine what software would be able to do in the future. </rant> Targeting general purpose from the get-go is a bad idea, but it is NOT impossible to do efficiently and without sacrificing performance. You just need a well-defined and constrained architecture, and a clean way to describe it.
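To make the determinism argument concrete, here is a minimal sketch of compile-time scheduling. It is not the poster's toolchain or architecture: the opcodes, latencies, issue width, and function names are all invented for illustration. It shows that a schedule built from fixed latencies is hazard-free as long as the hardware honours those latencies, and breaks as soon as one load takes longer than the compiler assumed.

```python
# Hypothetical model: every opcode, latency and name below is an assumption
# made up for this sketch, not a description of any real machine.
LATENCY = {"load": 2, "mul": 3, "add": 1}  # assumed fixed cycles per opcode

# (result name, opcode, names of the results it depends on), in dependency order
PROGRAM = [
    ("a", "load", []),
    ("b", "load", []),
    ("c", "mul", ["a", "b"]),
    ("d", "add", ["c", "a"]),
]


def static_schedule(program, latency, issue_width=2):
    """Pick an issue cycle per instruction using only compile-time latencies."""
    ready_at = {}    # result name -> cycle at which the value is available
    issued = {}      # result name -> cycle at which the instruction issues
    slots_used = {}  # cycle -> number of instructions already placed there
    for name, op, deps in program:
        # Earliest cycle at which every operand is known to be ready.
        cycle = max((ready_at[d] for d in deps), default=0)
        # Respect the (assumed) issue width of the machine.
        while slots_used.get(cycle, 0) >= issue_width:
            cycle += 1
        slots_used[cycle] = slots_used.get(cycle, 0) + 1
        issued[name] = cycle
        ready_at[name] = cycle + latency[op]
    return issued


def hazards(program, issued, actual_latency):
    """List instructions that issue before one of their operands is ready.

    actual_latency maps result name -> cycles the instruction really took.
    Only direct violations are reported; knock-on effects are not tracked.
    """
    done = {name: issued[name] + actual_latency[name] for name, _, _ in program}
    return [
        name
        for name, _, deps in program
        if any(done[d] > issued[name] for d in deps)
    ]


schedule = static_schedule(PROGRAM, LATENCY)

# Case 1: the hardware takes exactly the latencies the compiler assumed.
fixed = {name: LATENCY[op] for name, op, _ in PROGRAM}
print(schedule, hazards(PROGRAM, schedule, fixed))

# Case 2: load "b" behaves like a cache miss and takes 20 cycles instead of 2.
miss = dict(fixed, b=20)
print(hazards(PROGRAM, schedule, miss))
```

In the second case the multiply issues long before its operand arrives, which is the situation that forces either hardware interlocks or dynamic scheduling. With a scratchpad and fixed-length pipelines, the first case is the only one the compiler has to handle.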
Even in the restricted world of HPC, GPGPUs have been moving from statically scheduled, exposed-pipeline VLIW machines to more conventional SIMD with caches, virtual memory and branch prediction (no meaningful OoO yet, as the large amount of thread parallelism can hide the memory latency).
Also, GPGPUs have the benefit of the large, lucrative GPU gaming market to pay for their development. How can a pure HPC machine be competitive in this market? Even for Intel, Xeon Phi is more of a prestige project than something actually meant to make money.