Hacker News
by dw-im-here 1228 days ago
predication and queues can go a long way

Of the three VLIW architectures I have looked into, two (Elbrus and Itanium) rely heavily on predication. The i860 does not have instruction predicates.

Predication places the burden of creating optimal instruction bundles AND of getting the hinting right, via the use of predicates, on the compiler. If the stars aligned, the code could perform blazingly fast. It turned out that aligning the stars in an optimal space-time sequence was an arduous task, because the information the hints actually depend on is only available at run time.

Which is where JIT compilation has delivered well (and more cheaply!) without requiring a radically different VLIW design.
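
To make the predication point concrete, here is a minimal sketch of if-conversion, the transformation a compiler applies to turn a branch into predicated code. It is only a software model of the semantics in Python, not how Elbrus or Itanium actually encode predicates, and the function names are made up for illustration.

```python
def abs_diff_branchy(a: int, b: int) -> int:
    # Control flow: only one arm executes, chosen by a branch.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    # If-conversion: both arms are computed unconditionally and the
    # predicate p decides which result is kept. On a predicated VLIW
    # machine the two subtractions could share one bundle.
    p = int(a > b)        # predicate: 1 if a > b, else 0
    taken = a - b         # would issue under predicate p
    not_taken = b - a     # would issue under predicate !p
    return p * taken + (1 - p) * not_taken
```

The predicated version always pays for both arms. Whether that beats a branch depends on how predictable the condition is, which is exactly the kind of thing the compiler cannot see ahead of time.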

Fundamentally, though, it seems that more information is available at run time? You may get partway there in the compiler, but assuming you have a sufficient transistor budget, it seems better to do the reordering in the CPU.

The runtime doesn't know all that much, though. All it has is a single instruction stream, from which it can extract fine-grained parallelism and which it can try to speed up further via speculation. It knows nothing whatsoever about other work that could be scheduled in while the processor is stalled on memory, other than via SMT. It knows nothing about priorities or coarse-grained dependencies among work units. So there's a lot of parallelism left on the table, and a lot of speculated work that might just be wasted.

> The runtime doesn't know all that much […]

If we are talking about JIT, yes, it does, for it instruments the running code, gathers information about hot code paths, and performs in-place optimisation. Think of profile-guided compile-time optimisation carried over into the runtime.
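
A toy sketch of that idea, under loud assumptions: the counter plays the role of the instrumentation, `HOT_THRESHOLD` is an arbitrary made-up value, and "optimisation" here just means generating an unrolled function for an exponent that turns out to be hot. Real JITs work on bytecode or machine code and are far more involved; `make_power` and its helpers are hypothetical names.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: calls before an exponent counts as hot


def make_power():
    seen = Counter()   # profile: how often each exponent was observed
    specialised = {}   # exponent -> generated fast path

    def generic(x, n):
        # Slow path: a loop that works for any non-negative exponent.
        result = 1.0
        for _ in range(n):
            result *= x
        return result

    def specialise(n):
        # "Compile" an unrolled version for one known exponent.
        body = " * ".join(["x"] * n) if n > 0 else "1.0"
        return eval("lambda x: " + body)

    def power(x, n):
        fast = specialised.get(n)
        if fast is not None:
            return fast(x)              # guarded fast path
        seen[n] += 1                    # instrumentation
        if seen[n] >= HOT_THRESHOLD:
            specialised[n] = specialise(n)
        return generic(x, n)

    power.specialised = specialised     # exposed only for inspection
    return power
```

For example, after three calls such as `p(2.0, 3)` the exponent 3 gets its own unrolled function, while an exponent seen only once keeps going through the generic loop.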