by igodard 3167 days ago
The tool chain does hoisting and if-conversion with wild abandon. That code becomes {x = cond ? a+b : a*b}, and both expressions are evaluated in parallel. The conversion is a heuristic; if you have tracing data for the branch then the tool chain might not convert. However, a mispredict is a lot more expensive than a multiply, so the trace has to be pretty skewed for the branch to be worth keeping.
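
To make the transformation concrete, here is a rough C sketch of the before and after. The function names are made up for illustration, and should_convert is only a toy cost comparison with assumed cycle counts as inputs; it is not the tool chain's actual heuristic.

```c
/* Branchy form: only one arm is computed, but the hardware has to
   predict which way cond goes. */
int select_branchy(int cond, int a, int b) {
    int x;
    if (cond) {
        x = a + b;
    } else {
        x = a * b;
    }
    return x;
}

/* If-converted form: both arms are hoisted above the test and computed
   unconditionally (in parallel on a wide machine), then a select picks
   one. There is no branch left to mispredict. */
int select_converted(int cond, int a, int b) {
    int sum  = a + b;   /* hoisted "then" arm */
    int prod = a * b;   /* hoisted "else" arm */
    return cond ? sum : prod;
}

/* Toy cost model (an assumption, not the real heuristic): convert when
   the expected mispredict cost exceeds the extra latency of always
   waiting for the slower arm. All arguments are hypothetical numbers:
   miss_rate is in [0,1], the other two are in cycles. */
int should_convert(double miss_rate, double miss_penalty, double extra_latency) {
    return miss_rate * miss_penalty > extra_latency;
}
```

Both functions return the same value for any inputs that don't overflow; only the control flow differs. With a made-up 5-cycle mispredict penalty and 2 cycles of extra latency, the toy model keeps the branch only when the miss rate is below 40%, which is the "pretty skewed" case.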

The conversion does increase the latency of getting the value of x. If there's nothing else to do then the tool chain will insert explicit nops to wait for the result. The same stalls will exist on other architectures for the same code, just not visibly in the code. It happens that making the nops explicit is faster than a stall; you can idle through a nop with no added overhead, but you can't restart from a stall instantaneously.