by mpyne 4818 days ago
Depending on which chip is being targeted, there are sometimes "canonical forms" for these operations: as long as the instruction is encoded in the right form, the hardware can automatically break data dependencies and improve hardware-level parallelism.

I don't know that this is necessarily the reason here, but it's one possible explanation.
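
To make "canonical form" concrete, here is one well-known case, picked as an assumption since the comment above doesn't say which operation is meant: zeroing a register on x86. `xor eax, eax` is a widely documented zeroing idiom that many x86 cores recognize as not depending on the register's previous value, and it is also a shorter encoding than `mov eax, 0`. A minimal C sketch with hypothetical function names; the assembly in the comments is what optimizing x86-64 compilers typically emit, not a guarantee:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * At -O2, x86-64 compilers typically emit:
 *     xor eax, eax    ; 2-byte encoding, recognized zeroing idiom
 *     ret
 * rather than:
 *     mov eax, 0      ; 5-byte encoding, same architectural result
 */
uint32_t zero_value(void)
{
    return 0;
}

/* Hypothetical helper, for illustration only.
 * x ^ x is always 0, so the compiler is free to fold this down to the
 * same zeroing instruction. On cores that recognize the idiom, the
 * result does not have to wait for whatever last wrote the register,
 * which is the "breaking a data dependency" part.
 */
uint32_t xor_self(uint32_t x)
{
    return x ^ x;
}
```

Both functions return the same value; the point is only that the specific encoding the compiler picks can matter to the hardware, and which encodings get special treatment varies by chip.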