Hacker News
by brucehoult 84 days ago
Of course it is. Emulating parallel operations on 4 or 8 or 16 or 32 elements one at a time using scalar instructions is expected to be slow.
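
To make that concrete, here is a rough sketch of what emulating one vector add looks like when only scalar instructions are available. The names `vec_t`, `LANES`, and `vec_add_emulated` are made up for illustration, and the 8-lane, 32-bit layout is an assumption; this is not taken from any particular emulator.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed vector width, in 32-bit elements */

/* Hypothetical stand-in for one vector register. */
typedef struct {
    uint32_t lane[LANES];
} vec_t;

/* Emulates a single vector add by doing one scalar add per lane. */
static vec_t vec_add_emulated(vec_t a, vec_t b)
{
    vec_t r;
    for (size_t i = 0; i < LANES; i++) {
        r.lane[i] = a.lane[i] + b.lane[i]; /* wraps modulo 2^32 */
    }
    return r;
}
```

Hardware with vector support does all the lanes in one instruction. The loop above needs at least one scalar add per lane, plus the loads, stores, and loop overhead around it, so the cost grows with the lane count.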