by stephencanon 4707 days ago
As my sibling comment said, the crazy CISCy instructions aren’t comparable, because in general they were no faster than an equivalent sequence of simpler instructions. That’s not the case for permute: there are no “simpler” instructions that let you build an efficient permute. It’s one of the fundamental building blocks of efficient vector code, which is why it’s shocking that it was added to SSE so late.
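To make “permute” concrete, here is a scalar reference model of a 16-byte variable byte permute in the style of x86 PSHUFB (the SSSE3 intrinsic `_mm_shuffle_epi8`). This is a sketch of the semantics only, not SIMD code, and the function name `pshufb` is just a hypothetical label for the model. The point it illustrates is that a single instruction performs all 16 lane selections with the pattern supplied at runtime. That makes it usable both for rearranging data and as a parallel 16-entry table lookup, and composing that from fixed shuffles, shifts, and masks costs many instructions.

```python
def pshufb(src: bytes, control: bytes) -> bytes:
    """Scalar reference model of a 16-byte variable byte permute (PSHUFB-style).

    For each output lane i: if the high bit of control[i] is set, the lane is
    zeroed; otherwise it takes src[control[i] & 0x0F].
    """
    if len(src) != 16 or len(control) != 16:
        raise ValueError("src and control must both be 16 bytes")
    return bytes(0 if c & 0x80 else src[c & 0x0F] for c in control)


data = bytes(range(16))

# Rearrangement: reverse all 16 bytes with one permute.
print(list(pshufb(data, bytes(range(15, -1, -1)))))  # [15, 14, 13, ..., 1, 0]

# Table lookup: popcount of 16 nibbles at once, using a 16-entry table as "src".
nibble_popcount = bytes(bin(i).count("1") for i in range(16))
nibbles = bytes([0x0, 0x1, 0x3, 0x7, 0xF, 0x8, 0x5, 0xA] * 2)
print(list(pshufb(nibble_popcount, nibbles)))  # [0, 1, 2, 3, 4, 1, 2, 2, 0, 1, 2, 3, 4, 1, 2, 2]
```

The table-lookup use is the less obvious one. Because the control vector comes from data rather than from an immediate, the same instruction doubles as a vectorized lookup, which is a large part of why it is such a general building block.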