by anonymoushn 961 days ago
That's absolutely wild. If you really only needed vpshufb, the throughput is the same in terms of values processed: there are twice as many values per register, and you get to retire half as many instructions. But it takes a bunch more instructions to combine the two inputs and apply a 256-entry LUT :(
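To make the instruction-count point concrete, here's a rough scalar model (Python, so it runs anywhere) of why a 256-entry table costs so much more than a 16-entry one. vpshufb only indexes 16 bytes per 128-bit lane, so a 256-entry LUT has to be stitched together from 16 row lookups plus a compare and a blend each. It assumes the two inputs are 4-bit values `a` and `b` looked up as `lut[a * 16 + b]`. The names `pshufb_lane` and `lut256` are hypothetical, and compare-and-blend is just one common way to do this, not necessarily the scheme under discussion.

```python
def pshufb_lane(table, idx):
    """Model one 128-bit lane of (v)pshufb: a 16-entry byte lookup.
    A set high bit in an index byte zeroes that output byte."""
    return [0 if i & 0x80 else table[i & 0x0F] for i in idx]


def lut256(lut, a, b):
    """Hypothetical helper: out[i] = lut[a[i] * 16 + b[i]] for 4-bit a[i], b[i],
    built only from 16-entry lookups. Each of the 16 rows costs one shuffle
    plus a compare and a masked OR, versus a single shuffle for a 16-entry LUT."""
    out = [0] * len(b)
    for h in range(16):
        # Look up b in row h of the 256-entry table (one vpshufb).
        row = pshufb_lane(lut[h * 16:(h + 1) * 16], b)
        # Select the lanes whose a equals h (like vpcmpeqb).
        mask = [0xFF if x == h else 0x00 for x in a]
        # Merge the selected bytes into the result (like vpand + vpor).
        out = [o | (r & m) for o, r, m in zip(out, row, mask)]
    return out


# Usage: every lane picks the right entry, but it took 16 shuffles plus
# 16 compare/blend pairs instead of one shuffle.
lut = [(k * 7 + 3) & 0xFF for k in range(256)]
a = [15 - i for i in range(16)]
b = [(i * 5) & 0x0F for i in range(16)]
assert lut256(lut, a, b) == [lut[x * 16 + y] for x, y in zip(a, b)]
```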