by anonymoushn
961 days ago
That's absolutely wild. If you really only needed vpshufb, the throughput in terms of values is the same: there are twice as many values per register and you get to retire half as many instructions. But it takes a bunch more instructions to combine the two inputs and apply a LUT of 256 values :(
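
To make the "bunch more instructions" part concrete, here is a rough scalar model in Python of one way a 256-entry byte LUT could be built out of 16-entry shuffles. It is only a sketch under assumptions: `pshufb` below models a single 128-bit lane of the instruction, and `lut256` is a hypothetical helper name, not anything from a real library. Real SIMD code would likely use different tricks to cut the instruction count.

```python
def pshufb(table, idx):
    """Scalar model of one 128-bit lane of pshufb/vpshufb.

    table and idx are both 16 bytes. For each index byte, the result is 0
    if the top bit is set, otherwise table[low 4 bits of the index].
    """
    assert len(table) == 16 and len(idx) == 16
    return [0 if i & 0x80 else table[i & 0x0F] for i in idx]


def lut256(lut, data):
    """Look up 16 bytes in a 256-entry table using only 16-entry shuffles.

    Hypothetical helper for illustration: one shuffle per 16-byte chunk of
    the table, then a compare and mask on the high nibble to keep the
    result from the chunk each byte actually belongs to.
    """
    assert len(lut) == 256 and len(data) == 16
    lo = [b & 0x0F for b in data]  # low nibble picks the entry within a chunk
    hi = [b >> 4 for b in data]    # high nibble picks which chunk to keep
    out = [0] * 16
    for chunk in range(16):
        looked_up = pshufb(lut[chunk * 16:(chunk + 1) * 16], lo)
        # Model of a byte compare: all-ones where this chunk is the right one.
        mask = [0xFF if h == chunk else 0x00 for h in hi]
        # Model of AND + OR to merge this chunk's results into the output.
        out = [o | (v & m) for o, v, m in zip(out, looked_up, mask)]
    return out
```

In this naive form, one vector of lookups costs 16 shuffles plus a compare, an AND, and an OR per chunk, versus a single shuffle when a 16-entry table is all you need.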