by otherjason
865 days ago
Whether the technique described here will actually be faster is highly application-dependent. On x86, shuffle instructions are the bottleneck for many algorithms (at least the kind I often work with). Storing constants this way adds an extra shuffle every time you need to broadcast one of them back into a vector register, which worsens that bottleneck. In those cases, I’ve found that light spilling to the stack actually performs better.
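
To make the tradeoff concrete, here is a rough sketch in C with SSE intrinsics. It assumes "this way" means packing several scalar constants into the lanes of a single vector register. The function names and the choice of lane 2 are hypothetical, just for illustration, and this is not a benchmark.

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

/* Approach A (hypothetical name): four constants share one register.
   Using the constant in lane 2 costs one shuffle to broadcast it first. */
static inline __m128 scale_packed(__m128 x, __m128 packed_consts)
{
    __m128 c2 = _mm_shuffle_ps(packed_consts, packed_consts,
                               _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_mul_ps(x, c2);
}

/* Approach B (hypothetical name): the constant sits already broadcast in
   memory, e.g. a stack spill slot. Using it costs a load, not a shuffle. */
static inline __m128 scale_spilled(__m128 x, const float *spill_slot)
{
    return _mm_mul_ps(x, _mm_loadu_ps(spill_slot));
}
```

Both functions compute the same product. The difference is which execution resource pays for fetching the constant: a shuffle in the first case, a load in the second. If the shuffle unit is already saturated, the load may be the cheaper of the two.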