by pbsd
4710 days ago
Oh, I missed that. That makes things trickier, but I think we can still get away with something like

    vmovdqu xmm7, [rdi + rax + 5 - 1]
    vpinsrb xmm7, xmm7, [rdi + rax + 0], 0

without too much of a performance penalty. The offset adjustments can then be folded into the shuffle tables, so there should be no further significant performance loss.
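
To make the resulting lane layout concrete, here is a portable scalar sketch in C of what those two instructions would leave in xmm7. The function name and its parameters are made up for illustration (base stands in for rdi, off for rax, out for xmm7); it only models the data movement and says nothing about speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (illustrative only, hypothetical name) of:
 *   vmovdqu xmm7, [rdi + rax + 5 - 1]
 *   vpinsrb xmm7, xmm7, [rdi + rax + 0], 0
 * The caller must guarantee that base[off .. off + 19] is readable,
 * because the 16-byte load starts 4 bytes past base + off. */
void load_byte0_then_bytes5to19(const uint8_t *base, size_t off,
                                uint8_t out[16])
{
    assert(base != NULL && out != NULL);

    /* vmovdqu: unaligned 16-byte load starting at offset 5 - 1 = 4,
     * so lane i receives base[off + 4 + i]. */
    memcpy(out, base + off + 4, 16);

    /* vpinsrb ..., 0: overwrite lane 0 with the byte at offset 0. */
    out[0] = base[off];
}
```

After the call, lane 0 holds input byte 0 and lanes 1 to 15 hold input bytes 5 to 19. Assuming the shuffle tables were written against the plain byte positions, an index that referred to byte k (for k from 5 to 19) would become k - 4, which is presumably the kind of adjustment that can be baked into the tables ahead of time.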