by xoranth
665 days ago
Sure, but how well do they perform compared to vector loads? Do they get converted to vector load + shuffle uops, and therefore require a specific layout anyway? Last time I tried using gathers on AVX2, performance was comparable to doing scalar loads.

Gathers on AVX2 used to be problematic, but I assume that shouldn't be the case today, especially if lane-crossing is minimal? (If you do know, please share!)
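
For anyone who wants to measure this rather than guess, here is a minimal sketch of the two code paths being compared. It does not settle the performance question; it only pins down what "gather" and "scalar loads" mean here. The function names `gather8_scalar` and `gather8` are made up for illustration, and the AVX2 path is only compiled when the compiler defines `__AVX2__` (e.g. with `-mavx2`); otherwise `gather8` falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar reference: eight independent 32-bit loads, one per index.
 * (Hypothetical helper name, not from any library.) */
static void gather8_scalar(const int32_t *base, const int32_t idx[8],
                           int32_t out[8]) {
    for (size_t i = 0; i < 8; i++) {
        out[i] = base[idx[i]];
    }
}

/* Gather path: one AVX2 gather instruction when AVX2 is enabled at
 * compile time; otherwise the same scalar loop as above. */
static void gather8(const int32_t *base, const int32_t idx[8],
                    int32_t out[8]) {
#if defined(__AVX2__)
    /* Load the eight 32-bit indices (unaligned load is fine here). */
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);
    /* Scale of 4 bytes: indices count int32_t elements, not bytes. */
    __m256i v = _mm256_i32gather_epi32((const int *)base, vidx, 4);
    _mm256_storeu_si256((__m256i *)out, v);
#else
    gather8_scalar(base, idx, out);
#endif
}
```

Timing both functions in a loop over the same table and index pattern, on the specific CPU you care about, is probably the only reliable way to answer the question, since gather throughput differs between microarchitectures.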