by brandmeyer
2125 days ago
It's theoretically possible to run gather as fast as one cache line per cycle instead of one SIMD lane per cycle: every lane whose index lands in the same cache line could be filled from a single access. I don't think anyone has thrown that much permute hardware at the problem, though. It's only profitable if you believe that scatter and gather do have cache locality even when they don't have regularity.
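A rough way to see the trade-off is a toy cost model, sketched below under stated assumptions. It compares a gather that services one lane per cycle with an idealized one that services one cache line per cycle. The function names, the 64-byte line and the 4-byte element size are all hypothetical choices for illustration. The model also ignores the cost of the permute network itself and does not describe any real CPU.

```python
# Toy cost model (hypothetical): cycles for an 8-lane gather under two designs.
LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 4   # assumed element size, e.g. float32

def per_lane_cycles(indices):
    """Baseline design: one SIMD lane is serviced per cycle."""
    return len(indices)

def per_line_cycles(indices):
    """Idealized design: one cache line per cycle. Every lane whose element
    lives in that line is filled in the same cycle (permute cost ignored)."""
    lines = {(i * ELEM_BYTES) // LINE_BYTES for i in indices}
    return len(lines)

# Irregular order, but every index is in 0..15, so all lanes hit one line.
LOCAL = [3, 0, 7, 12, 5, 9, 1, 14]
# Irregular with no locality: each lane hits a different line.
SCATTERED = [0, 100, 200, 300, 400, 500, 600, 700]

for name, idx in (("local", LOCAL), ("scattered", SCATTERED)):
    print(name, per_lane_cycles(idx), per_line_cycles(idx))
# prints:
# local 8 1
# scattered 8 8
```

The extra hardware only pays off in the first case, where the indices are irregular but still clustered. When every lane misses into a different line, the per-line design is no faster than the per-lane one.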