by pcwalton
2119 days ago
Don't GPUs optimize gathers within the same cache line so that they effectively become a single fetch from memory followed by a shuffle? I would assume that's the purpose of VPGATHERDD: not so much a vector of addresses like (0x1000, 0x2000, 0x3000, 0x4000), where there's no alternative to issuing 4 separate loads, but rather a vector of nearby addresses like (0x2010, 0x2004, 0x2008, 0x2000), where the CPU can coalesce the fetches into one (as PSHUFD with a memory operand does). Gather instructions are especially good when your addresses usually fall in the same cache line but don't have to: think of mipmapped texture lookups in fragment shaders.
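A toy Python model of the coalescing idea, assuming 64-byte cache lines and 4-byte-aligned 32-bit lanes (as with VPGATHERDD). `coalesced_gather` is a hypothetical helper, not how any real CPU or GPU is known to implement gathers. It issues one "fetch" per distinct cache line, then picks each lane's value out of the fetched lines, which is the "single fetch then shuffle" behavior described above:

```python
CACHE_LINE = 64  # bytes; assumed cache-line size


def coalesced_gather(memory: bytes, addrs: list[int]) -> tuple[list[int], int]:
    """Hypothetical gather model.

    Returns (gathered 32-bit little-endian values, number of cache-line fetches).
    """
    # Step 1: fetch each distinct cache line touched by any lane, once.
    lines: dict[int, bytes] = {}
    for a in addrs:
        assert a % 4 == 0, "lanes assumed 4-byte aligned (never straddle a line)"
        base = a - a % CACHE_LINE
        if base not in lines:
            lines[base] = memory[base:base + CACHE_LINE]

    # Step 2: "shuffle" each lane's 4 bytes out of its already-fetched line.
    out = []
    for a in addrs:
        base = a - a % CACHE_LINE
        off = a - base
        out.append(int.from_bytes(lines[base][off:off + 4], "little"))
    return out, len(lines)


# Demo memory: each aligned 32-bit word holds its own address divided by 4.
memory = b"".join(i.to_bytes(4, "little") for i in range(0x5000 // 4))

scattered = [0x1000, 0x2000, 0x3000, 0x4000]  # 4 different cache lines
nearby = [0x2010, 0x2004, 0x2008, 0x2000]     # all within the line at 0x2000

print(coalesced_gather(memory, scattered)[1])  # 4 fetches
print(coalesced_gather(memory, nearby)[1])     # 1 fetch
```

The scattered vector still costs one fetch per lane, while the nearby vector collapses to a single line fetch, which is the case where a hardware gather could plausibly beat four independent scalar loads.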