by siboehm
1264 days ago
Author here: Seems like a good trick! Though won't this affect shared memory alignment and make me lose those LDS.128 instructions? Or do these not require alignment? There are so few good docs on SASS. In general I'm still confused about whether vectorized load instructions (LDS.128) necessarily lead to bank conflicts or not. My impression was that consecutive 32-bit floats get mapped to different banks, so to avoid conflicts I'd want the warp to load 32 consecutive 32-bit elements at each step.
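To make that impression about bank mapping concrete, here's a rough host-side C++ sketch. It assumes 32 banks that are each 4 bytes wide, with successive 32-bit words landing in successive banks. The names (`bank_of`, `conflict_degree`, `strided_access`) are made up for illustration. It models only plain 32-bit loads and says nothing about how LDS.128 is actually serviced, which is exactly the part I'm unsure about.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Assumption: 32 banks, each 4 bytes wide; a warp has 32 threads.
constexpr std::size_t kNumBanks = 32;
constexpr std::size_t kWarpSize = 32;

// Bank holding the 32-bit word at the given word index in shared memory.
constexpr std::size_t bank_of(std::size_t word_index) {
    return word_index % kNumBanks;
}

// Worst-case number of distinct words that fall into a single bank for one
// warp-wide 32-bit load. 1 means conflict-free, N means an N-way conflict.
// Threads reading the same word are broadcast, so duplicates count once.
std::size_t conflict_degree(const std::vector<std::size_t>& word_indices) {
    assert(word_indices.size() <= kWarpSize);
    std::array<std::set<std::size_t>, kNumBanks> words_per_bank;
    for (std::size_t w : word_indices) {
        words_per_bank[bank_of(w)].insert(w);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return worst;
}

// Word indices touched by a warp in which thread t reads word t * stride.
std::vector<std::size_t> strided_access(std::size_t stride) {
    std::vector<std::size_t> indices;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        indices.push_back(t * stride);
    }
    return indices;
}
```

Under this model, `conflict_degree(strided_access(1))` is 1 (the warp reads 32 consecutive floats, one per bank), while `conflict_degree(strided_access(32))` is 32 (every thread hits bank 0 with a different word).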