by moonchild
1372 days ago
'Tail handling' in general is an annoying aspect of SIMD. Masks are great, but no panacea: in particular, if you unroll, you cannot take care of the tail with a single masked instruction. There are various solutions to this. I favour overlapping accesses where that's feasible (following a great deal of evangelism from Mateusz Guzik); a colleague uses a variant of Duff's device; you can also just generate multiple masks. I would expect the linked code is just intended as a quick PoC, so it does not bother to be optimal.
(Overlapping is indeed cool where it works, i.e. for idempotent operations.)
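
For anyone who hasn't seen the overlapping trick, here is a rough sketch in plain C. This is not the linked code: `W`, `clamp_block` and `clamp_all` are made-up names, and `clamp_block` is a scalar stand-in for one full-width vector instruction. After the full blocks, the tail is handled by redoing the last `W` elements, which re-touches some lanes that were already processed. That is only safe here because clamping is idempotent.

```c
#include <stddef.h>
#include <stdint.h>

enum { W = 8 };  /* hypothetical vector width, in elements */

/* Stand-in for one full-width vector op: clamp W lanes to `limit`.
   Clamping is idempotent, so hitting a lane twice is harmless. */
static void clamp_block(int32_t *p, int32_t limit) {
    for (int k = 0; k < W; k++)
        if (p[k] > limit) p[k] = limit;
}

/* Clamp a[0..n) in place; the tail is covered by an overlapping block. */
void clamp_all(int32_t *a, size_t n, int32_t limit) {
    if (n < W) {                 /* too short to overlap: scalar fallback */
        for (size_t i = 0; i < n; i++)
            if (a[i] > limit) a[i] = limit;
        return;
    }
    size_t i = 0;
    for (; i + W <= n; i += W)   /* full, non-overlapping blocks */
        clamp_block(a + i, limit);
    if (i < n)                   /* tail: redo the last W elements */
        clamp_block(a + (n - W), limit);
}
```

With an in-place operation that is not idempotent (say, adding 1 to every element), the overlapped lanes would be updated twice, so you would need masks or a scalar tail instead.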