by mmozeiko
1372 days ago
It is there to process the last <32 elements. The vectorized loop processes up to 32 elements per iteration, and an iteration does not run when fewer than 32 elements are left, because it needs to load 32 bytes as input. This is very typical in vectorized loops: the main loop processes N elements per iteration, and a second loop handles the tail of <N elements.
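As a rough illustration of that main-loop-plus-tail shape, here is a minimal sketch in portable C. It is not the code being discussed: `count_byte` and `BLOCK` are hypothetical names, and the `memcpy` into a 32-byte buffer is only a stand-in for a real 32-byte SIMD load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 32 }; /* bytes consumed per main-loop iteration (assumed width) */

/* Hypothetical example: count how many bytes in data[0..n) equal target. */
size_t count_byte(const uint8_t *data, size_t n, uint8_t target)
{
    size_t count = 0;
    size_t i = 0;

    /* Main loop: runs only while a full 32-byte block remains,
       because every iteration reads exactly 32 bytes. */
    for (; n - i >= BLOCK; i += BLOCK) {
        uint8_t block[BLOCK];
        memcpy(block, data + i, BLOCK); /* stand-in for a 32-byte SIMD load */
        for (size_t j = 0; j < BLOCK; j++)
            count += (block[j] == target);
    }

    /* Tail loop: handles the remaining <32 elements one at a time. */
    for (; i < n; i++)
        count += (data[i] == target);

    return count;
}
```

With n = 70, for example, the main loop runs twice (64 bytes) and the tail loop picks up the last 6. Without the tail loop those 6 bytes would either be skipped or the main loop would have to read past the end of the buffer.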