|
|
|
|
|
by ben-schaaf
49 days ago
|
|
For a pretty small N I've found that less clever can be quite a bit faster. I'd try a linear search - possibly SIMD if you can change the data format to struct-of-arrays. An adaptive approach that uses linear search up to a certain N can also yield some benefit. |
|
SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units, plus arm little cores can be configured to share a vector unit with another core.