by ashvardanian
859 days ago
It’s mostly coming from using the Arm NEON intrinsics, not much magic. While working on the library, I was shocked to see how under-vectorized LibC is on Arm. A lot of improvement potential beyond strings. Amazon, Microsoft, Nvidia, Ampere, Apple, Qualcomm, and all the other Arm-based CPU vendors should really consider investing more into the ecosystem. The hardware is very capable, they shouldn’t be losing against x86 in so many benchmarks…

The implementation and maintenance effort is several times larger than for the usual "good enough" scalar implementation.