The whole "stick SIMD into it and compare it with the stdlib" thing is a really common learning trope. It feels more like portfolio filler, albeit probably not useless.
We also compare against state-of-the-art platform-specific code. The interesting thing is that it turns out to be slower than our approach using portable intrinsics (github.com/google/highway) :)