by killercup 2401 days ago
> Anyone have any ideas about which open source codebases UTF-8 validators exist in?

Rust's std library: the canonical way to read a text file into a string (implicitly) uses `std::str::from_utf8`. If I remember correctly, the current implementation doesn't use SIMD specifically, but it will of course contain vector instructions if the compiler can select them for the platform you target.
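For anyone who wants to poke at the validator directly, here is a minimal sketch of calling it on a raw byte slice. The `decode` helper is a hypothetical name made up for illustration; only `std::str::from_utf8` and `Utf8Error::valid_up_to` come from the standard library.

```rust
use std::str;

/// Hypothetical helper (not part of std): returns the text if `bytes`
/// is valid UTF-8, otherwise the length of the longest valid prefix.
fn decode(bytes: &[u8]) -> Result<&str, usize> {
    // `from_utf8` validates the whole slice without copying it.
    str::from_utf8(bytes).map_err(|e| e.valid_up_to())
}
```

Calling `decode(b"hello")` gives back the borrowed `&str`, while a stray `0xFF` byte produces an `Err` carrying the offset where validation stopped.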

I did a comparison with another SIMD based implementation last year. Maybe it's time to update it: https://github.com/killercup/simd-utf8-check

1 comment

Eh, this type of algorithm isn't one that really benefits from autovectorization of the scalar version.

The whole algorithm basically has to be redesigned from scratch to introduce vectorization, which is far beyond the capabilities of the compiler.
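To make the point concrete, here is a rough sketch of what a plain scalar validator tends to look like. This is an illustration written for this thread, not the std implementation, and `is_valid_utf8_scalar` is a made-up name. The thing to notice is that each lead byte decides how far the index advances, so every iteration depends on the previous one, which is the kind of loop a compiler generally can't turn into SIMD on its own.

```rust
/// Hypothetical scalar UTF-8 validator, for illustration only.
/// Byte ranges follow the well-formed sequence table in RFC 3629.
fn is_valid_utf8_scalar(bytes: &[u8]) -> bool {
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        // The lead byte fixes the sequence length and the allowed
        // range of the first continuation byte.
        let (len, lo, hi) = match b0 {
            0x00..=0x7F => {
                i += 1; // ASCII: one byte, nothing else to check
                continue;
            }
            0xC2..=0xDF => (2, 0x80, 0xBF),
            0xE0 => (3, 0xA0, 0xBF), // rejects overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80, 0xBF),
            0xED => (3, 0x80, 0x9F), // rejects UTF-16 surrogates
            0xF0 => (4, 0x90, 0xBF), // rejects overlong 4-byte forms
            0xF1..=0xF3 => (4, 0x80, 0xBF),
            0xF4 => (4, 0x80, 0x8F), // rejects values above U+10FFFF
            _ => return false,       // 0x80..=0xC1 and 0xF5..=0xFF
        };
        if i + len > bytes.len() {
            return false; // truncated sequence
        }
        let b1 = bytes[i + 1];
        if b1 < lo || b1 > hi {
            return false;
        }
        // Remaining continuation bytes must look like 10xxxxxx.
        for k in 2..len {
            if bytes[i + k] & 0xC0 != 0x80 {
                return false;
            }
        }
        // Data-dependent step: the next position is only known now.
        i += len;
    }
    true
}
```

A SIMD design instead classifies a whole block of bytes at once and checks neighbouring bytes against each other, which is a different algorithm rather than a faster version of this loop.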