Hacker News new | ask | show | jobs
by meisel 1883 days ago
How does this compare, speed-wise, to the UTF-8 validation done in simdjson?
2 comments

Check the benchmarks section (https://github.com/rusticstuff/simdutf8#Benchmarks), second table. simdutf8 is up to 28 % faster on my Comet Lake CPU. However with pure ASCII clang does something magical with simdjson and it beats my implementation by a lot. GCC-compiled simdjson is slower all around except for a few outliers with short byte sequences.

The algorithm is the one from simdjson, the main difference is that it uses an extra step in the beginning to align reads to the SIMD block size.

If someone is wanting to make a code comparison, most of the UTF-8 validation in simdjson seems to be nicely summed up in this pull request that sped it up: https://github.com/simdjson/simdjson/pull/993/files