by burntsushi
2241 days ago
If all you need to do is validate UTF-8, then yes, mostly ASCII enables some nice fast paths[1].

I'm not a UTF-8 decoding specialist, but if you need to traverse rune-by-rune via an API as general as `str::chars`, then you need to do some kind of work to convert your bytes into runes. Usually this involves some kind of branching.

But no, I haven't benchmarked it. Just intuition. A better researched response to your comment would benchmark, and would probably at least do some research on whether Daniel Lemire's work[2] would be applicable. (Or, in general, whether SIMD could be used to batch the UTF-8 decoding process.)

[1] - https://github.com/BurntSushi/bstr/blob/91edb3fb3e1ef347b30e...

[2] - https://lemire.me/blog/2018/05/16/validating-utf-8-strings-u...
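To make the validate-versus-decode distinction concrete, here is a minimal sketch of both sides. It is not the bstr code linked in [1] and not Lemire's SIMD approach from [2]; the names `ascii_prefix_len`, `is_valid_utf8`, and `decode_first` are hypothetical and exist only for illustration. The first half skips an ASCII prefix eight bytes at a time before handing the remainder to the standard library's validator. The second half decodes a single scalar value and shows the branch on the leading byte that a general rune-by-rune API tends to need. No performance claim is implied; nothing here has been benchmarked.

```rust
/// Length of the leading all-ASCII prefix, tested 8 bytes at a time.
/// A byte is ASCII exactly when its high bit is clear, so one mask
/// over a u64 checks eight bytes with a single branch.
fn ascii_prefix_len(bytes: &[u8]) -> usize {
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
    let mut i = 0;
    while i + 8 <= bytes.len() {
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(&bytes[i..i + 8]);
        if u64::from_ne_bytes(chunk) & HIGH_BITS != 0 {
            break;
        }
        i += 8;
    }
    // Finish byte-by-byte: either the tail, or the chunk that failed the mask.
    while i < bytes.len() && bytes[i] < 0x80 {
        i += 1;
    }
    i
}

/// Validation with an ASCII fast path (hypothetical helper).
/// The ASCII prefix is valid by construction, so only the rest
/// goes through the general validator.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    let n = ascii_prefix_len(bytes);
    std::str::from_utf8(&bytes[n..]).is_ok()
}

/// Decode the first scalar value of `s`, returning it with its byte length.
/// Taking `&str` means the input is already valid UTF-8, yet the code
/// still has to branch on the leading byte to learn the sequence length.
fn decode_first(s: &str) -> Option<(char, usize)> {
    let bytes = s.as_bytes();
    let b0 = *bytes.first()?;
    let (len, init) = match b0 {
        0x00..=0x7F => return Some((b0 as char, 1)),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
        0xF0..=0xF7 => (4, (b0 & 0x07) as u32),
        _ => return None, // continuation byte; unreachable at the start of a valid &str
    };
    // Each continuation byte contributes its low 6 bits.
    let tail = bytes.get(1..len)?;
    let cp = tail
        .iter()
        .fold(init, |acc, &b| (acc << 6) | (b & 0x3F) as u32);
    char::from_u32(cp).map(|c| (c, len))
}
```

Calling `decode_first` in a loop and advancing by the returned length walks a string one rune at a time, which is roughly the shape of work `str::chars` has to do. The validation path, by contrast, never needs to produce a `char` for the ASCII prefix at all.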