by burntsushi
2239 days ago
It's cheap, but not _that_ cheap. It shouldn't be as cheap as just iterating over a sequence of 32-bit integers.

But yes, I did benchmark this, even after reusing allocations, and I can't tell a difference. The benchmark is fairly noisy.

I agree with your conclusion, especially after looking at the input[1]. The strings are so small that the overhead of caching the UTF-8 decoding is probably comparable to the cost of doing UTF-8 decoding.

[1] - https://github.com/christianscott/levenshtein-distance-bench...
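
For anyone who wants to try the comparison themselves, here is a rough sketch of the two variants being discussed. It is not the code from the linked benchmark; the function names `lev_decode_each_row` and `lev_cached` are made up for illustration. The first re-decodes the second string from UTF-8 on every row of the DP table, the second decodes it once into a `Vec<char>` (32-bit values) and iterates over that.

```rust
/// Levenshtein distance that re-decodes `b` from UTF-8 on every row.
/// (Hypothetical name, for illustration only.)
fn lev_decode_each_row(a: &str, b: &str) -> usize {
    let b_len = b.chars().count();
    // row[j] holds the distance between the processed prefix of `a`
    // and the first j chars of `b`.
    let mut row: Vec<usize> = (0..=b_len).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut prev_diag = row[0];
        row[0] = i + 1;
        // UTF-8 decoding of `b` happens again here, once per char of `a`.
        for (j, cb) in b.chars().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let next = (prev_diag + cost).min(row[j] + 1).min(row[j + 1] + 1);
            prev_diag = row[j + 1];
            row[j + 1] = next;
        }
    }
    row[b_len]
}

/// Same algorithm, but `b` is decoded once up front and cached.
/// (Hypothetical name, for illustration only.)
fn lev_cached(a: &str, b: &str) -> usize {
    // One allocation plus one decoding pass; the inner loop then
    // walks plain 32-bit `char` values.
    let b_chars: Vec<char> = b.chars().collect();
    let mut row: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut prev_diag = row[0];
        row[0] = i + 1;
        for (j, &cb) in b_chars.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let next = (prev_diag + cost).min(row[j] + 1).min(row[j + 1] + 1);
            prev_diag = row[j + 1];
            row[j + 1] = next;
        }
    }
    row[b_chars.len()]
}
```

Both return the same distances, so the only difference to measure is the decoding strategy. With inputs as short as the ones in the benchmark, the extra allocation in the cached version may well eat whatever the cached decoding saves.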
|
I wonder if there are any benchmarks about this? Specifically, it feels like in theory iterating over UTF-8 could actually be faster if the data is mostly ASCII, since that requires less memory bandwidth, and the computation seems simple enough for memory to be the bottleneck (this is a wild guess; I have horrible intuition about the speed of various hardware things). In this particular benchmark that reasoning doesn't apply, as the strings are short and should just fit in cache.
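
To put a number on the memory side of that guess, here is a small sketch comparing the footprint of a string as UTF-8 against the same string decoded into 32-bit `char`s. The helper names `utf8_bytes` and `decoded_bytes` are hypothetical, and this only measures size, not speed.

```rust
/// Bytes occupied by the string in its UTF-8 form.
/// (Hypothetical helper, for illustration only.)
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

/// Bytes the same text would occupy as a slice of `char`,
/// where each `char` is 4 bytes wide.
/// (Hypothetical helper, for illustration only.)
fn decoded_bytes(s: &str) -> usize {
    s.chars().count() * std::mem::size_of::<char>()
}
```

For pure ASCII text the decoded form is four times larger, so a scan over a large, mostly ASCII input touches a quarter of the memory when it stays in UTF-8. Whether that outweighs the decoding work would still need a real benchmark.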