Hacker News new | ask | show | jobs
by burntsushi 2239 days ago
> The second problem is that at least the Rust code is decoding UTF-8 every iteration of the inner loop instead of decoding once and saving the result

Indeed. This is a pretty damning difference. The `target` string is being repeatedly UTF-8 decoded where as the same is not true in the Go version. The Go version even goes out of its way to do UTF-8 decoding exactly once for each of `source` and `target`, but then doesn't do the same for the Rust program.

> Implausible benchmark results are almost always an indicator of the incompetence of the person performing the benchmark.

Come on. We can do better than this. Please don't make it personal. We all have to learn things at some point.

1 comments

> Indeed. This is a pretty damning difference. The `target` string is being repeatedly UTF-8 decoded where as the same is not true in the Go version. The Go version even goes out of its way to do UTF-8 decoding exactly once for each of `source` and `target`, but then doesn't do the same for the Rust program.

I'm really not sure that's an issue, utf8 decoding is very, very cheap and it's iterating either way.

It would have to be benched, but I wouldn't be surprised if allocating the caches (at least one allocation per line of input) had way more overhead, especially given the inputs are so very short.

I'm not going to claim Rust's utf8 decoder is the fastest around, but it's very fast.

It's cheap, but not _that_ cheap. It shouldn't be as cheap as just iterating over a sequence of 32-bit integers.

But yes, I did benchmark this, even after reusing allocations, and I can't tell a difference. The benchmark is fairly noisy.

I agree with your conclusion, especially after looking at the input[1]. The strings are so small that the overhead of caching the UTF-8 decoding is probably comparable to the cost of doing UTF-8 decoding.

[1] - https://github.com/christianscott/levenshtein-distance-bench...

> It shouldn't be as cheap as just iterating over a sequence of 32-bit integers.

I wonder if there are any benchmarks about this? Specifically, it feels like in theory iterating utf8 could actually be faster if the data is mostly ascii, as that would require less memory bandwidth, and it seems like the computation is simple enough for memory to be the bottleneck (this is a wild guess, I have horrible intuition about speed of various hardware things). In this particular benchmark this reasoning doesn’t apply, as strings are short and should just fit in cache.

If all you need to do is validate UTF-8, then yes, mostly ASCII enables some nice fast paths[1].

I'm not a UTF-8 decoding specialist, but if you need to traverse rune-by-rune via an API as general as `str::chars`, then you need to do some kind of work to convert your bytes into runes. Usually this involves some kind of branching.

But no, I haven't benchmarked it. Just intuition. A better researched response to your comment would benchmark, and would probably at least do some research on whether Daniel Lemire's work[2] would be applicable. (Or, in general, whether SIMD could be used to batch the UTF-8 decoding process.)

[1] - https://github.com/BurntSushi/bstr/blob/91edb3fb3e1ef347b30e...

[2] - https://lemire.me/blog/2018/05/16/validating-utf-8-strings-u...

Did a quick benchmark: https://gist.github.com/matklad/ebc1acd2fab884f3ff9d96d4175f....

Seems like hypothesis is not wrong, but also is not interesting: the difference is pretty small, and it can be easily dwarfed by the big wins for utf32 if something like auto-vectorization or bounds-check elision kicks in. Also I am not entirely sure that the diff I observe is not due to something completely irrelevant, like one loop ending up on a lucky cacheline or what not.