| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by arcticbull 2243 days ago

There's a bunch of issues with the Rust implementation, not least that the initial condition where source or target lengths are zero, it returns the number of UTF-8 bytes of the other, but all other computations are performed in terms of Unicode chars -- except at the end: `cache[target.len()]` which will return the wrong value if any non-ASCII characters precede it.

Further, each time you call `.chars().count()` the entire string is re-enumerated at Unicode character boundaries, which is O(n) and hardly cheap, hence wrapping it in an iterator over char view.

Also, re-implementing std::cmp::min at the bottom there may well lead to a missed optimization.

Anyways, I cleaned it up here in case the author is curious: https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea...