by arcticbull
2239 days ago
Thanks for following up. Just as an FYI, there are a few bugs in your implementation. The most obvious one is the use of ".len()" in a number of places interspersed with ".chars().count()". These two return different values. ".len()" returns the number of UTF-8 bytes in the input string, which for ASCII is the same as ".chars().count()" obviously, but as soon as you feed it any non-ASCII characters, your function won't work. ".chars()" provides Unicode Scalar Values (USVs), which are the code points excluding the surrogate range U+D800 to U+DFFF [1]. Note also this is not the same as a Go rune, which can hold any code point, surrogates included. Secondly, you re-implemented "std::cmp::min" at the bottom of the file, and I'm not sure if the stdlib version is more optimized. Lastly, well, you caught the issue with repeated passes over the string. I've fixed the issues if you're curious: https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea...

Unrelated, I hate the term "fake news" as it's an intentional attempt to destroy the world public's faith in news media. It's a cancer on civilized society. Somewhere your civics teacher is crying into some whiskey, even though of course you're joking.

[1] http://www.unicode.org/glossary/#unicode_scalar_value
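
To make the ".len()" vs ".chars().count()" point concrete, here's a minimal sketch. The helper name byte_and_char_len is made up for illustration and is not from the gist or the original code; it just returns both counts side by side so the mismatch is visible.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of Unicode
/// scalar values) for `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ASCII only: one byte per char, so both counts agree.
    assert_eq!(byte_and_char_len("kitten"), (6, 6));

    // "naive" spelled with a precomposed U+00EF, which takes two bytes
    // in UTF-8: six bytes, but only five chars.
    assert_eq!(byte_and_char_len("na\u{ef}ve"), (6, 5));

    // The stdlib min works on any Ord type, so no hand-rolled helper
    // is needed.
    assert_eq!(std::cmp::min(3usize, 5usize), 3);

    println!("ok");
}
```

Mixing the two counts for indexing or buffer sizing only happens to work while the input stays ASCII, which is why picking one of them and sticking with it matters.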
Based on some cursory research, the Go version differs in a more subtle way too. A rune is a code point, which is a superset of the Rust "char" type: it also covers the surrogate code points (U+D800 to U+DFFF), which "char" excludes.
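
On the Rust side this is easy to check, since "char::from_u32" refuses anything that isn't a scalar value. A quick sketch (is_scalar_value is a name I made up for illustration; it only demonstrates the Rust half of the comparison, not Go's behavior):

```rust
/// Hypothetical helper: true if `cp` is a Unicode scalar value, i.e.
/// something a Rust `char` can represent.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

fn main() {
    // Ordinary code points are fine, inside or outside the BMP.
    assert!(is_scalar_value(0x41));
    assert!(is_scalar_value(0x1F600));

    // The surrogate range U+D800..=U+DFFF is made of code points that
    // are not scalar values, so `char` rejects them.
    assert!(!is_scalar_value(0xD800));
    assert!(!is_scalar_value(0xDFFF));

    // Anything past U+10FFFF is not a code point at all.
    assert!(!is_scalar_value(0x110000));

    println!("ok");
}
```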