by burntsushi 2233 days ago
Note that using bytes is a fundamentally different implementation that will produce different results on non-ASCII input. Using codepoints (or "runes") will better approximate edit distance based on visual characters. (Grapheme clusters would be even better, although one could put the text into a composed normalization form such as NFC first to get more mileage out of the rune-based algorithm.)
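
To make the difference concrete, here is a rough Python sketch (not the code under discussion; the function names are made up for illustration) that runs the same textbook Levenshtein routine over UTF-8 bytes, over codepoints, and over NFC-normalized codepoints. It assumes plain unit-cost insert/delete/substitute and does not attempt grapheme clusters, which would need a segmentation library.

```python
import unicodedata


def levenshtein(a, b):
    """Unit-cost edit distance over any two indexable sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def byte_distance(s, t):
    # Hypothetical helper: compares the UTF-8 encodings byte by byte.
    return levenshtein(s.encode("utf-8"), t.encode("utf-8"))


def codepoint_distance(s, t):
    # Hypothetical helper: a Python str iterates by codepoint ("rune").
    return levenshtein(s, t)


def nfc_codepoint_distance(s, t):
    # Hypothetical helper: compose both inputs (NFC) before comparing codepoints.
    return levenshtein(unicodedata.normalize("NFC", s),
                       unicodedata.normalize("NFC", t))


precomposed = "caf\u00e9"   # "é" as a single codepoint
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(byte_distance("cafe", precomposed))               # 2: "é" is two bytes in UTF-8
print(codepoint_distance("cafe", precomposed))          # 1
print(codepoint_distance(precomposed, decomposed))      # 2: same visual text, different codepoints
print(nfc_codepoint_distance(precomposed, decomposed))  # 0
```

The last two lines are the point of the parenthetical: without normalization, two strings that render identically can still be two edits apart at the codepoint level.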