|
|
|
|
|
by derefr
2240 days ago
|
|
Levenstein (edit) distance is fundamentally an information-theoretical concept defined on bitstreams, as insertions/deletions/swaps of individual bits within a stream. It has a lot in common with error-correcting codes, fountain codes, and compression, which all also operate on bitstreams. Any higher-level abstract mention of Levenstein distances (e.g. of Unicode codepoints) is properly supposed to be taken to refer to the Levenstein distance of a conventional (or explicitly specified) binary encoding of the two strings. |
|
A translation of the original article [1] that introduced the concept notes in a footnote that "the definitions given below are also meaningful if the code is taken to mean an arbitrary set of words (possibly of different lengths) in some alphabet containing r letters (r >= 2)".
And if you wish to strictly stick to how it was originally defined, you'd need to only use strings of the same length.
More recent sources [2] say instead "over some alphabet", and even in the first footnote, describe results for "arbitrarily large alphabets"!
[1] https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf
[2] https://arxiv.org/pdf/1005.4033.pdf