by Raphaellll 2441 days ago
I once trained unsupervised character embeddings and used Levenshtein distance with embedding cosine as character replacement weights. It worked better as a similarity metric than Soundex.
1 comment

Sounds interesting. Can you expand on what you mean by "with embedding cosine as character replacement weights"?
Edit distance allows for insertion, deletion and substitution. Some variants allow a custom cost for substitution (e.g. for typo normalization, characters closer together on a keyboard get a lower substitution cost than others). What I describe in my comment is to use the cosine similarity between learned character embeddings to set the substitution costs, so that characters with similar embeddings are cheaper to swap.
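
To make the idea concrete, here is a rough sketch in Python of what such a weighted edit distance could look like. It is not the code from the original experiment. The embeddings in TOY_EMBEDDINGS are made up by hand rather than learned, and turning similarity into a cost as 1 minus cosine similarity (clipped to the range 0 to 1) is an assumption; the names substitution_cost and weighted_levenshtein are hypothetical too.

```python
import math

# Toy 2-d character embeddings, invented for illustration only.
# Real ones would be learned from a corpus.
TOY_EMBEDDINGS = {
    "a": (1.0, 0.0),
    "e": (0.9, 0.1),
    "k": (0.0, 1.0),
    "c": (0.1, 0.9),
}

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def substitution_cost(a, b, embeddings):
    """Assumed mapping: cost = 1 - cosine similarity, clipped to [0, 1]."""
    if a == b:
        return 0.0
    if a not in embeddings or b not in embeddings:
        return 1.0  # fall back to the plain Levenshtein cost
    return min(1.0, max(0.0, 1.0 - cosine(embeddings[a], embeddings[b])))

def weighted_levenshtein(s, t, embeddings, indel_cost=1.0):
    """Levenshtein distance with embedding-based substitution costs."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel_cost,      # delete a from s
                curr[j - 1] + indel_cost,  # insert b into s
                prev[j - 1] + substitution_cost(a, b, embeddings),
            ))
        prev = curr
    return prev[-1]
```

With these toy vectors, weighted_levenshtein("kat", "cat", TOY_EMBEDDINGS) comes out close to zero because "k" and "c" point in nearly the same direction, while swapping "a" for "k" costs the full 1.0. Insertions and deletions keep a flat cost here; that is a simplification, not something the original comment specifies.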