| HN Mirror



	by Raphaellll 2441 days ago
	I once trained unsupervised character embeddings and used Levenshtein distance with embedding cosine as character replacement weights. It worked better as a similarity metric than Soundex.

1 comments

ahnick 2441 days ago

Sounds interesting. Can you expand on what you mean by "with embedding cosine as character replacement weights"?


Raphaellll 2440 days ago

Edit distance allows for insertion, deletion, and substitution. Some variants allow a custom substitution cost (e.g. for typo normalization, characters closer together on a keyboard get a lower substitution cost than others). What I describe in my comment is to derive the substitution costs from the cosine similarity between learned character embeddings.
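
To make that concrete, here is a rough Python sketch of a Levenshtein distance with embedding-based substitution costs. Several things in it are assumptions rather than what was actually used: the cost is taken as 1 minus the cosine similarity (clamped to [0, 1]) so that similar characters are cheap to swap, insertions and deletions cost a flat 1.0, and `TOY_EMBEDDINGS` is a made-up 2-d table standing in for trained character embeddings. The function names are hypothetical too.

```python
import math

# Hypothetical 2-d character embeddings, invented for illustration only.
# Real ones would come from a trained model.
TOY_EMBEDDINGS = {
    "c": [1.0, 0.0],
    "k": [0.8, 0.6],
    "t": [0.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def substitution_cost(a, b, embeddings):
    """Assumed mapping: cost = 1 - cosine similarity, clamped to [0, 1]."""
    if a == b:
        return 0.0
    if a not in embeddings or b not in embeddings:
        return 1.0  # fall back to the plain Levenshtein cost
    sim = cosine_similarity(embeddings[a], embeddings[b])
    return min(1.0, max(0.0, 1.0 - sim))


def weighted_levenshtein(s, t, embeddings, indel_cost=1.0):
    """Edit distance where only the substitution cost depends on embeddings."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel_cost,      # delete a from s
                curr[j - 1] + indel_cost,  # insert b into s
                prev[j - 1] + substitution_cost(a, b, embeddings),
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(weighted_levenshtein("cat", "kat", TOY_EMBEDDINGS))
    print(weighted_levenshtein("cat", "tat", TOY_EMBEDDINGS))
```

With these toy vectors, "c" and "k" point in similar directions, so "cat" vs "kat" comes out much closer than "cat" vs "tat", which plain Levenshtein would score identically. Characters missing from the table just fall back to a cost of 1.0.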
