by minimaxir 300 days ago
Typically these algorithms cluster by similarity (either Euclidean distance or cosine similarity).
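
To make the two metrics concrete, here is a minimal sketch in NumPy. The vectors and function names are made up for illustration and are not taken from any particular clustering library. Euclidean distance depends on magnitude, while cosine similarity only looks at direction.

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance between two vectors; smaller means more similar.
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 2-D "embeddings", chosen only to show the difference.
u = np.array([1.0, 0.0])
v = np.array([3.0, 0.0])  # same direction as u, larger magnitude
w = np.array([0.0, 1.0])  # orthogonal to u

print(euclidean_distance(u, v), cosine_similarity(u, v))
print(euclidean_distance(u, w), cosine_similarity(u, w))
```

Here u and v point the same way, so cosine similarity treats them as identical even though they are farther apart in Euclidean terms than u and w.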

The density of the clusters tends to follow trends. In this case, the "yolk" has a lot of bizarre Unicode tokens.