|
|
|
|
|
by seanhunter
241 days ago
|
|
For people who are unfamiliar, k-means is a partitioning algorithm that aims to group observations into a specific number (k) of clusters in such a way that each observation ends up in the cluster with the “nearest” mean. So say you want 5 groups, it will make five groups so that every observation is in the group where it’s nearest to the mean. And so that raises the question of what “nearest” means, and here this allows you to replace Euclidian distance with things like Kullback-Leibler divergence (that’s the KL below) which make more sense than Euclidian distance if you’re trying to measure how close two probability distributions are to each other. |
|
To me, the definition of "nearest" is just a technicality.
The real question is: what is K?