by Nvn 4981 days ago
There are a few things you should take into account:

1. You take the 'dominant' colours to be the centroids of your clusters. A centroid is the mean of the points in its cluster, and that mean is not necessarily a colour that appears in your image. For example, if you take a picture divided into four solid-coloured squares and use it to find the 3 dominant colours, at least one cluster has to contain 2 (or more) of those colours, so its centroid will be their average. (The same can happen in more complex images with a lot of contrast.)

2. With random initialization, k-means can easily converge to a local optimum, so different runs may return different colours. In general it is good practice to run it several times and keep the result with the lowest cost (the sum of squared distances from each point to its cluster's centroid).

3. K-means can take a long time to converge, so cap the number of iterations it is allowed to run.
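The original code isn't shown here, so this is only a hypothetical stand-alone sketch in plain Python (no image library), assuming the pixels are already a list of RGB tuples. The names `kmeans_once`, `dominant_colours`, `n_runs` and `max_iter` are illustrative. It touches all three points: random restarts keeping the lowest-cost run, a hard iteration cap, and centroids that are cluster means and therefore not necessarily colours in the image.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(pixels, k, max_iter, rng):
    """One run of Lloyd's k-means from a random initialization.

    Returns (centroids, cost), where cost is the sum of squared
    distances from each pixel to its nearest centroid.
    """
    centroids = rng.sample(pixels, k)  # k random pixels as starting centroids (point 2)
    for _ in range(max_iter):  # hard cap on iterations (point 3)
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Mean of the cluster: may not be a colour in the image (point 1).
                new_centroids.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged before reaching the cap
        centroids = new_centroids
    cost = sum(min(sq_dist(p, c) for c in centroids) for p in pixels)
    return centroids, cost

def dominant_colours(pixels, k, n_runs=10, max_iter=50, seed=0):
    """Run k-means n_runs times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = (kmeans_once(pixels, k, max_iter, rng) for _ in range(n_runs))
    return min(runs, key=lambda result: result[1])

# Four solid colour squares, 100 pixels each, reduced to 3 "dominant" colours.
pixels = ([(255, 0, 0)] * 100 + [(0, 255, 0)] * 100 +
          [(0, 0, 255)] * 100 + [(255, 255, 0)] * 100)
centroids, cost = dominant_colours(pixels, k=3)
print(centroids)  # at least one entry is an average of two or more colours
```

With four distinct colours and only three clusters, some cluster must hold at least two of them, so its centroid is a blend that never occurs in the picture, which is exactly the situation described in point 1. Libraries expose the same two knobs; scikit-learn's `KMeans`, for instance, has `n_init` and `max_iter` parameters.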

These caveats aside, very cool use of k-means on image data!


IRT #1: good point. Doing a "quick" nearest-neighbor lookup from each centroid to the colours actually present in the image would help with that, I'd bet.
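The reply doesn't say how the lookup would be done, so here is a minimal sketch, assuming a brute-force scan over the distinct pixel colours; `snap_to_image` is a hypothetical name. Each centroid is replaced by the closest colour that really occurs in the image.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def snap_to_image(centroids, pixels):
    """Replace each centroid with the nearest colour that occurs in pixels."""
    palette = set(pixels)  # scan distinct colours only, not every pixel
    return [min(palette, key=lambda p: sq_dist(p, c)) for c in centroids]

pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
print(snap_to_image([(200, 180, 10), (10, 20, 240)], pixels))
# [(255, 255, 0), (0, 0, 255)]
```

This costs one pass over the distinct colours per centroid. A centroid sitting exactly halfway between two colours is snapped to either one arbitrarily, and two centroids can snap to the same colour. For images with many distinct colours, a spatial index such as SciPy's `KDTree` could replace the linear scan.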