by samg_ 5479 days ago
I've been playing with some clustering stuff in my free time for the past few months.

What I've found is that the problem seems to get a lot more reasonable if you know how many clusters there are.

K-Means requires this information, but afaict agglomerative techniques don't. I wonder why this tool's agglomerative clustering method requires the number of clusters as an argument.

1 comment

You're right that agglomerative clustering (unlike K-means) does not inherently need to know the number of clusters in advance. However, it still needs some sort of termination criterion, and the number of clusters is one possible criterion.

Since lsm operates in a transformed space, other commonly used criteria like cluster distance may not be as convenient for the user to express.
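
To make the two stopping rules concrete, here is a rough sketch of naive single-linkage agglomerative clustering in plain Python. It is not how lsm does it; the function names (`agglomerate`, `single_link`), the parameters (`n_clusters`, `max_distance`), and the sample points are all made up for illustration.

```python
from itertools import combinations
import math


def single_link(a, b):
    """Smallest pairwise Euclidean distance between two clusters of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters=None, max_distance=None):
    """Naive agglomerative clustering with a choice of stopping rule.

    Hypothetical helper: stops once n_clusters remain, or once the closest
    pair of clusters is farther apart than max_distance.
    """
    if n_clusters is None and max_distance is None:
        raise ValueError("need a termination criterion")

    # Start with every point in its own cluster.
    clusters = [[p] for p in points]

    while len(clusters) > 1:
        # Criterion 1: stop at a target number of clusters.
        if n_clusters is not None and len(clusters) <= n_clusters:
            break

        # Find the closest pair of clusters (i < j is guaranteed).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )

        # Criterion 2: stop when even the closest pair is too far apart.
        if max_distance is not None and single_link(clusters[i], clusters[j]) > max_distance:
            break

        # Merge cluster j into cluster i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
by_count = agglomerate(points, n_clusters=2)
by_distance = agglomerate(points, max_distance=2.0)
```

With `n_clusters` the user only has to pick an integer. With `max_distance` they have to pick a threshold in whatever units the points live in, which is the part that gets awkward once the data has been projected into a transformed space.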