| HN Mirror



	by aroden 4332 days ago
	This is strikingly similar to the problem of quickly generating a close-enough dendrogram over large data sets. Back in ~2008 I did some undergrad research on the topic to speed up DNA analysis. Basically, it solves your grouping problem gracefully by trying to stay within an error level at N levels. Since you are aiming for N buckets, this would solve your problem well. There was a very good review paper on the subject, but it is not on the first page of Google and my memory fails me completely on the author (also, I'm at work). It might not meet your memory or computation constraints, but grouping is such a wide topic.
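To make the dendrogram idea concrete, here is a rough sketch of cutting a hierarchy at N buckets. It assumes the data are plain 1-D numbers and uses exact single-linkage merging, which is not the approximate, error-bounded method from the research mentioned above (that method isn't specified here). The function name `bucket_by_dendrogram` is made up for illustration.

```python
def bucket_by_dendrogram(values, n_buckets):
    """Group 1-D values into n_buckets by bottom-up (agglomerative) merging.

    Assumption: single-linkage distance on sorted 1-D data, so the nearest
    cluster is always an adjacent one. This is an exact, simple variant,
    not an approximate dendrogram builder.
    """
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")

    # Start with every value in its own cluster, in sorted order.
    clusters = [[v] for v in sorted(values)]

    # Each merge is one level of the dendrogram; stop once N clusters remain.
    while len(clusters) > n_buckets:
        # Single-linkage gap between each pair of neighbouring clusters.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        # Merge the two closest neighbours into one cluster.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]

    return clusters


buckets = bucket_by_dendrogram([12, 1, 50, 10, 2, 11], 3)
print(buckets)
```

This version recomputes every gap on each merge, so it is quadratic in the number of values; it shows the shape of the approach, not something tuned for large data sets.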