by jrfinkel
5171 days ago
Merging is pretty simple, and could probably use a little more TLC. When a new document comes into the system, if its similarity score is above some threshold for documents in two different clusters, we consider merging those clusters. We then make the yes/no decision by comparing randomly chosen documents from both clusters and averaging the scores. The threshold we use here is a bit lower than the one for non-merging decisions, since the new document doing a good job of linking the two clusters is additional evidence that they belong together.
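A rough sketch of the merge check described above might look like the following. Everything concrete here is an assumption, not the poster's code: cosine similarity over bag-of-words `Counter`s stands in for whatever similarity score the system uses, the threshold values and sample size are made up, and the helper names (`linked_clusters`, `should_merge`, `consider_merges`) are hypothetical. How the new document itself gets assigned to a cluster is left out.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical values: the original comment gives no numbers.
LINK_THRESHOLD = 0.5                    # new doc must be this similar to some doc in a cluster
ASSIGN_THRESHOLD = 0.5                  # threshold for ordinary (non-merging) decisions
MERGE_THRESHOLD = ASSIGN_THRESHOLD - 0.1  # deliberately a bit lower for merge decisions
SAMPLE_SIZE = 5                         # random documents drawn from each cluster


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (an assumed metric)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def linked_clusters(new_doc, clusters):
    """Indices of clusters holding at least one doc similar enough to new_doc."""
    return [i for i, docs in enumerate(clusters)
            if any(cosine(new_doc, d) > LINK_THRESHOLD for d in docs)]


def should_merge(c1, c2, rng):
    """Average similarity of randomly sampled cross-cluster pairs vs. the lowered threshold."""
    s1 = rng.sample(c1, min(SAMPLE_SIZE, len(c1)))
    s2 = rng.sample(c2, min(SAMPLE_SIZE, len(c2)))
    scores = [cosine(a, b) for a in s1 for b in s2]
    return sum(scores) / len(scores) > MERGE_THRESHOLD


def consider_merges(new_doc, clusters, rng):
    """Return a new cluster list in which linked pairs that pass the check are merged."""
    candidates = linked_clusters(new_doc, clusters)
    result = [list(c) for c in clusters]  # copy so the caller's clusters are untouched
    absorbed = set()
    for i, j in itertools.combinations(candidates, 2):
        if i in absorbed or j in absorbed:
            continue
        if should_merge(result[i], result[j], rng):
            result[i].extend(result[j])  # fold cluster j into cluster i
            absorbed.add(j)
    return [c for k, c in enumerate(result) if k not in absorbed]
```

For example, with `doc = lambda s: Counter(s.split())`, a new document `doc("apple banana cherry")` links a cluster of apple/banana documents and a cluster of banana/cherry documents, and because their sampled cross-similarity averages well above `MERGE_THRESHOLD`, `consider_merges` folds them into one cluster while leaving an unrelated cluster alone. Sampling keeps the check cheap for large clusters; the cost is that the decision is noisy when `SAMPLE_SIZE` is small.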