by Xenoamorphous
893 days ago
Isn’t the embedding step much slower than clustering? How many documents are you dealing with? For a news aggregator I worked on, I ruled out k-means because you have to know the number of clusters in advance, and I think it assigns every document to a cluster, which is bad for the actual outliers in a dataset. Agglomerative clustering yielded the best results for us. HDBSCAN looked promising but did weird things with some docs.
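
To illustrate the point about not fixing the cluster count up front, here is a minimal pure-Python sketch of average-linkage agglomerative clustering that stops merging at a cosine-distance cutoff. It is not the setup from the aggregator: the toy vectors, the `agglomerative` helper, and the `threshold=0.2` value are all made up for the example, and at any real scale you would reach for a library instead (scikit-learn's `AgglomerativeClustering` takes a `distance_threshold`, for instance).

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, threshold):
    """Naive average-linkage clustering (hypothetical helper).

    Starts with one cluster per document and keeps merging the closest
    pair until the closest pair is farther apart than `threshold`.
    The number of clusters falls out of the data, not a parameter.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            (
                (linkage(clusters[a], clusters[b]), a, b)
                for a in range(len(clusters))
                for b in range(a + 1, len(clusters))
            ),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break  # nothing left that is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy "embeddings" (assumed values): two pairs of similar docs and one outlier.
docs = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],  # outlier
]

clusters = agglomerative(docs, threshold=0.2)
print(clusters)
```

With these toy vectors the two similar pairs merge and the last document stays a singleton, whereas k-means would have needed k chosen beforehand and would have assigned the outlier to some centroid regardless. The naive loop recomputes every pairwise linkage on each pass, so it is only meant for reading, not for real corpora.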