by zintinio5
3134 days ago
Like everything else, it depends on your use case. For example, are you searching for "semantically similar" documents or for "near duplicates"? You can compare documents under different metrics and different _representations_. Representations include LSA, PLSA, LDA, TF-IDF, and set representations; metrics include Jaccard distance, cosine distance, Euclidean distance, etc. Doc2vec is the Word2vec analog for documents. In practice, I have personally used TF-IDF vectors and token sets, with cosine and Jaccard distances.
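To make the representation/metric split concrete, here is a rough sketch in plain Python of the two combinations mentioned above: token sets with Jaccard distance, and TF-IDF vectors with cosine distance. The whitespace tokenizer, the smoothed IDF formula, the helper names, and the sample documents are all assumptions chosen for illustration; a real setup would more likely use a library.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase + whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def jaccard_distance(a, b):
    # Set representation: 1 - |A ∩ B| / |A ∪ B| over the token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    # TF-IDF representation: raw term count times a smoothed IDF.
    # The smoothing (+1 terms) is one common choice, not the only one.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_distance(u, v):
    # 1 - cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical sample documents: two near duplicates and one unrelated text.
docs = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "stock prices fell sharply",
]
vecs = tfidf_vectors(docs)

print("jaccard(0, 1):", jaccard_distance(docs[0], docs[1]))
print("jaccard(0, 2):", jaccard_distance(docs[0], docs[2]))
print("cosine(0, 1): ", cosine_distance(vecs[0], vecs[1]))
print("cosine(0, 2): ", cosine_distance(vecs[0], vecs[2]))
```

Both metrics here only see shared tokens, so they suit the "near duplicate" case; for "semantically similar" you would likely want one of the dense representations (LSA, LDA, Doc2vec) instead.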