by BenoitP
3052 days ago
> KNN works fine on high-dimensional text. From something as simple as Hamming distance on binary tokens, to Euclidean distance on TFIDF, to cosine distance on 900-dimensional word vector aggregates.

> This is why you "fit" something like a K-D tree during training.

I would not choose a K-D tree for that. Because of the curse of dimensionality, K-D trees become practically useless as the number of dimensions grows: the number of partitions you have to inspect explodes, until you end up checking almost every point anyway.

Locality Sensitive Hashing (LSH) tackles this explosion, but it trades away some recall. 80% recall is quite easy to reach, though, while getting close to 100% is prohibitively expensive. That could be good enough for an approximate KNN.
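To make the LSH recall tradeoff concrete, here is a minimal pure-Python sketch of one LSH family: random-hyperplane hashing for cosine distance. That scheme is my choice for illustration; the comment above doesn't name a specific one. The class and helper names, the table and bit counts, and the synthetic clustered data (a stand-in for text embeddings) are all hypothetical, not a tuned setup. It measures recall@10 against an exact brute-force scan and counts how many exact distance computations LSH actually performs per query.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class RandomHyperplaneLSH:
    """Hypothetical helper: approximate cosine KNN via random-hyperplane LSH.

    Each of `n_tables` tables hashes a vector to the sign pattern of its dot
    products with `n_bits` random Gaussian hyperplanes.
    """

    def __init__(self, dim, n_tables=8, n_bits=10, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(n_bits)] for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    @staticmethod
    def _signature(planes, v):
        # One bit per hyperplane: which side of that hyperplane v lies on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in planes)

    def add(self, v):
        idx = len(self.vectors)
        self.vectors.append(v)
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, v)].append(idx)

    def query(self, q, k):
        # Only points sharing a bucket with q in at least one table get an exact
        # distance computation; true neighbours that never collide are missed,
        # which is where recall is lost.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(planes, q), ()))
        ranked = sorted(candidates, key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return ranked[:k], len(candidates)


def brute_force_knn(vectors, q, k):
    # Exact KNN: scores every stored vector.
    return sorted(range(len(vectors)), key=lambda i: cosine(q, vectors[i]), reverse=True)[:k]


def make_clustered_data(n_clusters=50, per_cluster=20, dim=32, noise=0.3, seed=42):
    # Synthetic data (assumption): points scattered around random cluster centres.
    rng = random.Random(seed)
    centres = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_clusters)]
    data = [[c + rng.gauss(0.0, noise) for c in centre]
            for centre in centres for _ in range(per_cluster)]
    queries = [[c + rng.gauss(0.0, noise) for c in centre] for centre in centres[:20]]
    return data, queries


def evaluate(n_tables=8, n_bits=10, k=10):
    # Returns (recall@k, average exact distance computations per query, index size).
    data, queries = make_clustered_data()
    index = RandomHyperplaneLSH(len(data[0]), n_tables, n_bits, seed=1)
    for v in data:
        index.add(v)
    hits, scanned = 0, 0
    for q in queries:
        approx, n_candidates = index.query(q, k)
        hits += len(set(approx) & set(brute_force_knn(data, q, k)))
        scanned += n_candidates
    return hits / (k * len(queries)), scanned / len(queries), len(data)


if __name__ == "__main__":
    recall, avg_scanned, n = evaluate()
    print(f"recall@10 = {recall:.2f}, exact distance computations per query = {avg_scanned:.0f} of {n}")
```

The two knobs pull in opposite directions: more tables give a true neighbour more chances to collide with the query (higher recall), while more bits per table make buckets smaller (fewer exact distance computations, lower recall). Pushing recall toward 100% generally means adding tables, or probing neighbouring buckets, until you are scoring a large share of the dataset again, which is the cost mentioned above.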