Hacker News new | ask | show | jobs
by falling_myshkin 855 days ago
I am not sure what you mean specifically by 'overlapping'. But high-dimensional vector space is really "big" in the sense that everything is way closer together compared to low dimensions (this is the curse of dimensionality for euclidean norm), and this is already something one has to think about regardless of the similarity of the source documents. From reading wikipedia it seems like it's been argued that the curse is the worst with independent and uniformly distributed features.