Hacker News
by godelski 1022 days ago
As the OP points out, cosine similarity doesn't always equate to relevance. As I was expanding on, things get really messy as the number of dimensions increases, and your intuition about how vectors relate to one another goes out the window, fast. Distributional mass is not uniform; it concentrates in a thin shell rather than near the center. Near-orthogonality becomes the norm: two random vectors are almost always close to orthogonal. And of course, there is no guarantee that latent dimensions align with human-meaningful semantic features; nothing pushes the basis vectors to line up with human-perceived semantics.

My argument isn't that there's no similarity pressure; it's that similarity in high dimensions means something different than similarity in low dimensions. For example, in high dimensions most of the volume of a unit cube (side length 1, centered at the origin) lies outside the unit sphere, while in 2 or 3 dimensions the cube sits entirely inside the sphere with room to spare. The reason is that the cube's corners are at distance √d/2 from the center, which stays below 1 only while d < 4. High dimensions are weird, and that's what my comment is about: many people are applying their low-dimensional intuition to ML.
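If you want to check the orthogonality and cube-vs-sphere points yourself, here is a minimal sketch using only the Python standard library. The helper names are hypothetical (not from any library). It assumes random directions are drawn as standard Gaussian vectors, and it takes "unit cube" to mean [-1/2, 1/2]^d (side 1, centered at the origin) and "unit sphere" to mean the ball of radius 1. Both estimates are Monte Carlo, so the numbers wobble a bit with the seed.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cos| between pairs of independent random Gaussian vectors (assumed sampling scheme)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine(u, v))
    return total / pairs


def cube_fraction_inside_unit_ball(dim, samples=2000, seed=0):
    """Monte Carlo estimate of the share of the cube [-1/2, 1/2]^dim lying inside the radius-1 ball."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        # Uniform point in the cube; compare its squared norm against radius^2 = 1.
        sq_norm = sum(rng.uniform(-0.5, 0.5) ** 2 for _ in range(dim))
        if sq_norm <= 1.0:
            inside += 1
    return inside / samples


def corner_distance(dim):
    """Distance from the origin to a corner of the cube [-1/2, 1/2]^dim, i.e. sqrt(dim) / 2."""
    return math.sqrt(dim) / 2


if __name__ == "__main__":
    for d in (2, 3, 10, 100, 1000):
        print(f"d={d:5d}  mean|cos|={mean_abs_cosine(d):.3f}  "
              f"corner={corner_distance(d):6.2f}  "
              f"cube inside ball={cube_fraction_inside_unit_ball(d):.3f}")
```

Roughly what to expect: the average |cos| between random vectors starts near 2/π ≈ 0.64 at d=2 and falls toward about sqrt(2/(πd)), around 0.025 at d=1000. The cube fraction is exactly 1.0 at d=2 and 3 (every corner is inside), then drops, and by d=100 essentially none of the cube is inside the ball, since a typical point's squared norm is about d/12.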