Hacker News
by karxxm 974 days ago
You wrote "out of the box". Did you find a way to improve on this?

You can apply PCA or some other dimensionality reduction technique. That'll reduce computation and improve the signal-to-noise ratio when comparing vectors.
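A minimal sketch of the suggestion above, assuming the word vectors are rows of a NumPy array (random data stands in for real embeddings here). PCA is computed from the SVD of the centered data; `fit_pca`, `project`, and `cosine_similarity` are hypothetical helper names, not a library API:

```python
import numpy as np

def fit_pca(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (mean, components) for the top-k principal components of X (n x d)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map vectors into the k-dimensional PCA space."""
    return (X - mean) @ components.T

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # stand-in for 1000 word vectors of dimension 300
mean, comps = fit_pca(X, k=50)
Z = project(X, mean, comps)       # shape (1000, 50)
sim = cosine_similarity(Z[0], Z[1])
print(Z.shape, sim)
```

Comparisons then run on 50-dimensional vectors instead of 300-dimensional ones; k = 50 is an arbitrary choice here and would need tuning on real data.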
Unfortunately, this is not feasible with a large number of words due to the quadratic scaling. But thanks for the response!
Not sure what you mean by a large number of words. You can fit a PCA on millions of vectors fairly efficiently; after that, inference is just a matmul.
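One way to read the "millions of vectors" claim, as a sketch under assumptions: if the vectors arrive in batches, classical PCA only needs a running sum and a running d × d cross-product matrix, so memory depends on the dimension d rather than on the number of vectors. The function name `fit_pca_streaming` and the batch sizes are illustrative, and random data stands in for real embeddings:

```python
import numpy as np

def fit_pca_streaming(batches, d: int, k: int):
    """Fit PCA from an iterable of (batch_size x d) arrays without holding all rows.

    Only a d-vector of sums and a d x d matrix of cross-products are kept,
    so memory is O(d^2) no matter how many vectors are processed.
    """
    n = 0
    s = np.zeros(d)
    ss = np.zeros((d, d))
    for B in batches:
        n += B.shape[0]
        s += B.sum(axis=0)
        ss += B.T @ B
    mean = s / n
    cov = ss / n - np.outer(mean, mean)     # population covariance (divides by n)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return mean, eigvecs[:, order].T        # components as rows, shape (k, d)

rng = np.random.default_rng(0)
d, k = 300, 50
batches = (rng.normal(size=(10_000, d)) for _ in range(10))  # 100,000 vectors in total
mean, components = fit_pca_streaming(batches, d, k)

# Inference on new vectors is a single matrix multiplication.
new_vectors = rng.normal(size=(5, d))
reduced = (new_vectors - mean) @ components.T
print(reduced.shape)
```

Summing `B.T @ B` and subtracting the outer product of the mean can lose precision when the mean is large relative to the spread; a more careful version might center each batch against a running mean instead.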
Not true. You need a distance matrix (for classical PCA it's a covariance matrix), which scales quadratically with the number of points you want to compare. If you have 1 million vectors, with one float entry per pair, you end up with approx. (10^6)^2 / 2 = 5 × 10^11 unique values; at 4 bytes per 32-bit float, that is 2 × 10^12 bytes, or roughly 2000 GB of memory.
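The memory estimate can be checked with a few lines of arithmetic. This sketch assumes 4-byte (float32) entries and counts the upper triangle including the diagonal, n(n + 1)/2 values, which matches the (10^6)^2 / 2 approximation to within rounding; `symmetric_matrix_bytes` is a hypothetical helper:

```python
def symmetric_matrix_bytes(n: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the unique entries of a symmetric n x n matrix
    (upper triangle including the diagonal)."""
    return n * (n + 1) // 2 * bytes_per_value

n = 1_000_000
total = symmetric_matrix_bytes(n)                        # float32 entries
print(f"{total / 1e9:,.0f} GB")                          # 2,000 GB
print(f"{symmetric_matrix_bytes(n, 8) / 1e9:,.0f} GB")   # 4,000 GB with float64
```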